Article
Peer-Review Record

Small Object Detection in Infrared Images: Learning from Imbalanced Cross-Domain Data via Domain Adaptation

Appl. Sci. 2022, 12(21), 11201; https://doi.org/10.3390/app122111201
by Jaekyung Kim 1, Jungwoo Huh 1, Ingu Park 2, Junhyeong Bak 2, Donggeon Kim 2 and Sanghoon Lee 1,*
Submission received: 11 October 2022 / Revised: 31 October 2022 / Accepted: 2 November 2022 / Published: 4 November 2022
(This article belongs to the Special Issue Deep Learning in Object Detection and Tracking)

Round 1

Reviewer 1 Report

For detecting small objects in infrared images, a novel YOLOv5-based framework with a multi-domain training strategy has been described. The use of a new loss function and a domain adaptation module is also discussed.

 

This type of work is suitable for publication in this reputed journal. However, some points that could improve the quality of the paper are given below.

1. The abstract of the paper is not written properly. (Also include performance analysis) 

2. Grammatical errors need to be corrected.

3. Improve the figure quality for Figure 4.

Author Response

Response to Reviewer 1 Comments

Thank you for reviewing our manuscript. We have carefully read your comments and revised our manuscript accordingly.

 

Point 1: The abstract of the paper is not written properly. (Also include performance analysis)

Response 1: Thank you for your valuable comment. We have revised the abstract to describe the paper more thoroughly and have included the performance analysis.

 

Point 2: Grammatical errors need to be corrected.

Response 2: Thank you for thoroughly reviewing our manuscript. We have checked and updated the manuscript, fixing grammatical errors and typos.

 

Point 3: Improve the figure quality for Figure 4.

Response 3: Thank you for your comment. We have rearranged Figure 4 to include more samples from the databases, including images from the visible-range and infrared domains.

Reviewer 2 Report

The authors propose a framework for small object detection in infrared images using domain adaptation and imbalanced cross-domain data. The study is interesting, and my comments are given below.

1. The number of images in each dataset could be provided in a table.

2. How did you optimize the hyperparameters of the proposed model? If they are empirically selected, their details should be included. 

3. What is the motivation behind YOLOv5, since YOLOv6 and YOLOv7 are already available?

4. Some references are missing throughout the manuscript. 

5. More details about C3 modules should be provided for the sake of wide readers. 

6. Some sample infrared images could be provided for comparison with visible light images. 

7. Any link for the proposed framework could be provided for review.

 

There are some minor grammar corrections required throughout the manuscript.  

Author Response

Response to Reviewer 2 Comments

Thank you for reviewing our manuscript. We have carefully read your comments and revised our manuscript accordingly.

 

Point 1: The number of images in each dataset could be provided in a table.

Response 1: Thank you for your comment. We have added an additional table in Section 3.4 describing the number of images used in each dataset. As images with small-sized human- or vehicle-related labels were extracted, the image count per database is as follows.

Database                  Image Count
MS COCO                   3,231
FLIR ADAS                 3,086
VEDAI                     2,538
DOTA                      1,680
Generated Visible-light   21,225
Generated Infrared        3,000

 

Point 2: How did you optimize the hyperparameters of the proposed model? If they are empirically selected, their details should be included.

Response 2: Thank you for your comment. As our work includes multiple training stages, we had to tune the hyperparameters of the model empirically. We have added the detailed settings and parameters in Section 3.3. In our framework, the input image size is 640 x 640 with a batch size of 16, and the number of training epochs is set to 1,000 at each stage. The stochastic gradient descent (SGD) optimizer is used to optimize the network parameters. In training stages 1 and 2, the network is trained with an initial learning rate of 0.01, a momentum of 0.8, and a weight decay of 0.0005, adjusted every 50 epochs. In training stage 3, which is a fine-tuning stage, the initial learning rate is set to 0.001, the momentum to 0.8, and no weight decay is applied. The batch size is also reduced to 4 in stage 3.
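For clarity, the stage-wise settings above can be summarized in a short configuration sketch. The identifiers used here (`STAGE_HPARAMS`, `get_stage_hparams`) are illustrative only and do not appear in our codebase.

```python
# Illustrative summary of the empirically selected, stage-wise hyperparameters
# described above. Names are hypothetical, not from the actual YOLOv5-based code.

STAGE_HPARAMS = {
    # Stages 1 and 2: multi-domain training.
    1: {"img_size": 640, "batch_size": 16, "epochs": 1000,
        "optimizer": "SGD", "lr0": 0.01, "momentum": 0.8, "weight_decay": 0.0005},
    2: {"img_size": 640, "batch_size": 16, "epochs": 1000,
        "optimizer": "SGD", "lr0": 0.01, "momentum": 0.8, "weight_decay": 0.0005},
    # Stage 3: fine-tuning with a lower learning rate, a smaller batch,
    # and no weight decay.
    3: {"img_size": 640, "batch_size": 4, "epochs": 1000,
        "optimizer": "SGD", "lr0": 0.001, "momentum": 0.8, "weight_decay": 0.0},
}

def get_stage_hparams(stage: int) -> dict:
    """Return the hyperparameters used in a given training stage."""
    return STAGE_HPARAMS[stage]
```

For example, `get_stage_hparams(3)` returns the fine-tuning settings with the reduced batch size of 4.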

 

Point 3: What is the motivation behind YOLOv5, since YOLOv6 and YOLOv7 are already available?

Response 3: Thank you for your valuable comment. We compared YOLOv3, v4, and v5, the three major branches of the YOLO family, in Section 2.2. However, improved YOLO architectures (YOLOv6 and YOLOv7) have recently been released, showing state-of-the-art accuracy in general object detection tasks.

Compared to YOLOv5, the most significant change in YOLOv6 is the prediction head. The v5 model has detection heads at 3 scales, while the v6 model uses 4, enabling it to detect objects over a more diverse range of sizes. However, in our task of detecting small-sized targets, this is unlikely to improve detection accuracy. This change also increases the total parameter count of the model, so it takes longer to train.

In the case of YOLOv7, the authors focused on achieving higher detection accuracy without deteriorating inference speed. Such methods are called 'bag of freebies' (BoF), meaning that the accuracy of a model is increased without increasing its inference cost. YOLOv7 proposes a training strategy using an auxiliary head. The auxiliary head is added in the middle of the network, which means it is trained less efficiently because fewer network parameters are involved. Motivated by this, pseudo soft-labels (a coarse-to-fine definition) are generated by passing supervision back from the main head at different granularities, allowing more effective training. On the other hand, we propose a novel domain-adaptation-based training strategy built on domain soft-labels. Therefore, we chose YOLOv5 as the baseline architecture to prevent conflicts in the training process. In summary, we focused on the feature extraction ability of the YOLOv5 model, while YOLOv6 and YOLOv7 place more emphasis on generalized detection performance in common object detection tasks.

Additionally, YOLOv5 is the most actively maintained open-source project in the YOLO family, allowing the model to be easily modified, trained, and run for inference on multiple platforms. Considering the application scenario of our proposed model, we used the YOLOv5 codebase.

 

Point 4: Some references are missing throughout the manuscript.

Response 4: Thank you for thoroughly reviewing our manuscript. We have checked and updated the manuscript, adding the missing references and fixing typos.

 

Point 5: More details about C3 modules should be provided for the sake of wide readers.

Response 5: Thank you for your comment. We have included an additional description and a reference for the C3 module in Section 3.1 for the sake of readers. The C3 module originates from cross-stage partial networks (CSPNet) and can be regarded as a specific CSPNet implementation with three convolution modules. The module enables the backbone to extract visual features efficiently while preventing the gradient duplication problem in the backbone.
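As a rough illustration of that description, a C3 block can be sketched as below. This follows the structure of the public YOLOv5 implementation in spirit, but the class names and channel sizes here are illustrative and not the exact configuration used in our framework. PyTorch is assumed.

```python
# Minimal sketch of a C3 (CSP bottleneck with 3 convolutions) block.
# Illustrative only; layer widths do not match the paper's configuration.
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv2d + BatchNorm + SiLU, the basic YOLOv5-style convolution unit."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Residual bottleneck: 1x1 conv, 3x3 conv, optional shortcut."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, 1)
        self.cv2 = Conv(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C3(nn.Module):
    """Cross-stage partial block: the input is split into two 1x1-conv branches,
    one passing through n bottlenecks; the branches are concatenated and fused
    by a third convolution, limiting duplicated gradient flow."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = Conv(c_in, c_hidden, 1)   # main branch into bottlenecks
        self.cv2 = Conv(c_in, c_hidden, 1)   # cross-stage shortcut branch
        self.m = nn.Sequential(*(Bottleneck(c_hidden) for _ in range(n)))
        self.cv3 = Conv(2 * c_hidden, c_out, 1)  # fuse the two branches

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```

For example, applying `C3(64, 128, n=2)` to a 1 x 64 x 32 x 32 tensor yields a 1 x 128 x 32 x 32 feature map, with the spatial resolution preserved.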

 

Point 6: Some sample infrared images could be provided for comparison with visible light images.

Response 6: Thank you for your comment. We have rearranged Figure 4 to include more samples from the databases, including images from the visible-range and infrared domains.

 

Point 7: Any link for the proposed framework could be provided for review.

Response 7: Thank you for your comment. The framework is based on the open-source PyTorch code available at https://github.com/ultralytics/yolov5. The code of our framework has not yet been released online, but we are planning to publish the framework and the data generation code as soon as possible.

Round 2

Reviewer 2 Report

Well improved manuscript. 
