Article
Peer-Review Record

Small Object Detection in Infrared Images: Learning from Imbalanced Cross-Domain Data via Domain Adaptation

Appl. Sci. 2022, 12(21), 11201; https://doi.org/10.3390/app122111201
by Jaekyung Kim 1, Jungwoo Huh 1, Ingu Park 2, Junhyeong Bak 2, Donggeon Kim 2 and Sanghoon Lee 1,*
Submission received: 11 October 2022 / Revised: 31 October 2022 / Accepted: 2 November 2022 / Published: 4 November 2022
(This article belongs to the Special Issue Deep Learning in Object Detection and Tracking)

Round 1

Reviewer 1 Report

For detecting small objects in infrared images, a novel YOLOv5-based framework with a multi-domain training strategy has been described. The use of a new loss function and a domain adaptation module is also discussed.

 

This type of work is suitable for publication in this reputed journal. However, some points that could improve the quality of the paper are given below.

1. The abstract of the paper is not written properly. (Also include performance analysis) 

2. Grammatical errors need to be corrected.

3. Improve the figure quality for Figure 4.

Author Response

Response to Reviewer 1 Comments

Thank you for reviewing our manuscript. We have carefully read your comments and revised our manuscript accordingly.

 

Point 1: The abstract of the paper is not written properly. (Also include performance analysis)

Response 1: Thank you for your valuable comment. We have revised the abstract to describe the paper more thoroughly and have included the performance analysis.

 

Point 2: Grammatical errors need to be corrected.

Response 2: Thank you for thoroughly reviewing our manuscript. We have checked and updated the manuscript, fixing grammatical errors and typos.

 

Point 3: Improve the figure quality for Figure 4.

Response 3: Thank you for your comment. We have rearranged Figure 4 to include more samples from the databases, including images from the visible-range and infrared domains.

Reviewer 2 Report

The authors propose a framework for small object detection in infrared images using domain adaptation and imbalanced cross-domain data. The study is interesting, and my comments are given below.

1. The number of images in each dataset could be provided in a table.

2. How did you optimize the hyperparameters of the proposed model? If they are empirically selected, their details should be included. 

3. What is the motivation behind YOLOv5, since YOLOv6 and YOLOv7 are already available?

4. Some references are missing throughout the manuscript. 

5. More details about C3 modules should be provided for the sake of wide readers. 

6. Some sample infrared images could be provided for comparison with visible light images. 

7. Any link for the proposed framework could be provided for review.

 

There are some minor grammar corrections required throughout the manuscript.  

Author Response

Response to Reviewer 2 Comments

Thank you for reviewing our manuscript. We have carefully read your comments and revised our manuscript accordingly.

 

Point 1: The number of images in each dataset could be provided in a table.

Response 1: Thank you for your comment. We have added an additional table in Section 3.4 describing the number of images used in each dataset. As images with small-sized human- or vehicle-related labels were extracted, the image count per database is as follows.

Database                  Image Count
MS COCO                   3,231
FLIR ADAS                 3,086
VEDAI                     2,538
DOTA                      1,680
Generated Visible-light   21,225
Generated Infrared        3,000

 

Point 2: How did you optimize the hyperparameters of the proposed model? If they are empirically selected, their details should be included.

Response 2: Thank you for your comment. As our work includes multiple training stages, we had to tune the hyperparameters of the model empirically. We have added the detailed settings and parameters in Section 3.3. In our framework, the input image size is 640 x 640 with a batch size of 16, and the number of training epochs is set to 1,000 at each stage. The stochastic gradient descent (SGD) optimizer is used to optimize the network parameters. In training stages 1 and 2, the network is trained with an initial learning rate of 0.01, a momentum of 0.8, and a weight decay of 0.0005, adjusted every 50 epochs. In training stage 3, which is a fine-tuning stage, the initial learning rate is set to 0.001, the momentum to 0.8, and no weight decay is applied. The batch size is also reduced to 4 in stage 3.
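For clarity, the stage-wise settings above can be summarized in a short configuration sketch. The identifiers used here (`STAGE_HPARAMS`, `get_stage_hparams`) are illustrative only and do not appear in our codebase.

```python
# Illustrative summary of the empirically selected, stage-wise hyperparameters
# described above. Names are hypothetical, not from the actual YOLOv5-based code.

STAGE_HPARAMS = {
    # Stages 1 and 2: multi-domain training.
    1: {"img_size": 640, "batch_size": 16, "epochs": 1000,
        "optimizer": "SGD", "lr0": 0.01, "momentum": 0.8, "weight_decay": 0.0005},
    2: {"img_size": 640, "batch_size": 16, "epochs": 1000,
        "optimizer": "SGD", "lr0": 0.01, "momentum": 0.8, "weight_decay": 0.0005},
    # Stage 3: fine-tuning with a lower learning rate, a smaller batch,
    # and no weight decay.
    3: {"img_size": 640, "batch_size": 4, "epochs": 1000,
        "optimizer": "SGD", "lr0": 0.001, "momentum": 0.8, "weight_decay": 0.0},
}

def get_stage_hparams(stage: int) -> dict:
    """Return the hyperparameters used in a given training stage."""
    return STAGE_HPARAMS[stage]
```

For example, `get_stage_hparams(3)` returns the fine-tuning settings with the reduced batch size of 4.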

 

Point 3: What is the motivation behind YOLOv5, since YOLOv6 and YOLOv7 are already available?

Response 3: Thank you for your valuable comment. We compared YOLOv3, v4, and v5, the three major branches of the YOLO family, in Section 2.2. However, improved YOLO architectures (YOLOv6 and YOLOv7) have recently been released, showing state-of-the-art accuracy in general object detection tasks.

Compared to YOLOv5, the most significant change in YOLOv6 is the prediction head. The v5 model has detection heads at 3 scales, while the v6 model uses 4, enabling it to detect objects over a more diverse range of sizes. However, in our task of detecting small-sized targets, this is unlikely to improve detection accuracy. This change also increases the total parameter count of the model, so it takes longer to train.

In the case of YOLOv7, the authors focused on achieving higher detection accuracy without deteriorating inference speed. Such methods are called 'bag of freebies' (BoF), meaning that the accuracy of a model is increased without increasing its inference cost. YOLOv7 proposes a training strategy using an auxiliary head. The auxiliary head is added in the middle of the network, which means it is trained less efficiently because fewer network parameters are involved. Motivated by this, pseudo soft-labels (a coarse-to-fine definition) are generated by passing supervision back from the main head at different granularities, allowing more effective training. On the other hand, we propose a novel domain-adaptation-based training strategy built on domain soft-labels. Therefore, we chose YOLOv5 as the baseline architecture to prevent conflicts in the training process. In summary, we focused on the feature extraction ability of the YOLOv5 model, while YOLOv6 and YOLOv7 place more emphasis on generalized detection performance in common object detection tasks.

Additionally, YOLOv5 is the most actively maintained open-source project in the YOLO family, allowing the model to be easily modified, trained, and run for inference on multiple platforms. Considering the application scenario of our proposed model, we used the YOLOv5 codebase.

 

Point 4: Some references are missing throughout the manuscript.

Response 4: Thank you for thoroughly reviewing our manuscript. We have checked and updated the manuscript, adding the missing references and fixing typos.

 

Point 5: More details about C3 modules should be provided for the sake of wide readers.

Response 5: Thank you for your comment. We have included an additional description and a reference for the C3 module in Section 3.1 for the sake of readers. The C3 module originates from cross-stage partial networks (CSPNet) and can be regarded as a specific CSPNet implementation with three convolution modules. The module enables the backbone to extract visual features efficiently while preventing the gradient duplication problem in the backbone.
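As a rough illustration of that description, a C3 block can be sketched as below. This follows the structure of the public YOLOv5 implementation in spirit, but the class names and channel sizes here are illustrative and not the exact configuration used in our framework. PyTorch is assumed.

```python
# Minimal sketch of a C3 (CSP bottleneck with 3 convolutions) block.
# Illustrative only; layer widths do not match the paper's configuration.
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv2d + BatchNorm + SiLU, the basic YOLOv5-style convolution unit."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Residual bottleneck: 1x1 conv, 3x3 conv, optional shortcut."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, 1)
        self.cv2 = Conv(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C3(nn.Module):
    """Cross-stage partial block: the input is split into two 1x1-conv branches,
    one passing through n bottlenecks; the branches are concatenated and fused
    by a third convolution, limiting duplicated gradient flow."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = Conv(c_in, c_hidden, 1)   # main branch into bottlenecks
        self.cv2 = Conv(c_in, c_hidden, 1)   # cross-stage shortcut branch
        self.m = nn.Sequential(*(Bottleneck(c_hidden) for _ in range(n)))
        self.cv3 = Conv(2 * c_hidden, c_out, 1)  # fuse the two branches

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```

For example, applying `C3(64, 128, n=2)` to a 1 x 64 x 32 x 32 tensor yields a 1 x 128 x 32 x 32 feature map, with the spatial resolution preserved.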

 

Point 6: Some sample infrared images could be provided for comparison with visible light images.

Response 6: Thank you for your comment. We have rearranged Figure 4 to include more samples from the databases, including images from the visible-range and infrared domains.

 

Point 7: Any link for the proposed framework could be provided for review.

Response 7: Thank you for your comment. The framework is based on the open-source PyTorch code available at https://github.com/ultralytics/yolov5. The code of our framework has not yet been released online, but we are planning to publish the framework and the data generation code as soon as possible.

Round 2

Reviewer 2 Report

Well improved manuscript. 
