Peer-Review Record

INTS-Net: Improved Navigator-Teacher-Scrutinizer Network for Fine-Grained Visual Categorization

Electronics 2023, 12(7), 1709; https://doi.org/10.3390/electronics12071709
by Huilong Jin, Jiangfan Xie, Jia Zhao *, Shuang Zhang, Tian Wen, Song Liu and Ziteng Li
Reviewer 1:
Reviewer 2:
Submission received: 28 February 2023 / Revised: 27 March 2023 / Accepted: 28 March 2023 / Published: 4 April 2023

Round 1

Reviewer 1 Report

The paper is well-organized; however, the English writing needs improvement as there are some confusing sentences and words.

The introduction would benefit from the inclusion of additional references to support the claims being made.

The experiment should clarify how the data was split into training and testing sets.

Also, it would be helpful if the authors could provide more information on how they tuned the hyperparameters. Additionally, it would be informative to see the results of an ablation test.

I would suggest that the reported results in the Table be presented as the average value with its corresponding standard deviation to provide a clearer understanding of the data variability.
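A minimal sketch of the requested reporting format follows. The accuracy values are hypothetical placeholders, not results from the manuscript, and `summarize_runs` is an illustrative helper name; it assumes the experiment is repeated several times with different random seeds.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample std, n - 1 in the denominator
    return mean, std


# Hypothetical top-1 accuracies (%) from five runs with different seeds.
runs = [87.0, 88.0, 86.0, 89.0, 85.0]
mean, std = summarize_runs(runs)
print(f"accuracy: {mean:.2f} ± {std:.2f}")
```

Reporting the sample standard deviation alongside the mean makes it easier to judge whether a gain over the baseline exceeds run-to-run variation.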

Additionally, it would be beneficial for the paper to provide more information on the pre-processing steps taken for the data.

I still have some doubts regarding the effectiveness of the Scrutinizer network. Could the authors provide more details and explanations on this part and illustrate why it can be helpful for the prediction? Personally, I am concerned that concatenation may miss some useful information.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Fine-grained image recognition, a significant branch of computer vision, has become prevalent in various real-world applications. It is more challenging than general image recognition because the differences are highly localized and subtle, confined to specific parts. In this paper, the authors improve NTS-Net and propose the INTS-Net model, which adds the noise injection method SPS, fuses batch normalization and convolution at runtime, and makes flexible adjustments so that the two methods better suit the NTS-Net network. Based on my understanding, I have the following comments:

 

[1]- The manuscript title is unclear and hard to read; it should be revised.

[2]- The abstract contains abbreviations that are not defined.

[3]- Provide definitions for all abbreviations.

[4]- The performance of object recognition could be better in terms of recognition accuracy and model parameters.

 

[5]- The evaluations are insufficient for: (1) the category attention teaching CNN; (2) the configuration comparison with the baseline; (3) the dataset and implementation details, especially under strong memory and computation limitations; (4) the comparison experiment of INTS-Net carried out in the same environment, where the evaluation and contributions of the method are not sufficient for FGVC, and the use of Stochastic Gradient Descent with Momentum (SGDM) as the model optimizer; (5) the noise injection methods; and (6) the classic fine-grained image recognition model NTS-Net and the SPS method.

[6]- The conclusions section must be improved by drawing comparisons with similar works in the literature.

[7]- The paper does not provide mathematical proof or demonstrations of the simulated metrics.

[8]- The analysis and results are not adequate in their current form.

[9]- The illustration of SPS is insufficient. Fig. 1 is not sufficiently integrated and does not explain all the features and aspects of the design, in particular how SPS exploits samples as a source for noise injection.

[10]- In the training, the top-K informative areas combined with the whole picture are used as the input of the Scrutinizer network, which promotes fine-grained image recognition. Therefore, in addition to the comparison with the original NTS-Net results, the influence of the K value on the experimental results is also explored. Experiments, statistics, and other analyses for INTS-Net should be performed to a high technical standard and described in sufficient detail.

 

[11]- The authors need to thoroughly proofread the manuscript.

Grammar, format checks, and spelling are required. A thorough revision of the English language usage is required for improving the technicality of the paper.

 

 

[12]- All the approximations and assumptions/relaxations used to visualize the last layer of ResNet50 should be stated and compared with classical network visualization methods such as guided backpropagation and CAM.

[13]- Using the categorization attention supervision, the training includes three steps: (1) We use the class labels only to fine-tune a teacher network. The structure of NTS-Net consists of three networks: the Navigator network, the Teacher network, and the Scrutinizer network. First, the original image is input to the Navigator network, which includes a top-down architecture with horizontal connections to detect areas at multiple scales, and then the multi-scale feature map is used to generate areas with different scales and proportions. Regarding the dataset, the authors should describe the data preprocessing and augmentation and illustrate the proposed model with the latest network modules, models, classification networks, and model evaluations based on different efficient CNN backbones. I strongly suggest that the methodology be better explained. The proposal substantially lacks evaluation, and the given results are not satisfactory.

[14]- The resolution and fonts in Figure 5 are unclear and should be improved. The references should be of good quality and up to date. For a clear visualization of the graph, it would be better if Figures.

[15]- Additional comparisons are required to evaluate performance and overall improvement, so that the classification results in particular can be compared validly and fairly.
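For readers unfamiliar with the batch normalization and convolution fusion mentioned in the summary of this report, the standard inference-time folding can be sketched as follows. This is a generic illustration under stated assumptions and may differ from the authors' implementation: the function names and all numeric values are hypothetical, and the convolution is reduced to a 1x1 kernel evaluated at a single pixel to keep the example short.

```python
import math


def fuse_conv_bn(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm statistics into conv weights and biases.

    weights: one flattened kernel (list of floats) per output channel.
    All other arguments hold one value per output channel.
    """
    fused_w, fused_b = [], []
    for w, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([wi * scale for wi in w])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


def conv1x1(weights, biases, x):
    """1x1 convolution at a single pixel; x holds one value per input channel."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


# Hypothetical parameters: 3 input channels, 2 output channels.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
biases = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

# Reference path: convolution followed by batch normalization.
reference = batch_norm(conv1x1(weights, biases, x), gamma, beta, mean, var)

# Fused path: a single convolution with folded parameters.
fused_w, fused_b = fuse_conv_bn(weights, biases, gamma, beta, mean, var)
fused = conv1x1(fused_w, fused_b, x)

max_diff = max(abs(a - b) for a, b in zip(reference, fused))
print(f"max difference between paths: {max_diff:.2e}")
```

The folding is only valid at inference time, when the batch-norm statistics are fixed; during training the statistics change with every batch, so the two layers must remain separate.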

 

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report


Comments for author File: Comments.docx
