Article
Peer-Review Record

DeepRare: Generic Unsupervised Visual Attention Models

Electronics 2022, 11(11), 1696; https://doi.org/10.3390/electronics11111696
by Phutphalla Kong 1,2,*,†, Matei Mancas 2,*,†, Bernard Gosselin 2 and Kimtho Po 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 21 March 2022 / Revised: 2 May 2022 / Accepted: 5 May 2022 / Published: 26 May 2022
(This article belongs to the Special Issue Important Features Selection in Deep Neural Networks)

Round 1

Reviewer 1 Report

Please see attached.

Comments for author File: Comments.pdf

Author Response

We would first like to thank the first reviewer for the interesting remarks. We apologize for the delayed response, as the reviews arrived during the vacation period. We have uploaded the detailed response in a separate document.

Author Response File: Author Response.pdf

Reviewer 2 Report

In the present paper, an approach called DeepRare2021 (DR21) is presented that extends existing, trained deep learning architectures for visual recognition with visual attention. The approach is an enhancement of the authors' previously presented DeepRare2019 and is tested on four different eye-tracking datasets.
The approach uses existing and well-tested DNN architectures such as VGG16, VGG19, and ResNet as feature extractors and extracts the feature maps of each layer in order to generate deep group conspicuity maps (DGCM). These maps are then summed up in order to infuse top-down information into the architecture.
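To make the pipeline described above concrete, here is a minimal sketch of the idea as I understand it: feature maps are read out of every convolutional layer of an ImageNet-pretrained backbone, a per-layer rarity map is computed, and the maps are summed into one saliency map. The backbone choice (VGG16), the Keras/TensorFlow framework, and the toy self-information rarity measure are all assumptions for illustration and are not taken from the authors' implementation.

```python
# Minimal sketch (assumption, not the authors' released code): extract the
# feature maps of every convolutional layer of an ImageNet-pretrained VGG16
# and fuse per-layer "rarity" maps into a single saliency map. The rarity
# measure below is a toy self-information placeholder, not the exact
# DeepRare2021 formulation.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

base = VGG16(weights="imagenet", include_top=False)            # no extra training
conv_layers = [l for l in base.layers if "conv" in l.name]
extractor = tf.keras.Model(base.input, [l.output for l in conv_layers])

def rarity_map(fmap):
    """Toy per-layer rarity: pixels with rare (low-probability) responses score high."""
    fmap = fmap[0]                                   # (H, W, C)
    flat = fmap.reshape(-1, fmap.shape[-1])
    score = np.zeros(flat.shape[0])
    for c in range(flat.shape[-1]):
        hist, edges = np.histogram(flat[:, c], bins=16)
        p = hist[np.clip(np.digitize(flat[:, c], edges[1:-1]), 0, 15)] / flat.shape[0]
        score += -np.log(p + 1e-8)                   # self-information per pixel
    return score.reshape(fmap.shape[:2])

img = preprocess_input(np.random.rand(1, 224, 224, 3) * 255)    # stand-in image
layer_maps = [rarity_map(f) for f in extractor.predict(img)]
# Upscale every layer map to the input resolution, normalise and sum them.
resized = [tf.image.resize(m[..., None], (224, 224)).numpy()[..., 0] for m in layer_maps]
saliency = sum((m - m.min()) / (m.max() - m.min() + 1e-8) for m in resized)
```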

While the results seem to be promising, I don't really understand how the visual attention that is generated by DeepRare in the form of a saliency map is combined with the DNN on a technical level in order to generate, e.g., a classification output. How can this information be injected into the model?

As far as I understand, the evaluation only measures how well the approach mimics human attention, not how much it improves the DNNs on, e.g., ImageNet recognition tasks.
It is thus hard to estimate whether it is worthwhile to integrate DeepRare into an existing model for improved performance on real-life image recognition tasks.

A detailed list of comments for the authors can be found below:

Abstract

- The abstract focuses on DeepRare2021, while the paper seems to be a mixture of presenting DeepRare2021 and evaluating both DeepRare2019 and DeepRare2021 on the four datasets. Maybe you can mention this.
l. 1: "Human visual system is modeled in engineering field": Do you mean "The human visual system is modeled in the field of engineering..."?
  - Please check the grammar.
l. 4: "Deep learning (DNNs)..."
  - "Deep learning (DNNs)" implies that DNNs is the abbreviation for "deep learning", which is of course not the case.
  - "improved the algorithms efficiency..." might not be the correct formulation. Deep learning is a technique that has led to a huge number of efficient models; it has not improved algorithms, as algorithms are the tools used to realize the theoretically devised models.
l. 12:  "does not need any training and uses the default ImageNet training"
  - this part is very confusing. As far as I understand, the approach does indeed not need additional training as it is applied on an existing model. But this core model has to be trained. Thus DR21 is an extenstion that does not need "additonal" training. Please rephrase the part regarding "default ImageNet" training. I think you want to say that the Deep Learning model just has to be initialized with pre-trained weights that are generated using the ImageNet dataset?
l. 14: "always in the within the...": Please correct this part.
l. 19: On the GitHub page that the link points to, I am only able to find the code for DeepRare2019 and earlier versions. I presume that you will upload the updated method once your work is published?
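For what I assume is meant by "no additional training" (see the comment on l. 12 above): the backbone would only be initialized with ImageNet-pretrained weights and then frozen, with no further fitting step. A minimal sketch; the framework and model name are my assumptions, not taken from the paper.

```python
# Sketch of "no additional training": load ImageNet-pretrained weights and
# freeze the backbone; DeepRare-style processing would only read activations,
# never call an optimizer or model.fit().
from tensorflow.keras.applications.vgg19 import VGG19

backbone = VGG19(weights="imagenet", include_top=False)
backbone.trainable = False   # the extension adds no trainable parameters
```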

1. Visual attention: deep learning trouble

l. 23 "The" Human visual system...
   - Can you provide a reference?
l. 34: "Since the early 2000,..."
  - Is there a reference you can cite to support this statement?

2. DeepRare2021 model: digging into rare deep features

- Figure 2. What is the first row showing? Is it visualizing the "Data Fusion" from Section 2.5? The same question also holds for the other figures that show examples of saliency maps.
- l. 216 "needs to be applied an important amount of time."
  - What you mean with 'important amount of time'?
- l. 240: As far as I understand you didn't conduct eye-tracking experiments by yourself but the datasets used for evaluation are providing this data? If this is correct, please write this more clearly to avoid confusion.
- l. 249: What is "power 3"?

 

Author Response

Thank you for the comments, which are all very interesting! We also apologize for the time we needed to respond, as the vacation period fell in the middle of the review. We have uploaded a file with detailed responses. Best regards!

Author Response File: Author Response.pdf
