Author Contributions
Conceptualization, P.K. and M.M.; methodology, P.K., M.M.; software, P.K., M.M.; validation, P.K.; investigation, P.K., M.M.; writing—original draft preparation, P.K.; writing—review and editing, M.M., B.G., K.P.; supervision, M.M., B.G., K.P.; project administration, B.G., K.P.; funding acquisition, M.M., B.G., K.P., P.K. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Processing for Layer 1. This processing is iterated over all layers of interest from a CNN encoder network. In this example, 13 layers in total are chosen for a VGG16 encoder.
Figure 2.
Rarity image reconstruction adapted from [9]. The rarity function (green curve in the left graph) is computed from a histogram (blue curve) of a feature/activation map (middle image). The output is a reconstruction of the map where high values are assigned to the most “rare” areas (right image).
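As a rough illustration of the histogram-based rarity reconstruction described in the Figure 2 caption, the sketch below computes a histogram of a feature/activation map, scores each bin by how infrequent it is, and writes that score back to every pixel. This is a minimal sketch only: the bin count, the negative-log rarity formulation, and the [0, 1] normalization are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def rarity_map(feature_map, bins=16, eps=1e-8):
    """Reconstruct a rarity image from a feature/activation map.

    Sketch of the idea in Figure 2 (assumed formulation): the rarity of a
    value is the negative log of its empirical frequency, so values that
    occur rarely in the map receive high scores.
    """
    hist, edges = np.histogram(feature_map, bins=bins)
    p = hist / max(feature_map.size, 1)      # empirical probability per bin
    rarity = -np.log(p + eps)                # rare bins -> high rarity
    # Assign each pixel the rarity score of the bin it falls into
    idx = np.clip(np.digitize(feature_map, edges[1:-1]), 0, bins - 1)
    out = rarity[idx]
    # Normalize to [0, 1] for visualization
    return (out - out.min()) / (out.max() - out.min() + eps)
```

Applied to a map that is mostly uniform with a few outlier activations, the outliers come back with scores near 1 while the common background is pushed toward 0, matching the behavior shown in the right image of Figure 2.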
Figure 3.
Detailed maps of different levels (from Low Level 1 to High Level) and different thresholds on feature rarity (from 0.9 to no threshold) within the VGG16 architecture.
Figure 4.
Detailed maps of different levels (from Low Level 1 to High Level) and different thresholds on feature rarity (from 0.9 to no threshold) within the VGG16 architecture.
Figure 5.
Detailed maps of different levels (from Low Level 1 to High Level) and different thresholds on feature rarity (from 0.9 to no threshold) within the VGG16 architecture.
Figure 6.
Details on the fusion techniques. The last step using map #105 is optional and only makes sense for a VGG16 architecture.
Figure 7.
Selected samples from the P3 dataset. From left to right: target differing in color, orientation, and size. From top to bottom: initial image, ground truth, RARE2012, MLNET, SALICON, DR2019, DR2021.
Figure 8.
Selected samples from the O3 dataset. From top to bottom: initial image, ground truth, RARE2012, MLNET, SALICON, DR2019, DR2021.
Figure 9.
Selected samples from the MIT1003 dataset. From top to bottom: initial image, ground truth, RARE2012, SALICON, DR2019, DR2021.
Figure 10.
Selected samples from the OSIE dataset. From top to bottom: initial image, ground truth, GBVS, SAM-ResNet, FAPTTX [27], DR2019, DR2021.
Figure 11.
Number of fixations (horizontal axis) vs. percentage of targets detected (vertical axis), evaluated at 15, 25, 50, and 100 fixations.
Figure 12.
The GSI score for color target/distractor difference. Best classical model (dotted red line) and best deep learning model (dotted green line) along with the DeepRare family models.
Figure 13.
The GSI score for orientation target/distractor difference. Best classical model (green dotted line) and best deep learning model (red dotted line) along with the DeepRare family models.
Figure 14.
The GSI score for target/distractor size ratio. Best classical model (red dotted line) and best deep learning model (green dotted line) along with the DeepRare family models.
Table 1.
The OSIE dataset. Tests with different rarity thresholds, with and without face features, on VGG16.
VGG16 | With Face | Without Face |
---|---|---|
Thresholds | CC | CC |
0 | 0.55 | 0.53 |
0.9 | 0.56 | 0.55 |
(0 + 0.9)/2 | 0.57 | 0.56 |
(0.4 + 0.9)/2 | 0.57 | 0.56 |
Table 2.
The MIT1003 dataset. Tests with different rarity thresholds, with and without face features, on VGG16.
VGG16 | With Face | Without Face |
---|---|---|
Thresholds | CC | CC |
0 | 0.47 | 0.46 |
0.9 | 0.45 | 0.43 |
(0 + 0.9)/2 | 0.48 | 0.47 |
(0.4 + 0.9)/2 | 0.47 | 0.45 |
Table 3.
The OSIE dataset. Tests with thresholds 0 and 0.9, considering the saliency map without filtering, with filtering, and with filtering and squaring.
VGG16 | With Face | Without Face |
---|---|---|
(0 + 0.9)/2 | CC | CC |
Not filtered | 0.54 | 0.53 |
Filtered | 0.57 | 0.56 |
Filtered + squared | 0.59 | 0.58 |
Table 4.
The MIT1003 dataset. Tests with thresholds 0 and 0.9, considering the saliency map without filtering, with filtering, and with filtering and squaring.
VGG16 | With Face | Without Face |
---|---|---|
(0 + 0.9)/2 | CC | CC |
Not filtered | 0.43 | 0.42 |
Filtered | 0.48 | 0.47 |
Filtered + squared | 0.51 | 0.50 |
Table 5.
DR19 versus DR21.
Features | DR19 | DR21 |
---|---|---|
Deep features rarity | yes | yes |
Several architectures | no | yes |
Rarity thresholds | no | yes |
Use of several rarity thresholds | no | yes |
Post processing | yes | yes |
Table 6.
MIT1003 dataset. DeepRare2021 (VGG19: DR21-V19, VGG16 with faces: DR21-V16, VGG16 without faces: DR21-V16-WF, MobileNetV2: DR21-MN2) and DeepRare2019 (VGG16 with faces: DR19-V16, VGG16 without faces: DR19-V16-WF). The DFeat, eDN, GBVS, RARE2012, BMS, and AWS results come from [30]; SALICON and MLNet come from [29].
Models | AUCJ ↑ | AUCB ↑ | CC ↑ | KL ↓ | NSS ↑ | SIM ↑ |
---|---|---|---|---|---|---|
DR21-V19 | 0.86 | 0.85 | 0.56 | 0.88 | 1.93 | 0.50 |
DR21-V16 | 0.84 | 0.83 | 0.50 | 1.19 | 1.81 | 0.43 |
DR21-V16-WF | 0.84 | 0.83 | 0.49 | 1.16 | 1.75 | 0.42 |
DR21-MN2 | 0.84 | 0.83 | 0.50 | 1.14 | 1.71 | 0.42 |
DR19-V16 | 0.86 | 0.85 | 0.48 | 1.25 | 1.58 | 0.36 |
DR19-V16-WF | 0.84 | 0.83 | 0.46 | 1.32 | 1.54 | 0.34 |
SALICON | 0.83 | - | 0.51 | 1.12 | 1.84 | 0.41 |
MLNet | 0.82 | - | 0.46 | 1.36 | 1.64 | 0.35 |
DFeat | 0.86 | 0.83 | 0.44 | 1.41 | - | - |
eDN | 0.86 | 0.84 | 0.41 | 1.54 | - | - |
GBVS | 0.83 | 0.81 | 0.42 | 1.3 | - | - |
RARE2012 | 0.75 | 0.77 | 0.38 | 1.41 | - | - |
BMS | 0.75 | 0.77 | 0.36 | 1.45 | - | - |
AWS | 0.71 | 0.74 | 0.32 | 1.54 | - | - |
Table 7.
OSIE dataset. DeepRare2021 (VGG19: DR21-V19, VGG16 with faces: DR21-V16, VGG16 without faces: DR21-V16-WF, MobileNetV2: DR21-MN2) and DeepRare2019 (VGG16 with faces: DR19-V16, VGG16 without faces: DR19-V16-WF). The SAM-ResNet, FAPTTX, RARE2012, AWS, GBVS, and AIM results come from [27]. We also added DeepRare2021 with VGG16 and top-down from [27], called DR21-V16+TD.
Models | AUCJ ↑ | AUCB ↑ | CC ↑ | KL ↓ | NSS ↑ | SIM ↑ |
---|---|---|---|---|---|---|
SAM-ResNet | 0.90 | - | 0.77 | 1.37 | 3.1 | 0.65 |
DR21-V16+TD | 0.88 | 0.83 | 0.66 | 0.83 | 2.32 | 0.56 |
FAPTTX | 0.87 | - | 0.62 | 0.81 | 2.08 | 0.51 |
DR21-V16 | 0.87 | 0.86 | 0.59 | 0.91 | 2.06 | 0.52 |
DR21-V16-WF | 0.87 | 0.86 | 0.58 | 0.84 | 2.01 | 0.51 |
DR19-V16 | 0.87 | 0.86 | 0.55 | 0.98 | 1.75 | 0.44 |
DR19-V16-WF | 0.86 | 0.86 | 0.53 | 1.01 | 1.66 | 0.43 |
DR21-MN2 | 0.85 | 0.84 | 0.51 | 1.06 | 1.55 | 0.42 |
DR21-V19 | 0.83 | 0.82 | 0.45 | 1.32 | 1.54 | 0.34 |
RARE2012 | 0.83 | - | 0.46 | 1.05 | 1.53 | 0.43 |
AWS | 0.82 | - | 0.45 | 1.11 | 2.02 | 0.42 |
GBVS | 0.81 | - | 0.43 | 1.08 | 1.34 | 0.42 |
AIM | 0.77 | - | 0.32 | 1.52 | 1.07 | 0.34 |
Table 8.
Comparing results between several models (SAM-ResNet, CVS, DeepGaze II, FES, ICF, and BMS) and the DR family (DR19 and DR21 in the VGG16, VGG19, and MobileNetV2 versions). For MSRt, higher is better; for MSRb, lower is better.
Models | Color MSRt ↑ | Color MSRb ↓ | Non-Color MSRt ↑ | Non-Color MSRb ↓ | All Targets MSRt ↑ | All Targets MSRb ↓ |
---|---|---|---|---|---|---|
DR21-V16 | 1.66 | 0.74 | 1.31 | 1.31 | 1.45 | 1.01 |
DR21-V19 | 1.63 | 0.78 | 1.29 | 1.39 | 1.43 | 1.13 |
DR21-MN2 | 1.19 | 1.02 | 1.06 | 1.54 | 1.12 | 1.32 |
DR19 | 1.14 | 0.75 | 1.00 | 1.00 | 1.06 | 0.89 |
SAM-ResNet | 1.47 | 1.46 | 1.04 | 1.84 | 1.40 | 1.52 |
CVS | 1.43 | 2.43 | 0.91 | 4.26 | 1.34 | 2.72 |
DGII | 1.32 | 1.55 | 0.94 | 1.95 | 1.26 | 1.62 |
FES | 1.34 | 2.53 | 0.81 | 5.93 | 1.26 | 3.08 |
ICF | 1.30 | 2.00 | 0.84 | 2.03 | 1.23 | 2.01 |
BMS | 1.29 | 0.97 | 0.87 | 1.59 | 1.22 | 1.07 |
Table 9.
SALICON, MLNet, and DeepRare family (DR19 and DR21 with MobileNetV2, VGG19, and VGG16 architectures) results on the O3 dataset.
Models | MSRt ↑ | MSRb ↓ |
---|---|---|
DR21-V16 | 1.45 | 1.01 |
DR21-V19 | 1.43 | 1.13 |
DR21-MN2 | 1.12 | 1.32 |
DR19 | 1.06 | 0.89 |
MLNet | 0.96 | 0.91 |
SALICON | 0.90 | 1.26 |
Table 10.
Comparing results on the P3 dataset.
Models | Avg. # Fix. ↓ | % Found ↑ |
---|---|---|
DR21-V16 | 13.53 | 89 |
DR21-V19 | 13.86 | 89 |
DR21-MN2 | 33.82 | 72 |
DR19 | 16.34 | 87 |
MLNet | 42.00 | 44 |
SALICON | 49.37 | 65 |
Table 11.
Comparing results on the P3 dataset. Details on the percentage of targets found after 15 (%fd15), 25 (%fd25), 50 (%fd50), and 100 (%fd100) fixations. Percentage found for the color (%fd-C), orientation (%fd-O), and size (%fd-S) features taken separately.
Models | %fd15 | %fd25 | %fd50 | %fd100 | %fd-C | %fd-O | %fd-S |
---|---|---|---|---|---|---|---|
DR21-V16 | 84.82 | 86.71 | 88.60 | 89.76 | 92.20 | 92.93 | 83.92 |
DR21-V19 | 84.27 | 86.32 | 88.10 | 89.14 | 92.65 | 92.36 | 82.14 |
DR21-MN2 | 61.37 | 64.81 | 69.37 | 72.46 | 77.17 | 71.75 | 68.21 |
DR19 | 80.61 | 83.27 | 86.63 | 87.87 | 91.29 | 89.58 | 82.50 |
RARE2012 | 59.87 | 63.52 | 79.75 | 93.48 | 99.54 | 90.26 | 88.53 |
BMS | 58.94 | 66.37 | 83.56 | 95.14 | 100 | 100 | 82.76 |
ICF | 32.63 | 41.38 | 68.47 | 70.18 | 69.41 | 100 | 42.45 |
oSALICON | 30.25 | 39.75 | 55.45 | 78.53 | 76.35 | 81.58 | 70.42 |
Table 12.
Comparing results on the P3 dataset. Global Saliency Index (GSI) score on the color, orientation, and size features, and the average score over these three features.
Models | GSI-Color | GSI-Orien. | GSI-Size | GSI-Avg. |
---|---|---|---|---|
DR21-V16 | 0.77 | 0.50 | 0.49 | 0.59 |
DR21-V19 | 0.75 | 0.49 | 0.51 | 0.58 |
DR21-MN2 | 0.66 | 0.42 | 0.51 | 0.53 |
DR19 | 0.42 | 0.17 | 0.15 | 0.25 |
RARE2012 | 0.74 | 0.01 | 0.18 | 0.31 |
BMS | 0.72 | 0.01 | −0.02 | 0.24 |
ICF | 0.18 | −0.02 | −0.51 | −0.12 |
oSALICON | −0.01 | 0.04 | −0.11 | −0.03 |
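For readers unfamiliar with the metric reported in Figures 12–14 and Table 12, the Global Saliency Index is commonly defined as a normalized contrast between the saliency of the target and that of the distractors. This is an assumed standard formulation, not taken from this paper; verify against the methods section.

```latex
% Assumed standard formulation of the Global Saliency Index (GSI):
% \bar{S}_T = mean saliency over the target region,
% \bar{S}_D = mean saliency over the distractor regions.
\mathrm{GSI} = \frac{\bar{S}_T - \bar{S}_D}{\bar{S}_T + \bar{S}_D}
% GSI ranges from -1 (distractors more salient than the target)
% to +1 (target more salient than the distractors), which is
% consistent with the negative values reported in Table 12.
```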