Peer-Review Record

Detail-Aware Deep Homography Estimation for Infrared and Visible Image

Electronics 2022, 11(24), 4185; https://doi.org/10.3390/electronics11244185
by Yinhui Luo, Xingyi Wang *, Yuezhou Wu and Chang Shu
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 24 October 2022 / Revised: 9 December 2022 / Accepted: 10 December 2022 / Published: 14 December 2022
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

This article is now recommended for publication from my side, as the required changes have been made.

Author Response

Please see the attachment.

Reviewer 2 Report (New Reviewer)

The paper presents an interesting study and is free of serious mistakes. Only some minor upgrades are needed.

 

The first paragraph of Section 3 should not be highlighted in yellow, nor should the yellow parts in Section 4.

 

The network architecture in Figure 2 is interesting, but the meaning of the colors should be defined, as it is currently unclear.

 

The evaluation metrics are mentioned but not defined. The metric formulations specific to this research should also be described: how were the results calculated for the experiment?

 

The authors provide a detailed view of the results of the methods but do not state on which dataset they were evaluated (where the data came from, the number of images in subsets/categories, how many different cases were evaluated, and whether the data are available). Detailed information on the images used at each resolution and variant should be provided (e.g., a table).

The material under the DOI: 10.1007/978-3-319-99981-4_16 may be considered for referencing.

Best wishes

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The authors propose homography estimation for infrared and visible images using deep learning methods and compare the performance with previous work. The detailed analysis and measured performance comparison are very good. The English grammar looks fine, and it is hard to find any missing parts. Thus, the manuscript needs only minor revision.

 

1. Please use abbreviated journal names in the reference section.

2. Labels for Figures 5 are too small to be seen.

3. Labels in Figure 10 need to be clearer.

4. Authors had better summarize important measured results in the conclusion section.

5. Data availability and author contribution sections are missing.

6. Please delete Lines of 476 and 477.

Author Response

Response Letter

Manuscript ID: Electronics-1930712

 

We highly appreciate the reviewers’ comments and suggestions. We have carefully addressed all the review comments and improved the quality of this manuscript accordingly. Please find below our response to the review comments.

Response to comments of Reviewer 1

The article is interesting and, generally, it deserves to be published with some revisions.

 

Response: Thank you for your comments.

 

 

 

  • Comment 1

Please use abbreviated journal names in the reference section.

 

 

Response: Thank you for your suggestion. We have updated the journal names in the reference section to the abbreviated forms as follows:

  1. Pan, N. A sensor data fusion algorithm based on suboptimal network powered deep learning. Alex. Eng. J. 2022, 61, 7129-7139.
  2. Zhong, Z.; Gao, W.; Khattak, A.M.; Wang, M. A novel multi-source image fusion method for pig-body multi-feature detection in NSCT domain. Multimed. Tools Appl. 2020, 79, 26225-26244.
  3. Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502-518.
  4. Zhang, H.; Xu, H.; Tian, X.; Jiang, J.; Ma, J. Image fusion meets deep learning: A survey and perspective. Inf. Fusion 2021, 76, 323-336.
  5. Ding, W.; Bi, D.; He, L.; Fan, Z. Infrared and visible image fusion method based on sparse features. Infrared Phys. Technol. 2018, 92, 372-380.
  6. Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153-178.
  7. Cai, H.; Zhuo, L.; Chen, X.; Zhang, W. Infrared and visible image fusion based on BEMSD and improved fuzzy set. Infrared Phys. Technol. 2019, 98, 201-211.
  8. Li, J.; Huo, H.; Li, C.; Wang, R.; Feng, Q. AttentionFGAN: Infrared and visible image fusion using attention-based generative adversarial networks. IEEE Trans. Multimedia 2020, 23, 1383-1396.
  9. Ma, J.; Tang, L.; Xu, M.; Zhang, H.; Xiao, G. STDFusionNet: An infrared and visible image fusion network based on salient target detection. IEEE Trans. Instrum. Meas. 2021, 70, 1-13.
  10. Xu, H.; Wang, X.; Ma, J. DRF: Disentangled representation for visible and infrared image fusion. IEEE Trans. Instrum. Meas. 2021, 70, 1-13.
  11. Zhang, H.; Yuan, J.; Tian, X.; Ma, J. GAN-FM: Infrared and visible image fusion using GAN with full-scale skip connection and dual Markovian discriminators. IEEE Trans. Comput. Imaging 2021, 7, 1134-1147.
  12. Chen, J.; Li, X.; Luo, L.; Mei, X.; Ma, J. Infrared and visible image fusion based on target-enhanced multiscale transform decomposition. Inf. Sci. 2020, 508, 64-78.
  13. Long, Y.; Jia, H.; Zhong, Y.; Jiang, Y.; Jia, Y. RXDNFuse: A aggregated residual dense network for infrared and visible image fusion. Inf. Fusion 2021, 69, 128-141.
  14. Lan, X.; Ye, M.; Shao, R.; Zhong, B.; Yuen, P.C.; Zhou, H. Learning modality-consistency feature templates: A robust RGB-infrared tracking system. IEEE Trans. Ind. Electron. 2019, 66, 9887-9897.
  15. Wang, C.; Wang, X.; Bai, X.; Liu, Y.; Zhou, J. Self-supervised deep homography estimation with invertibility constraints. Pattern Recognit. Lett. 2019, 128, 355-360.
  16. Nie, L.; Lin, C.; Liao, K.; Liu, M.; Zhao, Y. A view-free image stitching network based on global homography. J. Vis. Commun. Image Represent. 2020, 73, 102950.
  17. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality preserving matching. Int. J. Comput. Vis. 2019, 127, 512-531.
  18. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381-395.
  19. Nguyen, T.; Chen, S.W.; Shivakumar, S.S.; Taylor, C.J.; Kumar, V. Unsupervised deep homography: A fast and robust homography estimation model. IEEE Robot. Autom. Lett. 2018, 3, 2346-2353.
  20. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600-612.
  21. Davis, J.W.; Sharma, V. Background-subtraction using contour-based fusion of thermal and visible imagery. Comput. Vis. Image Underst. 2007, 106, 162-182.

 

 

  • Comment 2

Labels for Figures 5 are too small to be seen.

 

 

Response: Thank you for your comments. We have modified the label size in Figure 5. See the attachment.

 

 

 

  • Comment 3

Labels in Figure 10 need to be clearer.

 

 

Response: Thank you for your comments. We have modified the label size in Figure 10. See the attachment.

 

 

 

  • Comment 4

Authors had better summarize important measured results in the conclusion section.

 

 

Response: Thank you very much for this point. As suggested, we have added the important measured results to the conclusion section. In particular, our method improves PME [37] and AFRR by 3.94% and 45.46%, respectively, compared with the second-best method, CADHN [37], on the real dataset.

 

 

  • Comment 5

Data availability and author contribution sections are missing.

 

 

Response: Thank you for your comments. We have added data availability and author contribution section as follows:

Author Contributions: Conceptualization, Y.L. and X.W.; methodology, Y.W.; software, C.S.; validation, X.W., Y.W. and C.S.; formal analysis, Y.L.; investigation, Y.W.; writing—original draft preparation, Y.L. and X.W.; writing—review and editing, Y.L., X.W., Y.W. and C.S.; project administration, Y.L.; funding acquisition, Y.L. and Y.W. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: Not applicable.

 

 

  • Comment 6

Please delete Lines of 476 and 477.

 

 

Response: Thank you for your comments. We have deleted lines 476 and 477.

 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

1) In Section 4.1.1, the hyperlink should not be mentioned.

2) The captions of all the figures are too long; please shorten them.

3) In Section 4.3, try to show the qualitative and quantitative comparisons in the form of tables.

4) Why has 'adaptive feature point error' been used for the extraction of feature points in the proposed work? Explain the practical concept.

5) The related works should be more relevant to the proposed work and somewhat more extensive; the literature survey provided by the authors is too brief.

6) In Table 2, do not write 'Ours'; write 'Proposed' instead.

7) Is it possible to calculate the MSE (mean square error) in this research? If yes, calculate it and mention it in this work.

8) Kindly use the following relevant reference in your work :

Goyal, B., Dogra, A., Khoond, R., Gupta, A., & Anand, R. (2021, September). Infrared and Visible Image Fusion for Concealed Weapon Detection using Transform and Spatial Domain Filters. In 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) (pp. 1-4). IEEE.

9) Suggest any method by which the proposed technique can be extended in the future.

Author Response

Response Letter

Manuscript ID: Electronics-1930712

 

We highly appreciate the reviewers’ comments and suggestions. We have carefully addressed all the review comments and improved the quality of this manuscript accordingly. Please find below our response to the review comments.

Response to comments of Reviewer 2

The article is interesting and, generally, it deserves to be published with some revisions.

 

Response: Thank you for your comments.

 

 

 

  • Comment 1

In section 4.1.1, hyperlink should not be mentioned.

 

 

Response: Thank you for your comments. We have removed the hyperlink in Section 4.1.1 and cited it as a reference as follows (a sketch of the typical synthetic-pair generation protocol appears after the reference entries below):

We will build our synthetic benchmark datasets from publicly available registered infrared and visible datasets such as the OSU Color-Thermal Database [50], INO [51], and TNO [52].

  51. INO's Video Analytics Dataset. https://www.ino.ca/en/technologies/video-analytics-dataset/ (accessed on 6 September 2022).
  52. Toet, A. TNO Image Fusion Dataset. https://doi.org/10.6084/m9.figshare.1008029.v1 (accessed on 6 September 2022).
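For context, synthetic homography benchmarks built from registered image pairs typically follow the corner-perturbation protocol of DeTone et al. (2016). The Python/OpenCV sketch below illustrates one plausible version of that procedure; the patch size, perturbation range, and function name are illustrative assumptions, not the paper's exact settings.

import cv2
import numpy as np

def make_synthetic_pair(ir_img, vis_img, patch=128, rho=32, seed=None):
    # Illustrative corner-perturbation protocol (after DeTone et al., 2016).
    rng = np.random.default_rng(seed)
    h, w = ir_img.shape[:2]
    # Pick a patch location far enough from the border to allow perturbation.
    x = int(rng.integers(rho, w - patch - rho))
    y = int(rng.integers(rho, h - patch - rho))
    corners = np.float32([[x, y], [x + patch, y],
                          [x + patch, y + patch], [x, y + patch]])
    # Randomly perturb the four corners and solve the ground-truth homography.
    perturbed = corners + rng.uniform(-rho, rho, size=(4, 2)).astype(np.float32)
    H = cv2.getPerspectiveTransform(corners, perturbed)
    # Warp the registered visible image by the inverse homography, then crop
    # the same patch from both modalities; H is the label relating them.
    warped_vis = cv2.warpPerspective(vis_img, np.linalg.inv(H), (w, h))
    return (ir_img[y:y + patch, x:x + patch],
            warped_vis[y:y + patch, x:x + patch], H)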

 

 

  • Comment 2

Captions of all the figures are too large...Please shorten them.

 

 

Response: Thank you for your comments. We have shortened the captions of all the figures.

 

 

  • Comment 3

In section 4.3, try to show Qualitative Comparison and Quantitative Comparison in the form of tables.

 

 

Response: Thank you very much for this point. We have compared our method with others in Section 4.3; the qualitative comparison is shown in Figure 6, and the quantitative comparison is shown in Table 3.

 

 

  • Comment 4

Why has 'adaptive feature point error' been used for the extraction of feature points in the proposed work? Explain the practical concept.

 

 

Response: Thank you very much for this comment. After careful inspection, we believe that replacing "Adaptive Feature Point Error" with "Adaptive Feature Registration Rate" expresses the proposed evaluation index more accurately. We have added the concept of the "Adaptive Feature Registration Rate" in Section 3.3 as follows:

According to the above observations, we directly use SIFT to adaptively extract feature points from another perspective and use the ratio of accurately registered feature points as the evaluation value, yielding the Adaptive Feature Registration Rate (AFRR). At the same time, evaluating with feature points captures registration performance that other evaluation indicators struggle to reflect, and it avoids the workload of manual annotation.
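For illustration, one plausible reading of this metric is sketched below in Python with OpenCV: SIFT features are matched between the warped infrared image and the visible image, and AFRR is taken as the fraction of matches that land within a small pixel distance after warping. The pixel threshold, Lowe-ratio value, and function name are our assumptions; the paper's exact formulation may differ.

import cv2
import numpy as np

def afrr(warped_ir, visible, pix_thresh=3.0, lowe_ratio=0.75):
    # Illustrative AFRR: ratio of accurately registered SIFT matches.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(warped_ir, None)
    kp2, des2 = sift.detectAndCompute(visible, None)
    if des1 is None or des2 is None:
        return 0.0
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    # Lowe's ratio test keeps only distinctive matches.
    good = [m for m, n in (p for p in matches if len(p) == 2)
            if m.distance < lowe_ratio * n.distance]
    if not good:
        return 0.0
    # After a correct homography, matched points should nearly coincide.
    accurate = sum(
        np.hypot(kp1[m.queryIdx].pt[0] - kp2[m.trainIdx].pt[0],
                 kp1[m.queryIdx].pt[1] - kp2[m.trainIdx].pt[1]) < pix_thresh
        for m in good)
    return accurate / len(good)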

 

 

  • Comment 5

Related works should be more relevant to the proposed work and also it should be little vast. But the literature survey mentioned by the author is too small.

 

 

Response: Thank you for your comments. After a literature search, we found that the amount of relevant literature on homography estimation is relatively small, so we have added four references on deep homography in Section 2. Meanwhile, we briefly compare the most relevant literature with our method at the end of the related works as follows:

Le et al. [38] learn from image pairs with ground-truth homography, which is hard to obtain in practical applications. Nie et al. [45] propose to predict multi-grid homography from global to local to address parallax in images. Shao et al. [46] use a transformer structure to address the cross-resolution problem in homography estimation. Inspired by Zhang et al., Ye et al. [39] propose a homography flow representation to reduce feature rank and suppress motion noise. However, due to the large grayscale and contrast differences between infrared and visible images, the homography flow is unstable, making it difficult for the network to converge. Similarly, Hong et al. [47] also use homography flow to obtain homography matrices, which would be difficult to apply to infrared and visible scenarios.

 

Discussions. The work most closely related to ours is [37], in which the authors use a feature extractor consisting of three convolutional layers to learn deep features in images and utilize masks to select only reliable regions for homography estimation. Additionally, a triplet loss is formulated to enable unsupervised learning. Compared with [37], our work considers the importance of details in the image and retains more detail in three ways. First, RDN [42] is introduced to obtain dense features in the image. Second, CBAM [48] is introduced to refine the features in both the channel and spatial dimensions and to shift the location of attention. Finally, the proposed DFL directly uses the refined features in the loss computation.
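As an aid to readers, a minimal PyTorch sketch of the CBAM module cited above [48] (Woo et al., ECCV 2018) follows: channel attention and then spatial attention are applied to the feature map. The reduction ratio and kernel size are the defaults from that paper, not values confirmed for this manuscript.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # shared MLP on avg-pooled features
        mx = self.mlp(x.amax(dim=(2, 3)))    # shared MLP on max-pooled features
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)     # channel-wise max map
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
    def forward(self, x):
        return self.sa(self.ca(x))           # channel attention, then spatial

For example, CBAM(64) could be dropped in after any convolutional block producing 64-channel feature maps.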

 

  38. Le, H.; Liu, F.; Zhang, S.; Agarwala, A. Deep homography estimation for dynamic scenes. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 14-19 June 2020; pp. 7652-7661.
  45. Nie, L.; Lin, C.; Liao, K.; et al. Depth-aware multi-grid deep homography estimation with contextual correlation. arXiv 2021, arXiv:2107.02524.
  46. Shao, R.; Wu, G.; Zhou, Y.; et al. LocalTrans: A multiscale local transformer network for cross-resolution homography estimation. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 14890-14899.
  47. Hong, M.; Lu, Y.; Ye, N.; Lin, C.; Zhao, Q.; Liu, S. Unsupervised homography estimation with coplanarity-aware GAN. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 19-24 June 2022; pp. 17663-17672.

 

 

  • Comment 6

In TABLE 2, never mention Ours. Rather mention 'Proposed' (instead of 'Ours').

 

 

Response: Thank you for your comments. We have replaced 'Ours' with 'Proposed method' in Figures 1 and 6 and in Tables 2 and 3.

 

 

  • Comment 7

Is it possible to calculate MSE (Mean Square Error) in this research ? If Yes, calculate it and mention in this work.

 

 

Response: Thank you for your comments. We have explained in Section 4.2 why MSE is not used, as follows:

We use evaluation metrics such as SSIM, MI, PSNR, ACE [38], and AFRR in the quantitative comparison. Since MSE is calculated similarly to ACE [38] and PME [37], we do not use it as an additional evaluation metric.
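For the reader's reference, denote by ĉ_i the position of the i-th patch corner under the ground-truth homography and by c̃_i its position under the estimated one. In the formulation commonly used in deep homography work (our reading; the manuscript's exact definitions may differ),

    ACE = (1/4) · Σ_{i=1..4} ||c̃_i − ĉ_i||₂
    MSE = (1/4) · Σ_{i=1..4} ||c̃_i − ĉ_i||₂²

so MSE is essentially the squared counterpart of the average corner error, which is why reporting it alongside ACE [38] and PME [37] would add little information.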

 

 

  • Comment 8

Kindly use the following relevant reference in your work :

 

Goyal, B., Dogra, A., Khoond, R., Gupta, A., & Anand, R. (2021, September). Infrared and Visible Image Fusion for Concealed Weapon Detection using Transform and Spatial Domain Filters. In 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) (pp. 1-4). IEEE.

 

 

Response: Thank you for your comments. We have added this reference in the introduction as follows:

The registration task of infrared and visible images is widely used as an essential part of computer vision applications, such as image fusion [13-15] and target tracking [16].

 

  Goyal, B.; Dogra, A.; Khoond, R.; Gupta, A.; Anand, R. Infrared and Visible Image Fusion for Concealed Weapon Detection using Transform and Spatial Domain Filters. 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 3-4 September 2021; pp. 1-4.

 

 

  • Comment 9

Suggest any method by which the proposed technique can be extended in the future.

 

 

Response: Thank you for your comments. We have expanded on future research work as follows:

In the future, we will further explore AFRR to generalize it to multi-source images. At the same time, building on this research, we will further optimize the shallow feature extraction method for multi-source images to improve homography estimation performance.

Author Response File: Author Response.pdf
