Peer-Review Record

A Semi-Supervised Method for PatchMatch Multi-View Stereo with Sparse Points

Photonics 2022, 9(12), 983; https://doi.org/10.3390/photonics9120983
by Weida Zhan *, Keliang Cao, Yichun Jiang, Yu Chen, Jiale Wang and Yang Hong
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 November 2022 / Revised: 7 December 2022 / Accepted: 7 December 2022 / Published: 14 December 2022

Round 1

Reviewer 1 Report

This is an interesting idea: using a learning-based method for multi-view stereo matching. As the main contributions, the authors declare (1) a new sparse semi-supervised stereo matching framework called SGT-PatchMatchNet, (2) a Photometric Similarity Loss to improve the performance of 3D reconstruction, and (3) a Robustness Consistency Loss to improve the integrity and robustness of occluded and edge areas.

The innovation of this method is clearly highlighted and explained. However, the literature review needs to be improved. I hope the following comments will be useful for improving the article.

1.      The title does not appear to fit all of the contributions perfectly. I would suggest rephrasing it precisely so that it reflects the contributions and experiments.

2.      The literature review cites only a few previous works, and hardly any recent ones (I see only two articles from 2022).

3.      The captions of Figures 1, 2, and 5 need to be completed. Details should be given for each figure so that the reader can better understand its message.

4.      In Equations 1 and 2, use the correct form of the operator symbols (*, ×, ·); please check this link for more information: https://www.techtarget.com/searchdatacenter/definition/Mathematical-Symbols. All of the parameters in each equation must be explained; for example, L, L1, and L2 in Equation 2.

5.      Some equations are given without a reference, which implies that they are original ideas of this research. Please check throughout the paper whether references are needed for the equations.

6.      I would also suggest adding more metric tests to better evaluate the generated/predicted 3D reconstruction, such as a cloud-to-cloud comparison or a cross-section, since you have reference data for each dataset.


Author Response

Please see the attachment, thanks.

Author Response File: Author Response.docx

Reviewer 2 Report

In this paper, a deep learning-based MVS method is proposed. The method builds on PatchMatchNet (CVPR 2021). A new semi-supervised MVS method that uses sparse depth information is proposed and named SGT-PatchMatchNet. In the quantitative evaluation, good results are obtained, and the method has the advantage of fast testing speed. However, the paper as a whole lacks explanation. Furthermore, the proposed method cannot be evaluated fairly because the evaluation method is inappropriate. Specific problems are listed below.

 

(1) Sparseness

 

The biggest feature of the proposed method is that it uses only sparse 3D points without using dense ground truth depth information. However, it does not clearly explain how sparse it is.

 

In the introduction, it is described that there are originally 1.92 x 10^6 3D points, but they are reduced to between 256 and 300. This is a dramatic decrease. However, in Section 3.3.1 the number is given as 1/20 x H x W, and in Section 4.1 as 1/20 x H x W and 3/20 x H x W. The relationships between these figures are unclear. If it is 1/20 or 3/20 of the pixels, the point set is still dense. The experimental section does not give specific numbers.
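The arithmetic behind this comment can be checked directly. The following is a minimal back-of-the-envelope sketch, assuming the 1600 x 1200 DTU resolution and the 640 x 512 image size quoted elsewhere in these reports:

```python
# Back-of-the-envelope check of the point counts discussed above.

# Full DTU image resolution (1600 x 1200) gives the ~1.92 x 10^6 pixels
# mentioned in the paper's introduction.
full_points = 1600 * 1200
print(full_points)  # 1920000

# The sparse ground truth quoted in the introduction: 256-300 points.
sparse_lo, sparse_hi = 256, 300

# Sections 3.3.1 / 4.1 instead state 1/20 x H x W and 3/20 x H x W.
# At the 640 x 512 size used for training and testing:
H, W = 512, 640
one_twentieth = (H * W) // 20        # 16384 points
three_twentieths = (3 * H * W) // 20  # 49152 points

# 1/20 of the pixels is roughly two orders of magnitude more than
# 256-300 points, which is exactly the inconsistency flagged here.
print(one_twentieth, three_twentieths)     # 16384 49152
print(one_twentieth // sparse_hi)          # ~54x the quoted sparse count
```
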

 

It is also unclear what criteria are used to select the sparse 3D points, and how they are spatially distributed. It is necessary to state the specific criteria and to illustrate the distribution, as is done in Figure 1(b) of SGT-MVSNet (ICCV 2021).

 

(2) Unclear experimental results

 

Figures 4, 5, and 6 compare the reconstruction results of each method. However, the poses (camera positions) appear to differ, so a direct comparison is not possible. Furthermore, the images are very grainy and do not look like they have 640 x 512 resolution. The graph in Figure 7 is also a compressed low-resolution raster image, so the labels are difficult to read.

 

Figure 8 in Section 4.3 also compares the experimental results on high-resolution data, but the images do not look like 1920 x 1056 resolution. To see the characteristics of the proposed method, it is necessary to check the blur around the edges, but the images are already blurred and cannot be evaluated.

 

(3) The number of stages

 

Figure 1 shows the structure of SGT-PatchMatchNet. Compared to the original PatchMatchNet, the number of stages is one less. This change is explained only as "we only perform two iterations and one optimization to complete the training of the whole network", with no clear explanation of why. If several changes are made at once, it is difficult to know which part contributes to the improvement in computation speed. Table 1 shows that the testing speed of the proposed method is the fastest; the reduced number of stages may also be a reason for this.

 

(4) SparsePatchMatch

 

Figure 2 shows the flow of SparsePatchMatch. The initialization step that was present in the original PatchMatch is omitted, and the paper does not seem to explain how initialization is performed. Also, the reference features are not listed as an input; it is unclear whether SparsePatchMatch really does not use reference features as input.

 

(5) Term

 

It is unclear how the terms SGT-PatchMatchNet and SGT-PatchMatch are distinguished. Is "problems of SGT-PatchMatchNet" on line 94 correct?

 

Author Response

Please see the attachment, thanks.

Author Response File: Author Response.docx

Reviewer 3 Report

This manuscript proposes a sparse semi-supervised stereo matching method, called SGT-PatchMatchNet. Overall, the manuscript is well organized and well written, although its novelty is somewhat incremental.

 

Some suggestions:

1.      The photometric similarity loss is not a newly proposed constraint, so it is unsuitable to declare that the authors "propose" it.

2.      In Eq. (5), why are the weights set to \lambda_1 = 0.456 and \lambda_2 = 0.512? Were they chosen based on experimental performance, or on something else?

3.      What does SGT in SGT-PatchMatchNet stand for? The abbreviation should be explained before its first use.

4.      The resolution of each image in the DTU dataset is 1600 x 1200; why do the authors use a size of 640 x 540 for training and testing?

5.      It would be good to add some photometric stereo papers to connect with the subject of the special issue, such as:

[1] Kaya B, Kumar S, Oliveira C, et al. Uncalibrated neural inverse rendering for photometric stereo of general surfaces[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 3804-3814.

[2] Ju Y, Shi B, Jian M, et al. NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention[J]. International Journal of Computer Vision, 2022, 130(12): 3014-3034.

[3] Honzátko D, Türetken E, Fua P, et al. Leveraging Spatial and Photometric Context for Calibrated Non-Lambertian Photometric Stereo[C]//2021 International Conference on 3D Vision (3DV). IEEE, 2021: 394-402.

[4] Jian M. Learning the Traditional Art of Chinese Calligraphy via Three-Dimensional Reconstruction and Assessment[J]. IEEE Transactions on Multimedia, 2020, 22(4): 970-979.

6.      In the experiments, please add more comparisons with traditional non-learning methods and with newly proposed learning-based methods, such as [3] and [5] in the references, to show the performance of the proposed method. Also, the authors should add the citation indices in Table 1 and Section 4.2.
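For the comment on the Eq. (5) weights, the kind of clarification being requested could look like the following. This is a hypothetical sketch only: the actual definitions of the two loss terms are in the paper, and the weight values are the ones quoted in the comment above.

```python
# Hypothetical sketch of a two-term weighted loss in the style of Eq. (5).
# The weights below are the values quoted in the review; the actual loss
# terms L1 and L2 are defined in the manuscript, not here.
LAMBDA_1 = 0.456  # weight of the first loss term (value per the review)
LAMBDA_2 = 0.512  # weight of the second loss term (value per the review)

def total_loss(l1: float, l2: float) -> float:
    """Combine two loss terms as L = lambda_1 * L1 + lambda_2 * L2."""
    return LAMBDA_1 * l1 + LAMBDA_2 * l2

# With both terms equal to 1, the combined loss is simply the weight sum.
print(total_loss(1.0, 1.0))  # 0.968
```

The question in the review is whether such constants were tuned by an ablation experiment or chosen another way; whatever the answer, it should appear in the manuscript itself.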

Author Response

Please see the attachment, thanks.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Accept.

Author Response

Thank you for accepting.

Reviewer 2 Report

If the specifications of the software used do not allow the images in Fig. 5 to be rendered from the same viewing angle, I think this should be clearly stated in the paper.

Author Response

Please see the attachment, thanks.

Author Response File: Author Response.docx

Reviewer 3 Report

I have seen the revised manuscript. The explanations should also be written in the manuscript, not only in the response letter; for example, the experimental setting of the weights in Eq. (5).

The authors should show the basis (perhaps an ablation experiment) and the details, at least by mentioning them in the manuscript. Also, some of the reference papers I mentioned are still missing, e.g., "Incorporating Lambertian priors into surface normals measurement" and "Learning the Traditional Art of Chinese Calligraphy via Three-Dimensional Reconstruction and Assessment". Furthermore, according to Table 1, the traditional method COLMAP achieves the highest Acc., and the overall metric is worse than [11]; I think the authors should give a discussion and analysis of this in the manuscript.

Author Response

Please see the attachment, thanks.

Author Response File: Author Response.docx
