Generation of Stereo Images Based on a View Synthesis Network
Round 1
Reviewer 1 Report
The paper deals with the quite interesting topic of generating stereo images from a single image. The paper is well conceived and structured. The procedure is quite well described, so I think that the paper could be published in its present form.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The reviewed article concerns a method for generating one image of a stereo pair in stereo vision. The authors assume that they have the image seen by the left eye and generate the image that would be seen by the right eye. For this purpose, they use a method based on simulating simultaneous displacement and rotation.
In my opinion, this is a familiar approach, often used in stereo vision, so it is not as novel as the authors claim. The authors also rely on other existing methods, such as previously developed neural networks.
The work also lacks well-presented research results. Subjective impressions cannot serve as confirmation of the research (as in Section 4.2). I also noticed that the authors' metadata does not always give better results.
However, I would like to draw attention to the potential of the proposed method. The photos in the paper show that the method generates new images quite well, especially where edges are clear.
Therefore, an improved description of the research part would allow me to accept the work for publication.
My suggestions:
1. Manual labeling is probably not a good idea for these types of methods.
2. Figure 5 could contain some reference lines that illustrate rotation and offset.
3. Figure 4 is not fully understandable.
4. Table 1 presents the results for "Warping" and "Our approach". It would be good to explain more precisely how the CW-SSIM calculations are made.
5. In Section 4.2 the results are presented only in the form of photos. Perhaps an interesting addition to the visualization would be to show the image resulting from the difference between the "Warping" and "Our approach" images? Maybe there are more differences?
6. Scaling images always introduces some errors. Was it not possible to change the scale of the network? That would require retraining the network, but the results might be even better.
7. You include 38 categories. With 8233 training samples, that gives just over 200 samples per category. For neural networks, this is probably a fairly small set.
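To make suggestions 5 and 7 concrete, here is a minimal sketch, not the authors' procedure. It computes a per-pixel absolute difference image between two results and the average number of training samples per category. The function names and the tiny grayscale arrays are hypothetical and stand in for the real "Warping" and "Our approach" outputs. Only the figures 8233 and 38 come from the paper.

```python
def abs_difference(img_a, img_b):
    """Per-pixel absolute difference of two equally sized grayscale images.

    Images are given as lists of rows of intensities (hypothetical toy format).
    """
    same_rows = len(img_a) == len(img_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same shape")
    return [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def samples_per_category(n_samples, n_categories):
    """Average number of training samples available per category."""
    return n_samples / n_categories


# Toy 2x3 images standing in for the two generated right-eye views (assumed data).
warping = [[10, 20, 30], [40, 50, 60]]
ours = [[10, 25, 30], [35, 50, 70]]

diff = abs_difference(warping, ours)   # non-zero entries mark where the results differ
avg = samples_per_category(8233, 38)   # figures quoted in suggestion 7
```

A difference image of this kind, displayed as a grayscale or colour map, would show at a glance where the two methods disagree.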
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The paper proposes an interesting method to estimate a stereo image pair given a single image and considers both translation and rotation of the objects in the scene.
I have the following questions/suggestions:
1. Please provide some more context on the equation in line 223.
2. The image quality is too low to observe artifacts or subtle improvements. Please include hi-res images in future versions of the paper.
3. Line 310 looks incomplete; I did not understand: "we could the searching region to"?
4. [Fig. 6] Please find a way to show details, e.g. zooming. Use the full width of the paper to place them as necessary.
5. It is not clear how you selected the 10 pairs from the dataset. It looks like you picked 5 perfect and 5 imperfect pairs not from the same 5 sets, but from 10 sets. Please explain why.
6. [Sec 4, Results] 2-3 minutes is a long time. Please provide time profiling. I would like to see some quantification of the time taken by the most computationally expensive operations. The description is too brief.
7. What stopped you from trying other datasets? Give proper reasons in the paper.
8. [Table 1] Please provide some more insight into the score differences between perfect/imperfect pairs.
9. [Fig. 7, Fig. 8] Really hard to analyze. Please help me by showing where to look! A heatmap-like visualization of the errors, or any other way of emphasizing the region of interest, would help.
10. [Fig. 10] The blue boxes look green (due to compression?). Maybe you can draw a thicker border to avoid any confusion.
11. [Fig. 11] There is no need to superimpose the cropped regions; instead, put them to the side (there is enough room in the blank spaces at the sides) and refer to the original image using arrows. This will help to visualize them more clearly.
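The time profiling requested in point 6 could be collected along the following lines. This is only a sketch under stated assumptions: the stage names "stage_a" and "stage_b" and the dummy workloads are hypothetical placeholders, since the paper's actual pipeline stages are not described here.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage):
    """Accumulate the wall-clock time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def profile_report(stage_timings):
    """Return (stage, seconds, percent of total) tuples, slowest stage first."""
    total = sum(stage_timings.values())
    rows = [
        (name, secs, 100.0 * secs / total if total else 0.0)
        for name, secs in stage_timings.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Dummy workloads standing in for the real processing stages (assumed, not from the paper).
with timed("stage_a"):
    sum(i * i for i in range(50000))
with timed("stage_b"):
    sorted(range(20000), reverse=True)

report = profile_report(timings)
for name, secs, pct in report:
    print(f"{name}: {secs:.4f} s ({pct:.1f}%)")
```

A table of this form, with one row per stage, would show which operations dominate the reported 2-3 minutes.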
Author Response
Please see the attachment.
Author Response File: Author Response.pdf