Peer-Review Record

Real-Time Computer Vision for Tree Stem Detection and Tracking

Forests 2023, 14(2), 267; https://doi.org/10.3390/f14020267

by Lucas A. Wells *,† and Woodam Chung

Reviewer 1: Jacek Komorowski

Reviewer 2: João Marcelo Teixeira

Submission received: 28 December 2022 / Revised: 20 January 2023 / Accepted: 27 January 2023 / Published: 31 January 2023

(This article belongs to the Special Issue Forest Harvesting, Operations and Management)

Round 1

Reviewer 1 Report

The paper presents a computer vision solution for tree stem detection and tracking in the forest environment. The paper is well-structured and the proposed methods are clearly described.

My biggest concern is that the performance of the proposed methods is not compared to any baseline.

The authors proposed a modification to the well-known YOLO object detection method. But the performance of the modified method is not benchmarked against the original method with minimal modifications (reducing the number of detected classes to one). So it is not possible to judge whether the proposed modifications were beneficial and what their impact on the final performance was.

Also, with regard to the proposed tracking method, the authors did not perform any quantitative evaluation and did not compare the proposed method to a baseline multi-object tracking method based on the tracking-by-detection paradigm.

So the paper provides no evidence of the advantages of the proposed methods over an off-the-shelf object detector and multi-object tracker.
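
For context, a tracking-by-detection baseline of the kind referred to above can be as simple as matching each frame's detections to existing tracks by bounding-box overlap. The following is a minimal sketch of greedy IoU association. It is not the authors' tracker and not a specific published method; the function names, the `(x_min, y_min, x_max, y_max)` box format, and the 0.3 threshold are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    iou_threshold: float = 0.3,  # assumed value, not taken from the paper
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match track ids to detection indices by descending IoU.

    Returns (matches, unmatched), where matches maps track id to detection
    index and unmatched lists detection indices left without a track.
    """
    pairs = sorted(
        (
            (iou(box, det), track_id, det_idx)
            for track_id, box in tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, track_id, det_idx in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if track_id in matches or det_idx in used:
            continue
        matches[track_id] = det_idx
        used.add(det_idx)
    unmatched = [i for i in range(len(detections)) if i not in used]
    return matches, unmatched
```

A complete baseline would also start new tracks from the unmatched detections and retire tracks that stay unmatched for several frames.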

Minor remarks:

Authors write "In this work, we solve for the parameters of bounding boxes representing tree stems using an algorithm adapted from [19,21,24]."

It would be better if the authors explicitly stated that they use a modified version of the YOLO detector, so that the reader knows the name of the adapted method without having to look it up in the references section.

The authors write "Using the trained network, we perform bounding box prediction on a new image by resizing it to the dimensions of the network’s first layer." Technically, however, the proposed detector has a fully convolutional architecture and can accept images of varying sizes. The first layer is a convolutional layer and does not specify any fixed image dimension. Later, the authors write "As mentioned earlier, the CNN is fully convolutional. Thus, we can vary the input image resolutions without retraining the network."
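
To illustrate the point about fully convolutional networks, the sketch below applies the standard convolution output-size formula layer by layer and shows that the output grid scales with the input size rather than being fixed. The layer stack used here (five 3x3 convolutions with stride 2 and padding 1) is a hypothetical stand-in, not the architecture from the paper.

```python
from typing import List, Tuple


def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def grid_size(input_size: int, layers: List[Tuple[int, int, int]]) -> int:
    """Propagate an input size through (kernel, stride, padding) layers."""
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
    return size


# Hypothetical stack: five stride-2 layers, total downsampling factor 32.
LAYERS = [(3, 2, 1)] * 5

for side in (416, 608):
    print(side, "->", grid_size(side, LAYERS))
```

No layer in such a stack fixes the input resolution; only the size of the output prediction grid changes.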

I recommend adding comparisons of the proposed methods to baseline object detection and multi-object tracking methods.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a computer vision-based approach to perform object detection and tracking tasks in forested environments. The automatic tree measurement and mapping system may help reduce silvicultural treatment costs by eliminating the need for individual and manual tree marking.

The paper is well written and contains the references needed to understand it fully, although I believe the following references could add to the paper:

- A Two-Stage Approach for Individual Tree Segmentation From TLS Point Clouds

- Topology-based individual tree segmentation for automated processing of terrestrial laser scanning point clouds

- 3D Graph-Based Individual-Tree Isolation (Treeiso) from Terrestrial Laser Scanning Point Clouds

One of the keywords of the paper is stereo vision, but the term "stereo" is mostly used only after page 13. The authors should provide more details regarding the stereo sensor used, because only resolution and baseline are mentioned.

The authors should discuss the possibility of using depth cameras to detect and track trees. This should ease the process of tree segmentation, owing to the pattern the vertical stems present.

The authors do not mention applying rotation to augment the training data. Is rotation supported in the detection, or must the camera be mostly aligned with the horizontal plane of the ground?
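
As an illustration of what rotation augmentation would involve for bounding-box labels, the sketch below rotates the four corners of an axis-aligned box about a point and takes the enclosing axis-aligned box. This is an assumption about how such augmentation might be done, not something described in the paper; `rotate_box` and its parameters are hypothetical names.

```python
import math
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def rotate_box(box: Box, angle_deg: float, center: Tuple[float, float]) -> Box:
    """Rotate a box about `center` and return the enclosing axis-aligned box.

    The rotation follows the usual mathematical convention; in image
    coordinates, where y points down, the visual direction is reversed.
    """
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    x_min, y_min, x_max, y_max = box
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    xs, ys = [], []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_a - dy * sin_a)
        ys.append(cy + dx * sin_a + dy * cos_a)
    return (min(xs), min(ys), max(xs), max(ys))
```

Because the enclosing box grows with the rotation angle, labels for tall, narrow objects such as stems become loose under large rotations.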

The authors should give details on the frame rate (fps) for the case where the egomotion of the camera is not available.

The dataset used should be made available (or more information about it should be provided) so that readers can reproduce or even improve on the work with the same data.

Minor writing errors were found and are listed as follows.

"one of the above conditions are satisfied." -> "one of the above conditions is satisfied."

"coordinates in the the prediction" -> "coordinates in the prediction"

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

My concerns after the review of the first version were addressed. Now, I recommend accepting the paper.
