Article
Peer-Review Record

Comparison of Three Off-the-Shelf Visual Odometry Systems

by Alexandre Alapetite 1,*, Zhongyu Wang 1, John Paulin Hansen 2, Marcin Zajączkowski 2 and Mikołaj Patalan 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 11 May 2020 / Revised: 6 July 2020 / Accepted: 13 July 2020 / Published: 21 July 2020
(This article belongs to the Section Sensors and Control in Robotics)

Round 1

Reviewer 1 Report

This paper claims to perform a comparison of visual odometry methods, but indeed that is not so. The paper compares wheel-based odometry with the methods provided by the Intel RealSense T265, the ZED Mini, and ORB-SLAM2. In the literature there are many more visual odometry/SLAM methods. In addition, the methods used by the RealSense and the ZED Mini are not exactly known. It is unclear why wheel-based odometry is considered and analyzed, since it is not a visual odometry method: it does not make sense to treat it as visual odometry. Regarding ORB-SLAM2, the authors mention that they apply the monocular version of the method. Since both the T265 and the ZED use images with depth information, the comparison is, from the start, flawed. The monocular version has limitations, namely (as the authors later mention) the issue of scale estimation when computing translation. Since both the T265 and the ZED Mini provide depth, it is not understandable why that data was not used as input to ORB-SLAM2.

Regarding the experimental part itself, the experiments are quite limited in the types of trajectories used. The trajectories should have included, at least, the following (a sketch of such a set follows the list):
--Trajectories with just (pure) translations with a length of several meters, with a minimum of 5 meters;
--Trajectories with pure translations with a reversal, without rotation, so that the robot returns to the original position;
--Trajectories with rotations: rotations about the robot axis and also rotations with several radii about axes "outside" the robot;
--Several mixed trajectories combining rotations and translations. Some of them should include a closed path, e.g., following a rectangle. The closed trajectories should also be performed both ways, i.e., clockwise and counter-clockwise.
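
For concreteness, a minimal Python sketch of such a trajectory set; the names, dimensions, and waypoint/primitive format are illustrative assumptions, not something taken from the paper:

```python
import math

def rectangle(w, h, ccw=True):
    """Corner waypoints (x, y) of a closed rectangular path, in either direction."""
    pts = [(0, 0), (w, 0), (w, h), (0, h), (0, 0)]
    return pts if ccw else pts[::-1]

# Each entry maps a trajectory name to waypoints or a motion primitive.
trajectories = {
    "pure_translation_5m":       [(0, 0), (5, 0)],            # straight line, >= 5 m
    "translation_with_reversal": [(0, 0), (3, 0), (0, 0)],    # out and back, no rotation
    "rotation_in_place":         [("spin", 2 * math.pi)],     # about the robot axis
    "arc_external_axis_r1m":     [("arc", 1.0, math.pi)],     # rotation about an external axis
    "rectangle_ccw":             rectangle(3, 2, ccw=True),   # closed mixed path
    "rectangle_cw":              rectangle(3, 2, ccw=False),  # same path, reversed
}
```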

These data can then be used to compare the two sensors and/or ORB-SLAM2 (such a choice needs to be justified based on previous work or on the literature). The paper title should reflect this: as it stands, it is a comparison of two sensors, not of visual odometry methods.

The focus should probably be on the comparison of the sensors, and even then the comparison would not be complete, since the motion is planar. Since OptiTrack is available, one possibility is simply to attach markers to the sensors and track them in their motion. To include full 3D motion, drones can be used, or the cameras can even be moved by hand (OptiTrack provides the so-called "ground truth").
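
A minimal sketch, assuming numpy and time-synchronized Nx3 position arrays, of how the OptiTrack ground truth could be used: align each estimated trajectory to the ground truth with the Umeyama method and report the absolute trajectory error (scale alignment being needed for the monocular ORB-SLAM2 case, whose scale is unobservable). Function names are illustrative:

```python
import numpy as np

def umeyama_align(est, gt, with_scale=False):
    """Return s, R, t minimizing || gt - (s * R @ est + t) ||."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g
    U, D, Vt = np.linalg.svd(G.T @ E / len(est))   # cross-covariance SVD
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # reflection correction
        S[2, 2] = -1
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / E.var(0).sum() if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t

def ate_rmse(est, gt, with_scale=False):
    """Absolute trajectory error (RMSE) after alignment."""
    s, R, t = umeyama_align(est, gt, with_scale)
    err = gt - (s * (R @ est.T).T + t)
    return np.sqrt((err ** 2).sum(1).mean())
```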

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Intro:

This article compares two commercial visual-inertial camera-tracking solutions (RealSense T265, ZED Mini) with optical wheel odometry and a monocular RGB SLAM algorithm (ORB-SLAM2). Eight experiments are run, comparing tracking results against ground truth established by motion capture (OptiTrack). Results are tabulated and a terse discussion is given. Little is provided for discussion and conclusions, which is a significant shortcoming, since there is no novelty in the sensors or algorithms, i.e., the sensors come with tracking off the shelf and ORB-SLAM2 can be downloaded and run. The bulk of the contribution therefore lies in a good experimental plan and a detailed analysis of the results. In this respect, this reader feels the article falls short and needs to be significantly advanced before being worthy of publication.

Pros:

Provides some hard data on the reliability and accuracy of odometry/visual odometry sensors and algorithms under specific conditions.

Ran experiments, tracked accuracy with motion capture, and quantified two commercial solutions for visual-inertial camera tracking against (optical?) wheel-encoder odometry and a monocular visual SLAM algorithm.

Cons:

Title: You evaluate wheel odometry, commercially produced visual-inertial odometry (RealSense T265, ZED) and purely "visual odometry" via ORB-SLAM2 on the ZED camera. I feel the title should better describe the article contents; perhaps eliminate ORB-SLAM2 and specifically address commercial sensor-borne visual-inertial tracking. ORB-SLAM2 has a back-end optimizer that complicates the comparison. Further, it uses significantly less sensor input data, which, in a way, makes the comparison unfair.

Subject: You mention SLAM frequently, but you evaluate only tracking. Be sure to delineate between SLAM and tracking: the sensors are not doing SLAM, whereas ORB-SLAM2 is doing SLAM, building and optimizing a map. This is a major difference and is not captured. It also makes a big difference in terms of tracking over long periods of time.

Specifically, if you ran these systems on a long trajectory with a loop closure, it would be very interesting to see the drift of integrated visual-inertial tracking versus the accuracy of ORB-SLAM2 after a loop closure. This is completely omitted from the discussion, and it is important to address.
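
A minimal sketch of the metric this suggests: on a nominally closed loop, the end-to-start residual directly measures accumulated drift, and can be normalized by path length. The function name and Nx3 position-array input are illustrative assumptions:

```python
import numpy as np

def loop_closure_drift(positions):
    """End-to-start drift of a nominally closed loop, absolute [m] and per meter."""
    positions = np.asarray(positions, dtype=float)
    drift = np.linalg.norm(positions[-1] - positions[0])
    path_length = np.linalg.norm(np.diff(positions, axis=0), axis=1).sum()
    return drift, drift / path_length
```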

Experimental work:

The data appear to include highly structured laboratory scenes. I believe this can adversely bias the results. The authors should characterize the environmental conditions of the data collection.

I do not believe ORB-SLAM2 integrates an IMU intrinsically, although fusion can happen via a downstream filter. This explains the outliers and variance in Figure 4 under fast rotation. The availability of an IMU (especially for visually challenging rotation) makes for a very biased comparison between ORB-SLAM2 and the RealSense/ZED, to the point that it is misleading.
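
For reference, a minimal (hypothetical) sketch of the kind of downstream fusion alluded to: a complementary filter blending gyro-integrated heading with the visual heading, since ORB-SLAM2 itself consumes no IMU data. The gain and interface are illustrative assumptions, not any system's actual implementation:

```python
def fuse_yaw(yaw_prev, gyro_yaw_rate, vis_yaw, dt, alpha=0.98):
    """Blend high-rate gyro integration with slower visual yaw estimates.

    Angle wrap-around is ignored for brevity.
    """
    predicted = yaw_prev + gyro_yaw_rate * dt          # fast but drifting
    return alpha * predicted + (1 - alpha) * vis_yaw   # slow visual correction
```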

I mention some other shortcomings of the experimental setup below. The abstract, conclusion, and article contents should qualify the results to the specific context of this experimental setup; omitting this can mislead readers.

Results:

I find Figures 4 and 5 difficult to read; I would organize them by experiment on the x-axis.

For the tables, this reader would prefer decimal points, e.g., 0.4682 rad, to decimal commas, e.g., 0,4682 rad. I am not sure whether there is a journal formatting guideline, but it does affect readability for this reader.

Figure 3: ORB-SLAM2's consistent undershooting may be explained by the scale-factor optimization. No other system had to overcome these unknowns: (1) scale and (2) the lack of any IMU data. Note that both (1) and (2) are very significant differences.

The statistical analysis and discussion are extremely short and lacking. Explain the contents of the result figures more: when and why are outliers occurring in each situation? Can they be easily rejected by standard chi-squared testing for use in real filters, or are they clustered temporally so as to make this problematic?
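
A minimal sketch of the standard chi-squared gate this question refers to: a measurement residual is accepted only if its squared Mahalanobis distance is below the chi-squared quantile for the residual's dimension. Names and the confidence level are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def chi2_gate(residual, S, confidence=0.997):
    """Accept a measurement residual with innovation covariance S."""
    residual = np.asarray(residual, dtype=float)
    d2 = residual @ np.linalg.solve(S, residual)  # squared Mahalanobis distance
    return d2 <= chi2.ppf(confidence, df=residual.size)
```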

The discussion and conclusion are significantly lacking, and this is the largest weak point of the article. If someone is to read your article and experiments, you have an obligation to describe the take-aways: insights via the discussion, and final statements in the conclusions that the results suggest. The discussion and conclusions here, I feel, must be restricted to short-trajectory ground-vehicle motion.

From the article: "It is important to note that in order to ensure that we are testing the different systems in a fair way, all visual odometry systems were running in parallel, meaning that there [sic] were exposed to exactly the same environment."

Update rates will be significantly different in this circumstance, and update rates can significantly affect tracking accuracy. How do you ensure fairness in computation here? It seems the RealSense (with onboard processing) has an unfair advantage, as the onboard computer (Pi or Jetson) can easily serve this thread; this is unlikely to be the case for ORB-SLAM2. Perhaps it is more "fair" to run the systems offline, individually, on the same recorded telemetry? Justify this statement in more detail, as this reader has trouble agreeing with it.
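
One way to make the fairness question concrete: report each system's effective output rate from its own timestamped pose log. A minimal sketch, with hypothetical log files and names:

```python
import numpy as np

def update_rate_hz(timestamps_s):
    """Median output rate of a tracker from its pose timestamps (seconds)."""
    dt = np.diff(np.sort(np.asarray(timestamps_s, dtype=float)))
    return 1.0 / np.median(dt)

# e.g. (hypothetical log files, one timestamp per line):
# rates = {name: update_rate_hz(np.loadtxt(f"{name}_stamps.txt"))
#          for name in ("t265", "zed_mini", "orbslam2", "wheel")}
```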

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The edits address most of my initial concerns, and it is my feeling that the resulting work is an improved product.
