Article
Peer-Review Record

VSLAM Optimization Method in Dynamic Scenes Based on YOLO-Fastest

Electronics 2023, 12(17), 3538; https://doi.org/10.3390/electronics12173538
by Zijing Song 1, Weihua Su 2,*, Haiyong Chen 1, Mianshi Feng 1, Jiahe Peng 1 and Aifang Zhang 3
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 15 July 2023 / Revised: 5 August 2023 / Accepted: 10 August 2023 / Published: 22 August 2023

Round 1

Reviewer 1 Report

The authors propose an optimized VSLAM method based on YOLO-Fastest.

The idea is interesting, but a few points could be better explored.

What is the influence of the quality of the YOLO detector on the whole method? What would the improvement be if, for example, YOLOv4-Tiny were used?

Table 6 reports the execution times for which platform? The Kirin 990? This is an expensive embedded processor. Have you tested on, for example, a Jetson Nano? Could you achieve real-time performance on that processor?

What is the real-time requirement? How much power is the CPU consuming?

Please provide the complexity of each model (FLOPs, parameters).
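(For instance, such numbers could be reported with a short script like the sketch below. It assumes a PyTorch port of each detector; the `thop` profiler call is one common option, the 352x352 input size is an assumption, and the model loader is a placeholder, not the authors' code.)

```python
import torch
from thop import profile  # pip install thop; a common MACs/params counter

def report_complexity(model, input_size=(1, 3, 352, 352)):
    """Print the parameter count and a multiply-accumulate estimate."""
    params = sum(p.numel() for p in model.parameters())
    dummy = torch.randn(*input_size)
    macs, _ = profile(model, inputs=(dummy,), verbose=False)
    # thop reports MACs; FLOPs are roughly 2 * MACs for conv/linear layers.
    print(f"params: {params / 1e6:.2f} M, MACs: {macs / 1e9:.2f} G")

# model = load_yolo_fastest_v2()  # hypothetical loader; depends on the repo
# report_complexity(model)
```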

Comments on the Quality of English Language: No comments.

Author Response

Thank you for your review comments. Attached is our sincere point-by-point response. We have carefully revised our paper based on your suggestions.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper, “VSLAM Optimization Method in Dynamic Scenes Based on YOLO-Fastest,” proposes an optimized visual simultaneous localization and mapping (VSLAM) method called YF-SLAM for dynamic environments. To address the accuracy degradation introduced by dynamic objects such as moving people, YF-SLAM integrates a lightweight YOLO object detection network, YOLO-Fastest, to quickly identify dynamic objects and filter out the feature points inside the detected regions. Experiments show that YF-SLAM reduces trajectory error and outperforms other methods in real time on a low-power Jetson Nano robotics platform.
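To make the filtering step concrete, the sketch below illustrates the general idea described above: discarding feature points that fall inside detected dynamic-object boxes. It assumes OpenCV ORB keypoints and (x, y, w, h) pixel boxes; the function name, the dynamic class list, and the hard-coded detection are illustrative, not taken from the authors' implementation.

```python
import cv2
import numpy as np

# Treating "person" as the dynamic class is an assumption for illustration;
# the paper's actual class list may differ.
DYNAMIC_CLASSES = {"person"}

def filter_dynamic_keypoints(keypoints, descriptors, detections):
    """Drop keypoints that fall inside any dynamic-object bounding box.

    detections: list of (class_name, (x, y, w, h)) in pixel coordinates.
    Returns surviving keypoints and the matching descriptor rows.
    """
    boxes = [box for cls, box in detections if cls in DYNAMIC_CLASSES]
    kept_kps, kept_rows = [], []
    for i, kp in enumerate(keypoints):
        u, v = kp.pt
        inside = any(x <= u <= x + w and y <= v <= y + h
                     for (x, y, w, h) in boxes)
        if not inside:
            kept_kps.append(kp)
            kept_rows.append(i)
    return kept_kps, descriptors[kept_rows]

# Usage sketch on a synthetic textured frame (a real pipeline would use
# camera frames and the detector's output instead of a hard-coded box).
frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
orb = cv2.ORB_create(nfeatures=1000)
kps, desc = orb.detectAndCompute(frame, None)
if desc is None:
    desc = np.empty((0, 32), dtype=np.uint8)
detections = [("person", (120, 80, 60, 150))]  # placeholder detection
static_kps, static_desc = filter_dynamic_keypoints(kps, desc, detections)
```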

 

However, I have the following concerns:

(1) The main disadvantage of traditional SLAM is that, as stated in the paper, “these methods may fail when there are too many dynamic objects occupying a significant portion of the image.” However, the authors do not provide direct evidence that their proposed YF-SLAM approach can robustly handle environments with many dynamic objects. Specific experiments or analyses should be added to demonstrate the system's capabilities when dynamic elements dominate the scene. For example, the authors could test performance on sequences where people walk through most of the camera's field of view, or compare mapping accuracy as the number of moving objects increases (a sketch of the standard trajectory-error metric is given after this list). Without these validations, the purported advantages over traditional methods remain unsubstantiated when frequent occlusions or large moving objects are present. Explicitly showing success in crowded dynamic settings would strengthen the paper's conclusions.

(2) The YOLO-FastestV2 network used in this study was adopted from an open-source project. However, the original creators and the source from which the code was obtained are not properly credited. Appropriate citations should be included for ethical reasons and to acknowledge the original work.

(3) Based on Figure 5, both DynaSLAM and YF-SLAM exhibit larger errors in roll and pitch at the beginning of trajectory tracking. One potential reason for this initial degradation is that the object detection and feature filtering threads require a few frames to initialize and converge on the dynamic elements in the scene. At the start, the detection bounding boxes may be inaccurate, letting through more unstable features and causing transient errors in pose estimation. As the system processes more frames, the detection and filtering improve, leading to more accurate tracking. This reveals an intrinsic limitation of detection-based VSLAM methods, which require a warm-up period for the semantic processing to stabilize. The authors could aim to address this issue in future work, for example by propagating semantic predictions temporally to improve initialization (one possible form is sketched below). Analyzing the factors influencing this trajectory drift at the outset could lead to insights for reducing it.
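As an illustration of that temporal propagation (a minimal sketch, not the authors' method; the function name and (x, y, w, h) box format are assumptions): boxes from the previous frame could be carried forward and blended with the current detections, so that a brief detector miss in the first frames does not re-admit dynamic feature points.

```python
def propagate_boxes(prev_boxes, curr_boxes, alpha=0.5, iou_thresh=0.3):
    """Merge current detections with boxes carried over from the last frame.

    Boxes are (x, y, w, h). An unmatched previous box is kept for one extra
    frame; matched boxes are blended to damp jitter in early frames.
    """
    def iou(a, b):
        ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    merged, used = [], set()
    for pb in prev_boxes:
        best, best_iou = None, iou_thresh
        for j, cb in enumerate(curr_boxes):
            if j not in used and iou(pb, cb) > best_iou:
                best, best_iou = j, iou(pb, cb)
        if best is not None:
            used.add(best)
            # Exponential smoothing between the matched pair of boxes.
            merged.append(tuple(alpha * c + (1 - alpha) * p
                                for p, c in zip(pb, curr_boxes[best])))
        else:
            merged.append(pb)  # carry over an unmatched previous box
    merged += [cb for j, cb in enumerate(curr_boxes) if j not in used]
    return merged
```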

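Regarding point (1), localization accuracy in such a sweep is usually reported as the absolute trajectory error (ATE) after rigid alignment. Below is a minimal numpy sketch, assuming time-synchronized (N, 3) arrays of ground-truth and estimated camera positions; in practice, an existing tool such as evo automates this.

```python
import numpy as np

def ate_rmse(gt, est):
    """RMSE of absolute trajectory error after rigid (Umeyama) alignment.

    gt, est: (N, 3) arrays of time-matched camera positions.
    """
    mu_g, mu_e = gt.mean(0), est.mean(0)
    sigma = (gt - mu_g).T @ (est - mu_e) / len(gt)  # cross-covariance
    U, _, Vt = np.linalg.svd(sigma)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                  # guard against a reflection
    R = U @ S @ Vt                      # rotation aligning est to gt
    t = mu_g - R @ mu_e
    err = (est @ R.T + t) - gt          # residuals after alignment
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```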
 

There are minor errors in the paper that require attention:

(1) In line 82, “where n the real point X in the 3D world is called the object point,” I think the “n” is a typo.

(2) In line 89, equation (2), the term "t^{TT}" seems to be a typographical error. It should likely be "t^{T}" if it is intended to represent the transpose of a matrix 't'.

(3) In line 369, it would be more appropriate to refer to the graphics processing unit as "GeForce GTX 1660 Ti" to adhere to NVIDIA's naming conventions.

(4) The resolution of Figures 2, 3, and 5 appears to be low, resulting in blurred text. I suggest enhancing the image quality to ensure clear legibility.

(5) Tables are inconsistently referred to in the text as "I, II," while they're named "1, 2." Please maintain a consistent format for table references.

(6) The paper alternates between "YOLO" and "Yolo," which may lead to confusion. For example, in line 199 it is “Yolo-Fastest,” while in line 62 it is “YOLO-Fastest.” As YOLO stands for "You Only Look Once," it is more fitting to use the capitalized form consistently throughout the text. Please revise accordingly.

Please conduct thorough proofreading to ensure clarity and accuracy in your manuscript.

Author Response

Thank you for your review comments. Attached is our sincere point-by-point response.

Author Response File: Author Response.pdf

Reviewer 3 Report

The comments are attached as a PDF.

Comments for author File: Comments.pdf

Comments on the Quality of English Language: Minor editing.

Author Response

Thank you for your review comments. Attached is our sincere point-by-point response. We have carefully revised our paper based on your suggestions.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

No comments

Comments on the Quality of English Language: No comments.
