Technical Note
Peer-Review Record

VisionICE: Air–Ground Integrated Intelligent Cognition Visual Enhancement System Based on a UAV

by Qingge Li, Xiaogang Yang *, Ruitao Lu, Jiwei Fan, Siyu Wang and Zhen Qin
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 29 March 2023 / Revised: 11 April 2023 / Accepted: 12 April 2023 / Published: 13 April 2023

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Chapter 4 “Experiments and Analysis” describes the effectiveness of the system. You argue “that the algorithm implemented in the present research exhibits superior real-time performance during practical scene assessments, as well as a noteworthy detection accuracy in small target detection.”

You could describe more clearly what the claim is based on, i.e., why and how the VisionICE system you developed works better than current systems. This could be presented in the Conclusion.

Chapter 3 could describe more clearly what kind of scientific research method was used to conduct the research.

Author Response

Dear Associate Editor and Reviewers,

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “VisionICE: Air-ground integrated intelligent cognition visual enhancement system based on UAV” (ID: drones-2343393). These comments are all valuable and helpful for revising and improving our paper as well as for guiding our future research projects. The revised portions are marked in green in the revised manuscript. Our responses to the associate editor’s and reviewers’ comments are shown below.

 

--------------------------------------------------------------------------------------------------

Responses to the associate editor’s and reviewer’s comments:

--------------------------------------------------------------------------------------------------

 

Reviewer #1:

Comment 1: Chapter 4 “Experiments and Analysis” describes the effectiveness of the system. You argue “that the algorithm implemented in the present research exhibits superior real-time performance during practical scene assessments, as well as a noteworthy detection accuracy in small target detection.” You could describe more clearly what the claim is based on, i.e., why and how the VisionICE system you developed works better than current systems. This could be presented in the Conclusion.

Response: Thank you for the helpful suggestion. Based on your comment, we have detailed the reasons why VisionICE works better than current systems.

In this paper, we design an air-ground integrated intelligent cognition visual enhancement system (VisionICE) based on UAVs, visual helmets, and AR glasses. The combination of visual helmets and drones gives operators both ground and air perspectives, and the use of AR glasses improves the operator's situational awareness. Using the YOLOv7 algorithm, object detection accuracy reaches 97% in scenarios such as highways, villages, farmland, and forests, with real-time detection at 40 FPS. The VisionICE system extends the scope and efficiency of search and rescue, solves the problem of personnel being unable to search in special environments, and offers diverse fields of view, accurate recognition, a rich visual experience, wide application scenarios, high intelligence, and convenient operation.

The revised details can be found in the “Conclusion” section, Lines 507-516 on Page 16.

 

Comment 2: Chapter 3 could describe more clearly what kind of scientific research method was used to conduct the research.

Response: Thank you for the helpful suggestion. Based on your comment, we have added Table 1 to summarize the methods and hardware selection used in the system research. The revised details can be found in the “3.1 Hardware Framework” section, Lines 256 and 264-265, Page 6.

 

Special thanks to you for your helpful comments.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This paper designs an air-ground integrated intelligent cognition visual enhancement system based on a UAV (VisionICE). The system gives the operator a dual view from the ground and the air, with a more extensive search range and higher mobility. The work is interesting, and experiments are provided. Some suggestions are as follows:

(1) The main contribution of the paper should be further explained;

(2) The detailed performance results of the experiments should be provided;

(3) Some references should be added, such as the YOLOv7 object detection model;

(4) Future directions should be given in the Conclusion.

 

Author Response

Dear Associate Editor and Reviewers,

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “VisionICE: Air-ground integrated intelligent cognition visual enhancement system based on UAV” (ID: drones-2343393). These comments are all valuable and helpful for revising and improving our paper as well as for guiding our future research projects. The revised portions are marked in green in the revised manuscript. Our responses to the associate editor’s and reviewers’ comments are shown below.

 

--------------------------------------------------------------------------------------------------

Responses to the associate editor’s and reviewer’s comments:

--------------------------------------------------------------------------------------------------

 

Reviewer:

Comment 1: The main contribution of paper should be further explained.

Response: Thank you for your kind comment. Based on your comment, we have explained the main contribution of the paper in Lines 80-97, Page 2.

The primary contributions of this paper can be summarized as follows.

(1) We developed an air-ground integrated intelligent cognition visual enhancement system called VisionICE. This system uses wireless image sensors on a drone and a visual helmet to simultaneously obtain air and ground perspective images, achieving efficient large-scale patrols in particular environments and addressing the low efficiency and limited search range of post-disaster search and rescue operations.

(2) Based on the YOLOv7 algorithm, object detection is achieved in scenes such as highways, villages, farmland, mountains, and forests. In practical applications, YOLOv7 accurately identifies the target class, effectively locates the target position, and achieves a detection accuracy of up to 97% for targets of interest. The YOLOv7 model runs at a detection speed of 40 FPS, which meets the requirements of real-time target detection and provides searchers with real-time, reliable recognition results.

(3) Utilizing portable AR smart glasses, the object detection results from the cloud server and onboard computer are displayed in real time, providing searchers with an immersive visual experience. The system improves the situational awareness of search personnel by issuing alerts for potential threats or anomalies. Compared to traditional post-disaster search and rescue operations, VisionICE exhibits significantly stronger interactivity, experiential capability, and versatility.
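To make the real-time claim in contribution (2) concrete, the following is a minimal sketch of a detection loop with per-frame FPS measurement. It is illustrative only: the ONNX file name "yolov7.onnx" and the camera index are placeholder assumptions, not details from the paper, and box decoding/NMS are omitted.

```python
# Minimal sketch of a real-time detection loop with FPS measurement.
# Assumptions (not from the paper): the trained YOLOv7 model has been
# exported to ONNX as "yolov7.onnx"; the video source is camera index 0
# (a UAV or helmet video stream would take its place).
import time

import cv2

net = cv2.dnn.readNetFromONNX("yolov7.onnx")   # hypothetical export
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.time()
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (640, 640), swapRB=True)
    net.setInput(blob)
    outputs = net.forward()          # raw predictions; decoding/NMS omitted
    fps = 1.0 / (time.time() - t0)   # per-frame rate, cf. the ~40 FPS figure
    cv2.putText(frame, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("detection sketch", frame)
    if cv2.waitKey(1) == 27:         # Esc exits
        break

cap.release()
cv2.destroyAllWindows()
```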

 

Comment 2: The detail performance result of the experiments should be provided.

Response: Thank you for the helpful comment. Based on your comment, we have provided the performance results of the YOLOv7 object detection algorithm in Lines 391-396, Page 11.

The experiment selected mean average precision (mAP) and frames per second (FPS) as indicators to evaluate the performance of the YOLOv7 algorithm. The experimental results show that with the IoU threshold set to 0.50, the trained YOLOv7 model reaches an mAP of 96.3%, indicating high detection accuracy. When the IoU threshold increases from 0.5 to 0.95 in steps of 0.05, the mAP of the YOLOv7 model is 28.9%. The model runs at a detection speed of 40 FPS, which meets the requirements of real-time object detection.
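As a reading aid for these numbers, the sketch below illustrates the evaluation convention being quoted: average precision is computed at each IoU threshold from 0.50 to 0.95 in steps of 0.05 and then averaged. This is a simplified single-class toy, not the authors' evaluation code; real mAP additionally averages over classes.

```python
# Toy single-class mAP over IoU thresholds 0.50:0.95:0.05 (illustrative only).
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thr):
    """preds: list of (score, box); gts: list of boxes; greedy matching."""
    preds = sorted(preds, key=lambda p: -p[0])
    matched, hits = set(), []
    for _, box in preds:
        i, v = max(((i, iou(box, g)) for i, g in enumerate(gts)
                    if i not in matched),
                   key=lambda t: t[1], default=(None, 0.0))
        hits.append(v >= thr)
        if v >= thr:
            matched.add(i)
    tp = np.cumsum(hits)
    recall = np.concatenate(([0.0], tp / max(len(gts), 1)))
    precision = np.concatenate(([1.0], tp / np.arange(1, len(hits) + 1)))
    return float(np.trapz(precision, recall))  # area under the PR curve

thresholds = np.arange(0.50, 0.951, 0.05)      # 0.50, 0.55, ..., 0.95
preds = [(0.9, [10, 10, 50, 50]), (0.6, [60, 60, 90, 90])]  # toy detections
gts = [[12, 11, 49, 52], [100, 100, 130, 130]]              # toy ground truth
print(np.mean([average_precision(preds, gts, t) for t in thresholds]))  # 0.4
```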

 

Comment 3: Some references should be added, such as YOLOv7 object detection model.

Response: Thank you for your kind comment. Based on your comment, we have added 19 references ([1-5], [10-17], [32-37]) and highlighted them in green in the paper.

 

Comment 4: Future direction should be given in Conclusion Part.

Response: Thank you for the helpful comment. Based on your comment, we provide future research directions in the “Conclusion” section, Lines 519-527 on Page 16.

However, the use of the VisionICE system in search and rescue operations also faces some challenges. The challenges in terms of drones include regulatory issues such as obtaining necessary permits and complying with airspace restrictions, as well as technical challenges such as ensuring the reliability and durability of drones and their components. In addition, accurate and reliable sensor data is also needed, as well as the development of user-friendly AR interfaces and software to effectively integrate with drone hardware and control systems. Future applications of the system include battlefield surveillance, firefighting, post-disaster search and rescue, criminal investigations, anti-terrorism and peacekeeping, intelligent life, and many others.

 

Special thanks to you for your helpful comments.

--------------------------------------------------------------------------------------------------

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The authors propose an integrated system that combines an AR device with a UAV, primarily for search and rescue operations. The content is well written, and no significant errors were discovered. I do have a few small recommendations, though.

To add more innovation to the topic of the article, better references might be added in the introduction. Are there comparable integrated systems, for instance? Is there any work that can be compared? If so, highlighting the key distinctions between your work and these would improve the quality of it.

This also applies to the section on related work. There are certain fundamental details of UAVs that are not really necessary. Perhaps the writers should emphasize the use of UAV for rescue missions or incorporate references to UAV+AR.

Avoid using very long paragraphs as in L155.

A list of the utilized components with models/specifications would be nice in section 3.1.

Please update section 4 with more details regarding the experimental design: why this configuration was used, how it relates to search and rescue operations, how the results can be applied to a disaster scenario, and how the reader might interpret these results.

There should be a few additional paragraphs discussing the outcomes.

 

Author Response

Dear Associate Editor and Reviewers,

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “VisionICE: Air-ground integrated intelligent cognition visual enhancement system based on UAV” (ID: drones-2343393). These comments are all valuable and helpful for revising and improving our paper as well as for guiding our future research projects. The revised portions are marked in green in the revised manuscript. Our responses to the associate editor’s and reviewers’ comments are shown below.

 

--------------------------------------------------------------------------------------------------

Responses to the associate editor’s and reviewer’s comments:

--------------------------------------------------------------------------------------------------

 

Reviewer:

Comment 1: To add more innovation to the topic of the article, better references might be added in the introduction. Are there comparable integrated systems, for instance? Is there any work that can be compared? If so, highlighting the key distinctions between your work and these would improve the quality of it.

Response: Thank you for your helpful suggestion. Based on your comment, we have added references [1-14] for comparison with the system in this paper. The revised details can be found in the “Introduction” section, Lines 44-67, Page 2.

Drone-integrated systems have emerged as a promising tool for search and rescue operations in recent years [1]. UAVs are irreplaceable in unique and complex environments due to their wide search range, good concealment performance, and high mobility [2, 3]. Equipped with sensors such as cameras, drones can detect and locate individuals or objects in areas that may be difficult or dangerous for human rescuers to access. In recent years, computer vision technology has made breakthroughs with the support of big data processing and high-performance cloud computing. The integration of computer vision and UAV technology has made UAV surveillance considerably more powerful [4, 5] and is an effective tool for achieving situational awareness, target indication [6], and ground target tracking [7, 8]. Therefore, combining post-disaster search and rescue with intelligent UAVs gives searchers both ground and air perspectives. It solves the problem that field surveillance cannot be carried out under particular circumstances, enlarging the search range and providing higher mobility.

Augmented reality (AR) technology can enhance the capabilities of drones, allowing them to perform complex tasks and provide real-time situational awareness to operators. AR technology can effectively reflect the real world's content and overlay virtual information onto it. AR involves overlaying digital information, such as images, video, and text, onto the physical environment, creating an augmented view of reality [9-12]. When combined with drones, AR technology can provide operators with a real-time view of the drone's surroundings, as well as additional information and data, such as flight paths, obstacle detection, and telemetry. This enhances the operator's situational awareness, enables a more intuitive experience [13], makes it easier to control and navigate drones [14], and allows drones to perform more complex tasks.

 

Comment 2: This also applies to the section on related work. There are certain fundamental details of UAVs that are not really necessary. Perhaps the writers should emphasize the use of UAV for rescue missions or incorporate references to UAV+AR.

Response: Thank you for the helpful suggestion. Based on your comment, we have added the “2.1 Drone Search and Rescue System” and “2.3 Drone-Augmented Reality Technology” sections, which can be found on Pages 3 and 5.

 

Comment 3: Avoid using very long paragraphs as in L155.

Response: Thank you for your kind comment. Based on your comment, we have made modifications to the relevant paragraphs, such as Lines 165-190, Page 4.

 

Comment 4: A list of the utilized components with models/specifications would be nice in section 3.1.

Response: Thank you for the helpful suggestion. Based on your comment, we have added Table 1 to summarize the methods and hardware selection used in the system research. The revised details can be found in the “3.1 Hardware Framework” section, Lines 256 and 264-265, Page 6.

 

 

Table 1. The component list of system hardware.

Systems              Component list                     Specification
S500 Quadrotor UAV   Flight Controller                  Pixhawk 2.4.8
                     Electronic Speed Control           XXD-40A
                     Motor                              QM3507-680KV
                     Remote Control                     AT9S
                     Digital Transmission Module        3DR V5 Radio
                     Image Transmission Module          R2TECK-DVL1
                     GPS Module                         GPS M8N
                     Sonar Obstacle Avoidance Module    RCWL-1605
                     Power Supply System                4S Lithium Cell
                     Onboard Computer                   Jetson Xavier NX
                     PTZ Camera                         FIREFLY 8s
Visible Helmet       Camera                             IP Camera
AR Glasses                                              Epson MOVERIO BT-300
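As a usage note, the sketch below shows one minimal way to grab a frame from an IP camera such as the helmet camera listed in Table 1; the RTSP URL is a placeholder assumption, not an address from the paper.

```python
# Sketch: grab one frame from an IP camera stream with OpenCV.
# The URL is a placeholder; a real deployment would use the helmet
# camera's actual address and credentials.
import cv2

STREAM_URL = "rtsp://192.168.1.64:554/stream1"  # hypothetical address

cap = cv2.VideoCapture(STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("cannot open camera stream")

ok, frame = cap.read()
if ok:
    print("frame shape:", frame.shape)  # e.g. (1080, 1920, 3) for 1080p BGR
cap.release()
```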

 

Comment 5: Please update section 4 with more details regarding the experimental design: why this configuration was used, how it relates to search and rescue operations, how the results can be applied to a disaster scenario, and how the reader might interpret these results.

Response: Thank you for the helpful suggestion. Based on your comment, we have provided more details about the experimental design. The revised details can be found in the “4.1 Drone Search Flight Test” section, Lines 416-426, 440-447, 455-469, and 471-484, Pages 12-15.

Firstly, we chose a road scenario to preliminarily verify drone target detection and single-target tracking flight. The ability to detect vehicles, pedestrians, and other obstacles in real time in highway scenes effectively verifies the real-time performance of the UAV system. Due to the high speed of vehicles, varying lighting conditions, and the presence of occlusion and complex backgrounds, highway scenes are particularly challenging for object detection algorithms, which must accurately recognize and track objects in these environments to ensure the reliability of the VisionICE system. In addition, drone object detection in highway scenes can collect real-time information about traffic accidents, which is of great significance for traffic accident search and rescue.

Secondly, we selected the village farmland scene for the single-target tracking flight test. Because village farmland is vast and largely free of obstacles, UAV search and rescue methods are well suited to it. Drones can locate and track individual targets, such as lost hikers or trapped farmers, and then guide search and rescue teams to that location using real-time images and GPS coordinates. This can greatly reduce the time and resources required to search for and rescue personnel in remote areas, and improve the safety and effectiveness of search and rescue teams.

In addition, drones can quickly and efficiently cover large areas of mountains and forests in search and rescue missions, especially in mountainous terrain where ground-based search efforts may be difficult or dangerous. Drones can provide detailed images of the terrain and surrounding areas to detect and locate specific targets in forests, such as animals or humans. This can help rescuers locate and rescue people who are lost or injured in the forest. The search and rescue results of the VisionICE system are shown in Figure 15.

Figure 15. Search and rescue results in mountainous and forest scenarios.

From the recognition results, it can be seen that the YOLOv7 object detection algorithm can accurately locate and recognize small targets, with a maximum detection accuracy of 97%. Moreover, this algorithm is robust to changes in target attitude, enabling accurate detection of targets in different directions and postures. In addition, it can also accurately detect targets under changing lighting and shrub occlusion conditions, with high accuracy and robustness.

The VisionICE system uses drones and visual helmets as the main data collection platform, and AR glasses as the visualization platform for intelligent cognition results. Visual helmets protect the safety of search and rescue personnel and also capture ground video for transmission to cloud servers. The detection and recognition results of the video data assist the human eye in ground target recognition, avoiding false alarms caused by subjective human speculation. Drones can detect and track targets of interest within the patrol area from the air. Once the system detects a target, the ground control station issues an alarm, allowing the drone to approach the target under remote control or track it independently using its target tracking algorithm. AR glasses display the real-time object detection results from the cloud server and onboard computer, providing an augmented reality visual experience for search and rescue personnel. In addition, AR glasses prevent operators from frequently lowering and raising their heads to control the drone, simplifying operation and reducing the possibility of misoperation.
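To make the alarm step concrete, here is a minimal sketch of one way a detection result could be pushed to a display client such as the AR glasses. The endpoint address, port, and JSON message schema are illustrative assumptions, not the paper's actual protocol.

```python
# Sketch: push a detection alert to a display client over UDP.
# Address, port, and message fields are illustrative assumptions.
import json
import socket
import time

AR_ENDPOINT = ("192.168.1.50", 9000)  # hypothetical AR-glasses receiver

def send_alert(sock, label, confidence, box):
    """box is [x1, y1, x2, y2] in image pixels."""
    msg = {"time": time.time(), "label": label,
           "confidence": confidence, "box": box}
    sock.sendto(json.dumps(msg).encode("utf-8"), AR_ENDPOINT)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_alert(sock, "person", 0.97, [120, 80, 180, 260])  # example detection
sock.close()
```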

 

Comment 6: There should be a few additional paragraphs discussing the outcomes.

Response: Thank you for your kind suggestion. Based on your comment, we have added several paragraphs discussing the experimental results. The revised details can be found in Lines 431-435, 451-454, 464-469, and 499-507, Pages 12-16.

Figure 12 shows that the algorithm implemented in the present research exhibits superior real-time performance during practical scene assessments, as well as noteworthy detection accuracy for small targets. Figure 13 shows that the algorithm used in this paper achieves high accuracy in the actual scene test and correctly classifies and locates vehicles and pedestrians.

In the single-target tracking flight test, the algorithm used in this paper successfully locates the critical target with high accuracy. At the same time, the UAV tracks the target in real time, and the flight process is smooth, without significant oscillation.
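The paper does not disclose the tracking controller itself, so the following is only a hedged sketch of what such a loop can reduce to: a simple proportional law that converts the detected bounding box into yaw-rate and forward-speed commands so the UAV keeps the target centered in the frame.

```python
# Illustrative proportional tracking law (an assumption, not the authors'
# controller): steer toward the bounding-box center and approach until the
# box occupies a target fraction of the frame.
def track_command(box, frame_w, frame_h,
                  k_yaw=1.0, k_fwd=0.5, target_area=0.05):
    """box = [x1, y1, x2, y2] in pixels; returns (yaw_rate, forward_speed)."""
    cx = (box[0] + box[2]) / 2.0
    err_x = (cx - frame_w / 2.0) / (frame_w / 2.0)       # -1..1, + = right
    area = (box[2] - box[0]) * (box[3] - box[1]) / float(frame_w * frame_h)
    yaw_rate = k_yaw * err_x                              # turn toward target
    forward = k_fwd * (target_area - area)                # close the distance
    return yaw_rate, forward

# Example: target slightly right of center and still small (far away)
print(track_command([700, 300, 780, 460], 1280, 720))
```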

From the recognition results, it can be seen that the YOLOv7 object detection algorithm can accurately locate and recognize small targets, with a maximum detection accuracy of 97%. Moreover, this algorithm is robust to changes in target attitude, enabling accurate detection of targets in different directions and postures. In addition, it can also accurately detect targets under changing lighting and shrub occlusion conditions, with high accuracy and robustness.

The VisionICE system provides a new solution for post-disaster search and rescue tasks by integrating drones, visual helmets, intelligent cognitive algorithms, and AR technology. This method can search designated areas in real-time and from multiple perspectives, providing valuable insights for search and rescue missions and other applications. In addition, the system's use of AR smart glasses enhances the searcher's situational awareness by overlaying intelligent cognitive results, further improving the efficiency and effectiveness of the search process. The workflow and functionality of this system demonstrate its potential to revolutionize object detection and tracking in various fields.

 

Special thanks to you for your helpful comments.

--------------------------------------------------------------------------------------------------

 

 

Author Response File: Author Response.docx

Reviewer 4 Report

Comments and Suggestions for Authors

 

Post-disaster search and rescue is a very interesting topic in this research area. This paper proposes a system that can autonomously identify and track targets of interest, addressing the difficulty that personnel cannot conduct field inspections in particular environments. The paper is well organized and written; however, it could be further improved in the following aspects:

1. Since the proposed method is based on information fusion and multi-sensor fusion, there is a lot of related work on the processing of sensor data, object state estimation, etc. Please elaborate more in the literature review section on the configuration of the sensors: yolov5-tassel: detecting tassels in rgb uav imagery with improved yolov5 based on transfer learning; improved vehicle localization using on-board sensors and vehicle lateral velocity; autonomous vehicle kinematics and dynamics synthesis for sideslip angle estimation based on consensus kalman filter; estimation on imu yaw misalignment by fusing information of automotive onboard sensors; automated vehicle sideslip angle estimation considering signal measurement characteristic; imu-based automated vehicle body sideslip angle and attitude estimation aided by gnss using parallel adaptive kalman filters; automated driving systems data acquisition and processing platform.

2. I understand from the experimental results in the figures that the detection performance is good. But do you have ground truth, i.e., labelled bounding boxes, to compare with your results? Ground truth is not easy to obtain; it is fine if you do not have it, but please provide some explanation.

3. The limitations of this work and future work should also be discussed in the conclusion section.

Author Response

Dear Associate Editor and Reviewers,

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “VisionICE: Air-ground integrated intelligent cognition visual enhancement system based on UAV” (ID: drones-2343393). These comments are all valuable and helpful for revising and improving our paper as well as for guiding our future research projects. The revised portions are marked in green in the revised manuscript. Our responses to the associate editor’s and reviewers’ comments are shown below.

 

--------------------------------------------------------------------------------------------------

Responses to the associate editor’s and reviewer’s comments:

--------------------------------------------------------------------------------------------------

 

Reviewer:

Comment 1: Since the proposed method is based on information fusion and multi-sensor fusion, there is a lot of related work on the processing of sensor data, object state estimation, etc. Please elaborate more in the literature review section on the configuration of the sensors: yolov5-tassel: detecting tassels in rgb uav imagery with improved yolov5 based on transfer learning; improved vehicle localization using on-board sensors and vehicle lateral velocity; autonomous vehicle kinematics and dynamics synthesis for sideslip angle estimation based on consensus kalman filter; estimation on imu yaw misalignment by fusing information of automotive onboard sensors; automated vehicle sideslip angle estimation considering signal measurement characteristic; imu-based automated vehicle body sideslip angle and attitude estimation aided by gnss using parallel adaptive kalman filters; automated driving systems data acquisition and processing platform.

Response: Thank you for your helpful suggestion. Based on your comment, we have added references [32-37] on information fusion and multi-sensor fusion in Line 207, Page 5.

[32] Gao L, Xiong L, Xia X, et al. Improved vehicle localization using on-board sensors and vehicle lateral velocity[J]. IEEE Sensors Journal, 2022, 22(7): 6818-6831.

[33] Xia X, Hashemi E, Xiong L, et al. Autonomous Vehicle Kinematics and Dynamics Synthesis for Sideslip Angle Estimation Based on Consensus Kalman Filter[J]. IEEE Transactions on Control Systems Technology, 2022, 31(1): 179-192.

[34] Xia X, Xiong L, Huang Y, et al. Estimation on IMU yaw misalignment by fusing information of automotive onboard sensors[J]. Mechanical Systems and Signal Processing, 2022, 162: 107993.

[35] Liu W, Xia X, Xiong L, et al. Automated vehicle sideslip angle estimation considering signal measurement characteristic[J]. IEEE Sensors Journal, 2021, 21(19): 21675-21687.

[36] Xiong L, Xia X, Lu Y, et al. IMU-based automated vehicle body sideslip angle and attitude estimation aided by GNSS using parallel adaptive Kalman filters[J]. IEEE Transactions on Vehicular Technology, 2020, 69(10): 10668-10680.

[37] Xia X, Meng Z, Han X, et al. Automated Driving Systems Data Acquisition and Processing Platform[J]. arXiv preprint arXiv:2211.13425, 2022.

 

Comment 2: I understand from the experimental results in the figures that the detection performance is good. But do you have ground truth, i.e., labelled bounding boxes, to compare with your results? Ground truth is not easy to obtain; it is fine if you do not have it, but please provide some explanation.

Response: Thank you for the helpful comment. In the training and testing of YOLOv7, the data annotations provide ground truth. When the IoU threshold is set to 0.50, the trained YOLOv7 model reaches an mAP of 96.3%, indicating high detection accuracy. When the IoU threshold increases from 0.5 to 0.95 in steps of 0.05, the mAP of the YOLOv7 model is 28.9%. The model runs at a detection speed of 40 FPS, which meets the requirements of real-time object detection. This paper focuses on practical applications, so the experimental images were not annotated during inference. From Figure 15, it can be intuitively seen that the YOLOv7 object detection algorithm accurately locates and recognizes small targets, with a maximum detection accuracy of 97%. Moreover, the algorithm is robust to changes in target attitude, enabling accurate detection of targets in different directions and postures. In addition, it accurately detects targets under changing lighting and shrub occlusion, with high accuracy and robustness.

 

Comment 3: The limitations of this work and future work should also be discussed in the conclusion section.

Response: Thank you for your helpful comment. Based on your comment, we provide the limitations of this work and future research directions in the “Conclusion” section, Lines 519-527 on Page 16.

However, the use of the VisionICE system in search and rescue operations also faces some challenges. The challenges in terms of drones include regulatory issues such as obtaining necessary permits and complying with airspace restrictions, as well as technical challenges such as ensuring the reliability and durability of drones and their components. In addition, accurate and reliable sensor data is also needed, as well as the development of user-friendly AR interfaces and software to effectively integrate with drone hardware and control systems. Future applications of the system include battlefield surveillance, firefighting, post-disaster search and rescue, criminal investigations, anti-terrorism and peacekeeping, intelligent life, and many others.

 

Special thanks to you for your helpful comments.

--------------------------------------------------------------------------------------------------

 

Author Response File: Author Response.docx
