Article

C2VIR-SLAM: Centralized Collaborative Visual-Inertial-Range Simultaneous Localization and Mapping

College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Drones 2022, 6(11), 312; https://doi.org/10.3390/drones6110312
Submission received: 18 September 2022 / Revised: 13 October 2022 / Accepted: 19 October 2022 / Published: 23 October 2022

Abstract

Collaborative simultaneous localization and mapping has a great impact on applications such as search-and-rescue and agriculture. For each agent, the key to collaboration is measuring its motion relative to the other participants or to external anchors; currently, this is mainly accomplished by (1) matching against the maps shared by other agents or (2) measuring the range to anchors with UWB devices. Since requiring multiple agents to visit the same area decreases task efficiency and anchors demand a distribution process, this paper proposes to use a monocular camera, an inertial measurement unit (IMU), and a UWB device as the onboard sensors of each agent to build an accurate and efficient centralized collaborative SLAM system. Each participant runs visual–inertial odometry to estimate its motion parameters and build a local map of the explored areas. The agent-to-agent range is measured by the onboard UWB and published to the central server together with the estimated motion parameters and the reconstructed maps. We design a global optimization algorithm that uses the cross-agent map-match information detected by a visual place recognition technique and the agent-to-agent range information to optimize the motion parameters of all the participants and merge the local maps into a global map. Compared with existing collaborative SLAM systems, the proposed system can perform collaboration with onboard UWB measurements only, with vision only, or with a combination of the two, which greatly improves the adaptability and robustness of the collaborative system. We also present an in-depth analysis of C2VIR-SLAM on multiple real-flight UAV datasets.

1. Introduction

Accurate, robust, and efficient visual SLAM is of great importance to various robotic applications. After decades of extensive research, visual SLAM (VSLAM) has reached significant maturity and precision in single-agent applications. The current demand for multi-robot systems in search-and-rescue, agriculture, and other applications has drawn attention to collaborative SLAM. For each agent, the performance of the SLAM system can be improved by using the information shared by other agents. Furthermore, the local maps constructed by each participant can be merged into a global map, which greatly improves efficiency.
However, for robots working as a team, each agent not only needs to estimate its own motion states but also has to measure its motion relative to its collaborators. Most conventional collaborative robotic systems rely on external infrastructure, such as motion capture systems [1] or global navigation satellite systems (GNSS) [2], to collaborate with other agents. With the development of UWB sensors, some researchers have proposed to use UWB anchors as the reference for the multi-agent system. While early UWB-based localization systems had to calibrate the positions of the anchors [3], recent work has shown that robots with onboard SLAM capability can also use unknown static UWB anchors to augment the localization performance and to estimate the spatial relations needed to integrate the local maps into a global map [4,5]. Although unknown static UWB anchors remove the position calibration process, they still need to be distributed; in large-scale applications, anchors in different regions are also required due to the limited effective range of UWB devices. UWB signals are also subject to interference. These factors limit the application of UWB-aided collaborative SLAM systems.
Other than relying on external devices, some researchers have proposed to share the visual maps among the participants and perform collaborative SLAM by matching the observed features to the shared maps [6,7,8]. Compared to UWB-based systems, vision-based collaborative SLAM requires less infrastructure and is free from blockage of the ranging measurements. However, the agents need to have commonly viewed areas and condition-invariant place recognition capability to perform collaboration, which limits the efficiency of task execution. The shared visual maps also bring a significant communication burden to multi-agent systems.
While UWB-based and vision-based collaborative SLAM each have their pros and cons, researchers have proposed several systems that use both UWB and vision to augment each other [4,5]. However, these systems mainly focus on using the onboard vision system to estimate the motion and the positions of the uncalibrated static UWB anchors, and they rely heavily on the anchors to perform collaboration.
In this paper, we propose a novel centralized collaborative visual/inertial/range SLAM system (C2VIR-SLAM) that relies only on onboard visual, inertial, and UWB ranging devices to achieve accurate and efficient centralized collaborative SLAM. For each agent, a visual–inertial odometry (VIO) module is adopted to estimate the motion parameters and reconstruct the local maps. Meanwhile, the VIO-estimated parameters, the local maps, and the agent-to-agent UWB range information are published to a central server. The server then optimizes the motion parameters and merges the local maps into a global map. Compared with existing systems, the main contributions and characteristics of C2VIR-SLAM are as follows:
  • We propose to use onboard UWB devices, rather than calibrated or uncalibrated static UWB anchors, in the proposed C2VIR-SLAM system, which removes the prior anchor distribution process and enlarges the effective UWB range, since each device moves with its agent.
  • We design a system capable of using vision or onboard UWB, solely or in combination, to perform collaborative localization and mapping.
  • We conduct systematic experiments in different datasets with different system setups and comprehensively analyze the performance improvements brought in by using vision only, UWB only, and their combination in collaboration.
Figure 1 shows an illustration of the C2VIR-SLAM with collaborative agents, where both cross-agent vision constraints and the agent-to-agent range constraints are used to augment the localization and mapping performance of the system.

2. Related Work

2.1. Single Robot SLAM

Visual SLAM is of great importance to mobile robots and has gained considerable attention [9]. While classic VSLAM systems rely only on vision sensors, recent work has proposed using an IMU to aid the vision and has shown significant improvements in both accuracy and robustness [10,11]. Loop closure in a SLAM system is crucial to reduce localization drift and correct the reconstructed maps. However, loop closure requires the robot to revisit an explored area, which constrains the task execution efficiency. To solve the data association problem in SLAM, a novel posterior-based approximate joint compatibility test [12] has been proposed that achieves lower computational complexity, lower sensitivity to linearization errors, and higher precision than classical algorithms including SCNN, RANSAC, and JCBB.
To reduce the drift of VSLAM systems and achieve high efficiency, researchers have proposed to use other onboard sensors or external devices to aid VSLAM. For example, magnetic compasses can be integrated with the IMU and cameras to improve both orientation and localization accuracy [13]. GNSS has also been integrated with visual–inertial systems to achieve high-precision performance [14]. However, GNSS signals are subject to multipath effects and are unavailable in indoor environments. Instead of using GNSS signals, which contain ranging information between the receiver and the satellites, researchers have proposed to use UWB to measure ranging information relative to anchors to aid SLAM systems. Based on how the UWB is used, these systems can be broadly divided into two paradigms: known-anchor-aided [15] and unknown-anchor-aided VSLAM [4,5]. Known-anchor-aided systems require an offline process to calibrate the anchor positions, which is unsuitable for fast deployment. Unknown-anchor-aided SLAM systems remove the calibration process and are more efficient; recent work has shown that even one unknown anchor can significantly improve VSLAM performance [4]. Although unknown anchors do not need their positions calibrated offline, they still have to be distributed in the environment, and each anchor has a limited effective range, which is unsuitable for unexplored and unstructured environments.

2.2. Multi-Robot Collaborative SLAM

Although VSLAM systems have made impressive progress in the past years, their accuracy and efficiency still leave room for improvement, and pushing further is quite a challenge for single-robot systems. To overcome the limited capability of a single robot, the idea of organizing multiple robots to collaborate with each other has attracted attention in recent years [16]. Each participant in a collaborative visual SLAM system can use the information shared by its collaborators to improve its localization accuracy; the constructed local maps can also be merged into a global map, which significantly improves efficiency.
Collaborative visual SLAM systems can be broadly categorized into centralized and decentralized architectures [17]. Currently, most collaborative VSLAM systems deploy a centralized architecture, as decentralized ones face the challenges of limited onboard computational resources as well as data consistency and synchronization problems. In contrast, the centralized architecture eases data management and can offload the heavy computation to a powerful server.
The key issue of multi-robot collaborative VSLAM is to estimate the relative pose between the participants; this is mainly accomplished either by using external infrastructure as a reference or by using onboard sensors only. For example, GNSS [2,18] and calibrated UWB anchors [3] can provide an accurate motion reference and have been used as bridges to measure the relative motion between the robots; however, such reference systems are not available in indoor or unexplored environments. Recent works have shown that unknown UWB anchors [4] can also be used to construct a collaborative SLAM system. However, these unknown static anchors still have to be distributed before deployment, and each anchor has a limited effective range.
Instead of relying on external infrastructure or anchors, using onboard sensors to estimate the relative motion improves the environmental adaptability of a collaborative SLAM system. Currently, a widely used approach in collaborative VSLAM is to share the local maps among the participants. Visual place recognition techniques can then be used to match the observed visual features with the maps shared by the collaborators and to calculate the relative pose from the matches [6]; this approach also helps to delete redundant map features and build a consistent global visual map [7,19]. However, vision-based collaborative SLAM requires the collaborators to repeatedly visit common places. Moreover, variations in visual conditions can cause the visual place recognition component to fail and thus break the collaboration.
In this paper, we propose to use cameras, a MIMU, and UWB to construct a novel centralized collaborative SLAM system. Unlike existing systems that rely on static UWB anchors, we propose to use onboard UWB devices to measure agent-to-agent range information to aid collaboration. Meanwhile, we also incorporate vision to match features across different agents to support collaboration. The onboard UWB and the camera complement each other in the collaborative system: when the participants' paths do not intersect, they can use the agent-to-agent ranging obtained from the onboard UWB to collaborate, whereas when the UWB signals are too noisy or blocked, the participants can still use vision to collaborate with each other and optimize the global map.

3. Methods

3.1. Overview of the System

The proposed C2VIR-SLAM includes four main components: (i) single-agent visual-inertial odometry, which estimates the motion of the agent and reconstructs the map of the explored local area, (ii) agent-to-agent range measuring with the onboard UWB device, (iii) place recognition, which detects loops in the maps shared by all agents, and estimates the relative motion between participants, and (iv) collaborative localization and mapping, which optimizes the motion parameters of all the agents and constructs a global map. The system architecture of the C2VIR-SLAM is shown in Figure 2.
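To make the data flow concrete, the following Python sketch (with hypothetical field and class names of our own; the paper does not specify its message definitions) illustrates the kind of per-keyframe payload an agent could publish to the central server:

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class KeyframeMessage:
    """Hypothetical per-keyframe payload an agent publishes to the server."""
    agent_id: int
    keyframe_id: int
    timestamp: float                   # seconds
    position: np.ndarray               # (3,)  VIO-estimated position p_k
    quaternion: np.ndarray             # (4,)  VIO-estimated attitude q_k (w, x, y, z)
    map_points: np.ndarray             # (N, 3) reconstructed 3D features
    descriptors: np.ndarray            # (N, 32) binary descriptors for place recognition
    uwb_range: Optional[float] = None  # agent-to-agent range synced to this keyframe, if any
    uwb_peer_id: Optional[int] = None  # id of the agent the range was measured to

@dataclass
class ServerState:
    """The central server accumulates keyframes from all agents; place recognition,
    pose-graph optimization, and global bundle adjustment are triggered elsewhere."""
    keyframes: List[KeyframeMessage] = field(default_factory=list)

    def on_keyframe(self, msg: KeyframeMessage) -> None:
        self.keyframes.append(msg)
```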

3.2. Single-Agent Visual–Inertial Odometer

We adopt VINS-Mono [11] to estimate the motion of each agent and reconstruct the 3D structure of the observed visual features. For each agent in the system, the estimated states of the onboard VINS are expressed as
$$
\chi = \left[ x_0, x_1, \ldots, x_n, \lambda_0, \lambda_1, \ldots, \lambda_m \right], \qquad
x_k = \left[ p_k, v_k, q_k, b_a, b_g \right], \quad k \in [0, n]
\tag{1}
$$
where $\lambda_m$ is the inverse depth of the $m$-th visual feature and $x_k$ is the IMU state corresponding to the $k$-th image in the sliding window; it includes the position $p_k$, the velocity $v_k$, and the attitude quaternion $q_k$, as well as the accelerometer bias $b_a$ and the gyroscope bias $b_g$. $n$ is the length of the sliding window. The objective function of the odometry is expressed as
$$
\min_{\chi} \left\{ \left\| r_p - H_p \chi \right\|^2
+ \sum_{k \in \mathcal{B}} \left\| r_{\mathcal{B}}\!\left(\hat{z}^{b_k}_{b_{k+1}}, \chi\right) \right\|^2_{P^{b_k}_{b_{k+1}}}
+ \sum_{(l,j) \in \mathcal{C}} \rho\!\left( \left\| r_{\mathcal{C}}\!\left(\hat{z}^{c_j}_{l}, \chi\right) \right\|^2_{P^{c_j}_{l}} \right) \right\}
\tag{2}
$$
where $r_{\mathcal{B}}(\hat{z}^{b_k}_{b_{k+1}}, \chi)$ and $r_{\mathcal{C}}(\hat{z}^{c_j}_{l}, \chi)$ are the residual terms of the IMU measurements and the visual measurements, respectively. $\mathcal{B}$ is the set of all IMU measurements and $\mathcal{C}$ is the set of features that have been observed at least twice in the current sliding window. $\|\cdot\|^2_P$ denotes the squared Mahalanobis norm with respect to the covariance $P$, $\rho(\cdot)$ is the Huber norm, and $\{r_p, H_p\}$ is the prior information from marginalization. For more technical details, please refer to VINS-Mono [11]. After optimization, the keyframe information, the onboard UWB measurements, the reconstructed 3D map points, and the estimated keyframe motion states $p_k$ and $q_k$ are published to the central server for collaboration.
The adopted visual–inertial odometry has shown impressive accuracy, robustness, and efficiency. However, other state-of-the-art keyframe-based VIO systems could also serve as the front-end single-agent odometry in the proposed collaborative system.
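For readers less familiar with the notation in (2), the following minimal Python sketch (illustrative only, not the VINS-Mono implementation) shows how a squared Mahalanobis norm and a Huber robust cost are typically evaluated for a single residual vector:

```python
import numpy as np

def mahalanobis_sq(r: np.ndarray, P: np.ndarray) -> float:
    """Squared Mahalanobis norm ||r||^2_P = r^T P^{-1} r for covariance P."""
    return float(r @ np.linalg.solve(P, r))

def huber(s: float, delta: float = 1.0) -> float:
    """Huber robust cost rho(s) applied to a squared error s (Ceres-style)."""
    if s <= delta ** 2:
        return s
    return 2.0 * delta * np.sqrt(s) - delta ** 2

# Example: a 2D visual reprojection residual with a pixel covariance P
r = np.array([1.5, -0.7])
P = np.diag([1.0, 1.0])
print(huber(mahalanobis_sq(r, P)))
```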

3.3. Collaborative Localization with Pose Graph

Collaborating with other agents has been demonstrated as an effective way to improve localization accuracy. While existing work mostly uses either static UWB anchors or vision to build relations with other agents, the proposed C2VIR-SLAM uses both onboard UWB devices and vision to perform collaboration, which improves efficiency and adaptability.
The collaborative localization is performed in the central server and is modeled as a pose graph optimization problem, where the states are expressed as
$$
\chi = \left[ X^1, X^2, \ldots, X^n \right], \qquad
X^i = \left[ x^i_1, x^i_2, \ldots, x^i_t \right], \quad t \in \Omega \subseteq [0, s], \qquad
x^i_t = \left[ p^i_t, q^i_t \right]
\tag{3}
$$
where $X^i$ represents the motion states of all the keyframes shared by the $i$-th agent, each consisting of the corresponding position vector $p^i_t$ and quaternion $q^i_t$. $t$ is the keyframe index, a positive integer in $[0, s]$.
Then, the collaborative pose graph optimization can be expressed as
$$
\min_{\chi} \left\{
\underbrace{\sum_{i=1}^{n} \sum_{k \in \mathcal{S}^i} \left\| r^i_{k,k+1} \right\|^2}_{\text{Sequence residual}}
+ \underbrace{\sum_{(i,j,k,l) \in \mathcal{M}} \rho\!\left( \left\| r^{i,j}_{k,l} \right\|^2 \right)}_{\text{Map-matching residual}}
+ \underbrace{\sum_{t \in \Omega} \rho\!\left( \left\| r_U(\hat{d}_t, \chi) \right\|^2_{P_{UWB}} \right)}_{\text{UWB ranging residual}}
\right\}
\tag{4}
$$
The above problem contains three residual terms to be optimized. The sequence residual refers to the relative motion constraints of the VIO within each agent (the inner sum in (4) runs over the set $\mathcal{S}^i$ of consecutive keyframes of the $i$-th agent), which is described in Section 3.4. The map-matching residual describes the re-localization constraints from the visual place recognition component, which is discussed in Section 3.5. The UWB ranging residual, which describes the agent-to-agent ranging constraints, is given in Section 3.6.
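To make the structure of (4) concrete, the sketch below (an illustration under our own assumptions, not the system's actual back-end; `seq_terms`, `match_terms`, and `uwb_terms` are placeholder lists of residual callables corresponding to Sections 3.4, 3.5 and 3.6) stacks the three residual families into one vector for a generic nonlinear least-squares solver:

```python
import numpy as np
from scipy.optimize import least_squares

def stacked_residuals(x, seq_terms, match_terms, uwb_terms):
    """x packs all keyframe poses to be optimized; each *_terms entry is a
    (residual_fn, data) pair. Note: (4) robustifies only the matching and UWB
    terms with a Huber loss; for brevity, this sketch would let the solver
    apply one robust loss to all stacked residuals."""
    res = []
    for fn, data in seq_terms:    # VIO sequence constraints (Section 3.4)
        res.append(np.atleast_1d(fn(x, data)))
    for fn, data in match_terms:  # cross-agent map-matching constraints (Section 3.5)
        res.append(np.atleast_1d(fn(x, data)))
    for fn, data in uwb_terms:    # agent-to-agent UWB ranges (Section 3.6)
        res.append(np.atleast_1d(fn(x, data)))
    return np.concatenate(res)

# Usage (placeholders): x0 is the stacked initial pose vector from the VIOs.
# sol = least_squares(stacked_residuals, x0, loss="huber",
#                     args=(seq_terms, match_terms, uwb_terms))
```

In practice, a dedicated graph-optimization library such as Ceres or g2o is typically used instead of a generic dense solver.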

3.4. Sequence Constraints from the Odometry

The sequence residuals in (4) represent the relative transformation between two consecutive keyframes of the same agent. We followed the definition in [11] and expressed the residual as
$$
r^i_{k,k+1} =
\begin{bmatrix}
\left( R^i_k \right)^{-1} \left( p^i_k - p^i_{k+1} \right) - \hat{p}^i_{k,k+1} \\
\left( q^i_{k,k+1} \otimes \left( \hat{q}^i_{k,k+1} \right)^{-1} \right)_{xyz}
\end{bmatrix}
\tag{5}
$$
where $\hat{p}^i_{k,k+1}$ and $\hat{q}^i_{k,k+1}$ can be calculated from the VIO estimates as
$$
\hat{p}^i_{k,k+1} = \left( \hat{R}^i_k \right)^{-1} \left( \hat{p}^i_{k+1} - \hat{p}^i_k \right), \qquad
\hat{q}^i_{k,k+1} = \hat{q}^i_k \otimes \left( \hat{q}^i_{k+1} \right)^{-1}
\tag{6}
$$
Following the above formulation, $q^i_{k,k+1}$ can be obtained from $q^i_k$ and $q^i_{k+1}$, which are among the parameters to be optimized in (3).
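A minimal numerical sketch of (5) and (6) as written above, assuming unit quaternions stored in (w, x, y, z) order and using SciPy only to build the rotation matrix (this mirrors the equations, not the authors' code):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def quat_mul(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_inv(q):
    w, x, y, z = q
    return np.array([w, -x, -y, -z])  # unit quaternion inverse = conjugate

def sequence_residual(p_k, q_k, p_k1, q_k1, p_hat, q_hat):
    """Residual between the optimized relative pose of keyframes k, k+1 and
    the VIO-measured relative pose (p_hat, q_hat), cf. (5) and (6)."""
    R_k = R.from_quat([q_k[1], q_k[2], q_k[3], q_k[0]]).as_matrix()  # scipy uses (x, y, z, w)
    r_p = R_k.T @ (p_k - p_k1) - p_hat
    q_rel = quat_mul(q_k, quat_inv(q_k1))       # relative rotation built as in (6)
    r_q = quat_mul(q_rel, quat_inv(q_hat))[1:]  # vector (xyz) part of the error quaternion
    return np.concatenate([r_p, r_q])
```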

3.5. Map Matching Constraints from Visual Place Recognition

Assuming all agents can communicate with the server at any time within a limited area, the server receives keyframe messages from all agents and uses a place recognition component to detect visual overlap between commonly visited places. Note that the proposed system does not distinguish between matching to an agent's own map and matching to the maps shared by other agents; therefore, each received keyframe is matched against the maps shared by all agents.
In detail, a new query keyframe is compared with all existing keyframes using the DBoW2 appearance similarity [20]. Once the similarity exceeds a certain threshold, an attempt is made to detect a loop between the two similar frames. The 3D feature points of the query keyframe are matched to the 2D features of the candidate keyframe, where the features are described by BRIEF descriptors [21]. Then, the RANSAC algorithm is used to evaluate the 3D–2D correspondences by solving a Perspective-n-Point (PnP) problem [22]. Once the number of inliers exceeds a certain threshold, the pair is considered a valid map-match constraint. The corresponding relative pose $[\hat{q}_{k_i l_j} \;\; \hat{p}_{k_i l_j}]$ between the two matched keyframes is obtained by solving the PnP problem, where $\hat{q}_{k_i l_j}$ is the relative quaternion between the $k$-th keyframe of the $i$-th agent and the $l$-th keyframe of the $j$-th agent, and $\hat{p}_{k_i l_j}$ is the corresponding relative translation. Similar to (5), the map-matching residual can be expressed as
$$
r^{i,j}_{k,l} =
\begin{bmatrix}
\left( R^j_l \right)^{-1} \left( p^i_k - p^j_l \right) - \hat{p}_{k_i l_j} \\
\left( q_{k_i l_j} \otimes \left( \hat{q}_{k_i l_j} \right)^{-1} \right)_{xyz}
\end{bmatrix}
\tag{7}
$$
where $\otimes$ represents quaternion multiplication and $(\cdot)_{xyz}$ extracts the vector part of a quaternion.
By comparing (5) and (7), we can note that while the sequence constraints use the VIO states within a single agent, the map-matching residuals can use motion information from different agents. Existing research has revealed that introducing the map-matching residual into a collaborative SLAM system can greatly improve accuracy [6,7]; however, this requires the collaborating participants to have commonly viewed places and to recognize the overlapping views, which decreases efficiency and challenges the visual place recognition component.
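As an illustration of the 3D–2D geometric verification described above, the following sketch uses OpenCV's `solvePnPRansac`; the thresholds and the helper name are ours, and the paper's exact parameters are not specified:

```python
import numpy as np
import cv2

def verify_map_match(pts3d_query, pts2d_candidate, K, min_inliers=25):
    """Verify a DBoW2 loop candidate by solving PnP with RANSAC.

    pts3d_query:     (N, 3) 3D features from the query keyframe's local map
    pts2d_candidate: (N, 2) matched 2D features in the candidate keyframe
    K:               (3, 3) camera intrinsic matrix
    Returns (ok, rvec, tvec): relative pose of the candidate camera if enough inliers.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_query.astype(np.float64),
        pts2d_candidate.astype(np.float64),
        K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=100)
    if not ok or inliers is None or len(inliers) < min_inliers:
        return False, None, None
    return True, rvec, tvec  # rvec/tvec encode the relative pose used in (7)
```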

3.6. Relative Ranging from Onboard UWB

While the map-matching residual requires commonly visited places, UWB can directly measure the range information between two devices. Some research has used position-calibrated UWB anchors or uncalibrated static anchors [4,5] in collaborative SLAM systems. However, here we propose to rigidly mount the UWB device on the collaborative agents, which removes the anchor distribution process and is more suitable for applications in unexplored environments.
We use the two-way ranging (TWR) model of UWB, as it can directly measure the distance between the two transceivers without the aid of a synchronization device. The range measurement is modeled as [4]
$$
\hat{d} = d + e
\tag{8}
$$
where $\hat{d}$ is the UWB measurement, $d$ is the true range, and $e$ is the measurement error, which follows a Gaussian distribution.
In our system, the UWB devices are mounted onboard the agents. Therefore, the obtained range is the distance between the two corresponding agents; this range information can be used as an observation to correct the motion parameters estimated by the odometry, by defining a residual term as
$$
r_U(\hat{d}_t, \chi) = \gamma_r \cdot \left( \left\| p^i_t - p^j_t \right\| - \hat{d}_t \right)
\tag{9}
$$
where $p^i_t$ and $p^j_t$ are the positions to be optimized in the pose graph; $\hat{d}_t$ is the UWB range measurement between the $i$-th and the $j$-th agents, synchronized with the keyframes according to the timestamps; and $\gamma_r$ is the weight of this residual.
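A minimal sketch of the UWB ranging residual in (9), assuming time-synchronized keyframe positions and a scalar weight (function and variable names are ours):

```python
import numpy as np

def uwb_residual(p_i, p_j, d_hat, gamma_r=1.0):
    """Agent-to-agent ranging residual, cf. (9): weighted difference between the
    inter-agent distance implied by the pose graph and the UWB measurement."""
    return gamma_r * (np.linalg.norm(p_i - p_j) - d_hat)

# Example: two agents roughly 10 m apart, UWB reads 10.04 m
print(uwb_residual(np.array([0.0, 0.0, 1.5]), np.array([10.0, 0.0, 1.5]), 10.04))
```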
Note that pose graph optimization can be time-consuming and is only effective when map-matching or UWB-ranging residual terms are added to (4). To improve efficiency, we check for valid map matches and UWB measurements at a constant time interval and only perform the optimization when at least one map-matching or UWB-ranging residual exists within that interval.

3.7. Map Refinement with Global Bundle Adjustment

The pose graph only optimizes the attitude and position of the collaborative agents, but one main advantage of using collaborative SLAM is to construct a global map more efficiently than using a single agent. To achieve this, the proposed C2VIR-SLAM system adopts a global bundle adjustment (GBA) process to construct a global map and optimizes the motion parameters again.
In detail, we compute the re-projection of all features in all the keyframes shared by the collaborative agents, and optimize the states χ and the 3D position of the features to minimize the overall re-projection error. The objective function of GBA can be expressed as:
$$
\min_{x^i_k \in \chi,\; P_m \in \mathcal{P}} \;
\sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{k=1}^{K}
\left\| \hat{z}^i_{k,m} - \pi\!\left( x^i_k, P_m \right) \right\|^2
\tag{10}
$$
where $\hat{z}^i_{k,m}$ is the observed coordinate of the $m$-th map point in the $k$-th keyframe of the $i$-th agent, $\pi(\cdot)$ is the projection function that maps a 3D point to its (homogeneous) image coordinates, $x^i_k$ consists of the keyframe position and quaternion as defined in (3), and $P_m$ is the 3D position of the $m$-th map point. The re-projection factor iterates over all keyframes and all map points. To improve efficiency, the GBA is only performed after a valid pose graph optimization.
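For illustration, the per-observation re-projection term inside (10) can be written as follows, assuming a pinhole camera with intrinsic matrix K and a world-to-camera pose (R, t); this is a generic bundle-adjustment residual rather than the exact implementation:

```python
import numpy as np

def reprojection_residual(z_obs, R_wc, t_wc, P_w, K):
    """Residual between an observed pixel z_obs and the projection of 3D point P_w.

    R_wc, t_wc: rotation and translation taking world coordinates into the camera frame
    K:          (3, 3) pinhole intrinsic matrix
    """
    P_c = R_wc @ P_w + t_wc              # point in the camera frame
    z_proj_h = K @ P_c                   # homogeneous pixel coordinates
    z_proj = z_proj_h[:2] / z_proj_h[2]  # perspective division
    return z_obs - z_proj

# The global bundle adjustment sums the squared norm of this residual over all
# keyframes of all agents and all visible map points, optimizing poses and points.
```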

4. Experiments

We validate our algorithms on two real-flight UAV datasets. The localization performance was evaluated by the root mean square error (RMSE) and the scale error, which were obtained by aligning the estimated trajectories with the ground truth using the Umeyama algorithm [23], as the widely used ATE evaluation tool [24] does.
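For reference, this evaluation can be reproduced with a similarity alignment in the spirit of Umeyama [23]; the sketch below (our own illustrative code, not the evo tool) aligns an estimated trajectory to the ground truth and reports the RMSE and the recovered scale:

```python
import numpy as np

def umeyama_align(est, gt):
    """Least-squares similarity transform (s, R, t) mapping est -> gt (Umeyama, 1991).
    est, gt: (N, 3) arrays of corresponding positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    cov = G.T @ E / est.shape[0]              # cross-covariance between gt and est
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                           # avoid a reflection
    Rot = U @ S @ Vt
    var_e = (E ** 2).sum() / est.shape[0]
    s = np.trace(np.diag(D) @ S) / var_e
    t = mu_g - s * Rot @ mu_e
    return s, Rot, t

def ate_rmse(est, gt):
    """RMSE of the aligned trajectory and the recovered scale factor."""
    s, Rot, t = umeyama_align(est, gt)
    aligned = (s * (Rot @ est.T)).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt((err ** 2).mean())), s
```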

4.1. Datasets

We used the publicly available EuRoC dataset [25] and a self-constructed multi-UAV dataset, named the Testing Zone dataset, to evaluate the proposed C2VIR-SLAM system.
The EuRoC dataset contains visual–inertial sequences captured repeatedly with a single UAV carrying a forward-looking camera in an indoor environment. As the EuRoC dataset has no UWB measurements, we simulated UWB data at the same sample frequency as the camera by using the ground-truth data and adding zero-mean Gaussian noise with a standard deviation of 0.03 m to emulate sensor measurement noise. As the sequences of the dataset were captured independently, we ran different sequences simultaneously on different ROS nodes to simulate the collaboration.
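A sketch of how such ranging data can be generated from the ground-truth trajectories (using the 0.03 m standard deviation stated above; the function name and interface are ours):

```python
import numpy as np

def simulate_uwb_ranges(gt_a, gt_b, sigma=0.03, seed=0):
    """Simulate agent-to-agent UWB ranges from two time-aligned ground-truth
    position arrays of shape (N, 3), adding zero-mean Gaussian noise (std sigma)."""
    rng = np.random.default_rng(seed)
    true_d = np.linalg.norm(gt_a - gt_b, axis=1)
    return true_d + rng.normal(0.0, sigma, size=true_d.shape)
```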
The Testing Zone dataset contains two experiments, where each experiment includes a two-agent flight. The data were captured at the National Intelligent Connected Vehicle (Changsha, China) Testing Zone by using two DJI Matrice 600 Pro drones flying simultaneously at an outdoor area of 300 m × 200 m. The drones were equipped with a downward-looking camera, a MIMU, a UWB device as the onboard sensors, and a GNSS receiver as the reference. The detailed sensor setup is shown in Table 1, and an illustration of the dataset is shown in Figure 3. Some key characteristics of the two datasets are listed in Table 2.

4.2. Experimental Results

All the algorithms ran on an Ubuntu computer with an Intel i9-9900K CPU @ 3.60 GHz and 32 GB RAM. The communication was simulated by using the ROS node communication mechanism. All the results were the average of three runs of the same parameter settings.

4.2.1. EuRoC Datasets

We tested the proposed system with two-agent and three-agent setups using different sequence combinations and compared the results with CVI-SLAM [6], a state-of-the-art centralized collaborative SLAM system using visual and inertial sensors.
The localization RMSE and the percentage of the trajectory positioning error relative to the trajectory length are shown in Table 3. CVI-SLAM is not open source and has only reported three groups of results in the public literature; we quote those results in Table 3 for comparison. To make a fair comparison, we also tested the proposed C2VIR-SLAM with vision-based collaboration only; that is, only the sequence residual and the map-matching residual in (4) were included in the optimization. Although both CVI-SLAM and C2VIR-SLAM use vision and an IMU as the onboard sensors, the proposed C2VIR-SLAM shows better performance in all three groups of two-agent collaboration experiments. The full C2VIR-SLAM, which uses both vision and UWB in collaboration, shows the best performance, which demonstrates the advantage of the proposed system. Figure 4 shows an illustration of the full C2VIR-SLAM; the UWB range constraints and the visual map-matching constraints are shown in blue and red, respectively. These constraints exploit the relations between agents to optimize the localization results.
The last two rows of Table 3 show the results of the three-agent collaboration experiments. By adding the MH03 sequence to the MH01+MH02 and MH04+MH05 pairs, the two-agent experiments become three-agent experiments, and the three-agent collaboration results significantly outperform the two-agent ones. This is mainly because the added sequence brings in more relative motion observations for the collaborative optimization.

4.2.2. Testing Zone Dataset

The experimental results on the Testing Zone dataset are shown in Table 4, which includes the performance of VINS-Mono [11], C2VIR-SLAM with UWB-based collaboration only, C2VIR-SLAM with vision-based collaboration only, and the full C2VIR-SLAM.
As shown in the table, both C2VIR with UWB only and C2VIR with co-visual only outperform the single-agent VINS-Mono, which demonstrates the advantage of collaboration.
The last column of Table 4 shows that the full C2VIR-SLAM achieves the best performance, with the optimized trajectories and reconstructed maps shown in Figure 5. The proposed collaborative system consistently outperformed the VINS-Mono system for all agents, improving the localization accuracy by more than 40%. This again demonstrates the advantage of using both UWB ranging and vision in collaboration.
Figure 6 shows the distributions of the UWB constraints and the visual map-matching constraints. The contribution of the UWB ranging to the collaborative system is shown by the blue dotted lines. Note that the onboard UWB measurement is not always available; we attribute this mainly to disturbance of the UWB signal by electromagnetic interference from the UAV motors. The contribution of the vision constraints to the collaborative system includes both intra-agent loop closure constraints and cross-agent map-matching constraints. Note that the positioning accuracy of C2VIR-SLAM with UWB only is better than that with co-visual only on the Testing Zone dataset; this is mainly due to the higher cooperation frequency of the UWB (blue lines), as shown in Figure 6.

5. Conclusions

In this paper, we present C2VIR-SLAM, which uses both visual and agent-to-agent range information for centralized collaborative localization and mapping. Each participant in the system is equipped with a camera, a MIMU, and a UWB sensor. The camera and the IMU are first integrated to form a VIO to ensure single-agent autonomy. Meanwhile, the VIO-estimated motion states, the reconstructed 3D map points, and the UWB measurements are published to the central server. The server detects visual loop closures across all the submaps shared by the participants. Then, it combines the relative motion computed from loop closures with the agent-to-agent range measurements from UWB to optimize the motion states of all the participants and performs bundle adjustment to construct a global map.
We have validated the proposed C2VIR-SLAM with different settings on different real-flight datasets. The results have shown that C2VIR-SLAM can use vision only, UWB only, and their combination to perform collaborative localization and mapping. Moreover, the combination of UWB and vision shows better collaborative performance than using a single component.
Compared with existing collaborative SLAM systems, C2VIR uses onboard UWB devices to measure agent-to-agent range information for collaboration, rather than relying on UWB anchors. Moreover, the vision and UWB range measurements complement each other: when one component fails, the other can still be used to maintain collaboration.
However, like other vision-based centralized collaborative SLAM systems, C2VIR-SLAM requires large communication resources for visual map sharing. Although bandwidth consumption can be reduced through information compression, such as transmitting feature points rather than whole images, once the system is extended to large-scale swarm collaboration, the data transmitted to the central server will still put heavy pressure on the communication bandwidth. In addition, the server needs substantial computing resources and time to process the data from all agents, so the centralized framework may not be suitable for a large-scale swarm. Future work can focus on applying decentralized collaborative architectures to large-scale collaborative SLAM, for example with a distributed consensus filter on directed switching graphs [26] or other distributed methods, to achieve an efficient and fast collaborative localization and mapping system.

Author Contributions

Conceptualization, J.X. and X.H. (Xiaofeng He); methodology, J.X., J.M. and X.H. (Xiaofeng He); software, J.X., X.H. (Xiaofeng He) and J.M.; validation, J.X., X.H. (Xiaofeng He) and J.M.; formal analysis, X.H. (Xiaofeng He); investigation, J.M.; resources, J.X., X.H. (Xiaofeng He) and J.M.; data curation, X.H. (Xiaofeng He) and J.M.; writing—original draft preparation, J.X. and J.M.; writing—review and editing, J.X. and J.M.; visualization, J.X.; supervision, X.H. (Xiaofeng He), L.Z. and X.H. (Xiaoping Hu); project administration, X.H. (Xiaofeng He) and L.Z.; funding acquisition, X.H. (Xiaofeng He), L.Z. and X.H. (Xiaoping Hu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 62103430, 62103427, and 62073331.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We used the public EuRoC dataset and cited the relevant literature [25]. The Testing Zone dataset is not openly available because we are conducting further research on it.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Preiss, J.A.; Honig, W.; Sukhatme, G.S.; Ayanian, N. Crazyswarm: A large nano-quadcopter swarm. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May 2017–3 June 2017. [Google Scholar]
  2. de Haag, M.U.; Huschbeck, S.; Huff, J. sUAS swarm navigation using inertial, range radios and partial GNSS. In Proceedings of the 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), San Diego, CA, USA, 8–12 September 2019. [Google Scholar]
  3. Alarifi, A.; Al-Salman, A.; Alsaleh, M.; Alnafessah, A.; Al-Hadhrami, S.; Al-Ammar, M.A.; Al-Khalifa, H.S. Ultra wideband indoor positioning technologies: Analysis and recent advances. Sensors 2016, 16, 707. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Cao, Y.; Beltrame, G. VIR-SLAM: Visual, inertial, and ranging SLAM for single and multi-robot systems. Auton. Robot. 2021, 45, 905–917. [Google Scholar] [CrossRef]
  5. Nguyen, T.H.; Xie, L. Tightly-Coupled Single-Anchor Ultra-wideband-Aided Monocular Visual Odometry System. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May 2020–31 August 2020. [Google Scholar]
  6. Karrer, M.; Schmuck, P.; Chli, M. CVI-SLAM—Collaborative Visual-Inertial SLAM. IEEE Robot. Autom. Lett. 2018, 3, 2762–2769. [Google Scholar] [CrossRef] [Green Version]
  7. Schmuck, P.; Chli, M. CCM-SLAM: Robust and efficient centralized collaborative monocular simultaneous localization and mapping for robotic teams. J. Field Robot. 2019, 36, 763–781. [Google Scholar] [CrossRef] [Green Version]
  8. Lajoie, P.-Y.; Ramtoula, B.; Chang, Y.; Carlone, L.; Beltrame, G. DOOR-SLAM: Distributed, Online, and Outlier Resilient SLAM for Robotic Teams. IEEE Robot. Autom. Lett. 2020, 5, 1656–1663. [Google Scholar] [CrossRef] [Green Version]
  9. Barros, A.M.; Michel, M.; Moline, Y.; Corre, G.; Carrel, F. A Comprehensive Survey of Visual SLAM Algorithms. Robotics 2022, 11, 24. [Google Scholar] [CrossRef]
  10. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef] [Green Version]
  11. Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef] [Green Version]
  12. Li, Y.; Li, S.; Song, Q.; Liu, H.; Meng, M.Q.-H. Fast and Robust Data Association Using Posterior Based Approximate Joint Compatibility Test. IEEE Trans. Ind. Inform. 2014, 10, 331–339. [Google Scholar] [CrossRef]
  13. Xie, J.; He, X.; Mao, J.; Zhang, L.; Han, G.; Zhou, W.; Hu, X. A Bio-Inspired Multi-Sensor System for Robust Orientation and Position Estimation. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September 2021–1 October 2021. [Google Scholar]
  14. Cao, S.; Lu, X.; Shen, S. GVINS: Tightly Coupled GNSS-Visual-Inertial Fusion for Smooth and Consistent State Estimation. IEEE Trans. Robot. 2021, 38, 2004–2021. [Google Scholar] [CrossRef]
  15. Tiemann, J.; Ramsey, A.; Wietfeld, C. Enhanced UAV Indoor Navigation through SLAM-Augmented UWB Localization. In Proceedings of the 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018. [Google Scholar]
  16. Zou, D.; Tan, P.; Yu, W. Collaborative visual SLAM for multiple agents: A brief survey. Virtual Real. Intell. Hardw. 2019, 1, 461–482. [Google Scholar] [CrossRef]
  17. Bonin-Font, F.; Ortiz, A.; Oliver, G. Visual navigation for mobile robots: A survey. J. Intell. Robot. Syst. 2008, 53, 263–296. [Google Scholar] [CrossRef]
  18. SungTae, M.; Kim, D.; Dongoo, L. Outdoor Swarm Flight System Based on the RTK-GPS. J. KIISE 2020, 47, 328–334. [Google Scholar]
  19. Schmuck, P.; Ziegler, T.; Karrer, M.; Perraudin, J.; Chli, M. COVINS: Visual-Inertial SLAM for Centralized Collaboration. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bari, Italy, 4–8 October 2021. [Google Scholar]
  20. Galvez-López, D.; Tardos, J.D. Bags of Binary Words for Fast Place Recognition in Image Sequences. IEEE Trans. Robot. 2012, 28, 1188–1197. [Google Scholar] [CrossRef]
  21. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary Robust Independent Elementary Features. In Computer Vision—ECCV 2010; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  22. Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An Accurate O(n) Solution to the PnP Problem. Int. J. Comput. Vis. 2009, 81, 155–166. [Google Scholar] [CrossRef] [Green Version]
  23. Umeyama, S. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 376–380. [Google Scholar] [CrossRef] [Green Version]
  24. Grupp, M. Evo: Python Package for the Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 21 September 2022).
  25. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
  26. Li, S.; Guo, Y. Distributed consensus filter on directed switching graphs. Int. J. Robust Nonlinear Control 2015, 25, 2019–2040. [Google Scholar] [CrossRef]
Figure 1. Illustration of C2VIR-SLAM: two agents use both the agent-to-agent range information measured by the onboard UWB, and the relative motion information measured by visual map matching to perform collaborative SLAM. The mutual information from the co-visibility and the UWB range are indicated by the red and blue lines, respectively. Green dots and black dots represent the map points of Agent 1 and Agent 2, respectively.
Figure 2. Overview of the C2VIR-SLAM architecture. Each agent runs a visual–inertial odometry (VIO) module for pose and map point estimation. The communication module sends the compressed keyframe (KF) information to the server. The server performs loop detection, map management, and global bundle adjustment. If the same place is recognized between two agents, or an agent-to-agent UWB range measurement is available, the server can optimize the motion parameters of all the participants and merge the local maps into a global map.
Figure 3. Illustration of the Testing Zone Dataset.
Figure 4. Illustration of the vision constraints (red lines) and the UWB constraints (blue lines) contributing to the EuRoC MH03 and MH05 two-agent collaboration. Green lines and yellow lines indicate the trajectory of MH03 and MH05, respectively.
Figure 5. Collaborative localization and mapping results of the Testing Zone dataset; (a) Flight 1 in the Testing Zone dataset; (b) Flight 2 in the Testing Zone dataset.
Figure 6. (a) Illustration of the effective UWB edge and the co-vision edge in flight 1 in the Testing Zone dataset; (b) illustration of the effective UWB edge and the co-vision edge in flight 2 in the Testing Zone dataset.
Table 1. Sensor parameters of the Testing Zone dataset.

| Sensors | Model | Freq. | Parameters |
|---|---|---|---|
| Camera | BFS-U3-13Y3C-C | 10 Hz | Resolution: 640 × 512 pixels; focal length: 8 mm |
| MIMU | GNW-MY540HE | 200 Hz | Gyro bias stability: 8°/h; acc bias stability: 0.25 mg |
| UWB | LinkTrack LTP | 25 Hz | Ranging accuracy: <0.1 m; maximum range: 500 m |
| GNSS | UBLOX-M8T | 10 Hz | Positioning accuracy: <1.5 m |
Table 2. Key parameters of the two tested datasets.

| Datasets | Sequence | Time [s] 1 | Path [m] 2 | Camera View | Environment |
|---|---|---|---|---|---|
| EuRoC | MH01 | 182 | 81 | Forward | Indoor, industrial |
| | MH02 | 150 | 73 | | |
| | MH03 | 131 | 131 | | |
| | MH04 | 99 | 92 | | |
| | MH05 | 111 | 97 | | |
| Testing Zone Dataset | Flight 1-Agent 1 | 282 | 271 | Downward | Outdoor, road network |
| | Flight 1-Agent 2 | 300 | 321 | | |
| | Flight 2-Agent 1 | 230 | 427 | | |
| | Flight 2-Agent 2 | 194 | 465 | | |

1 Flight time in seconds. 2 Path length in meters.
Table 3. Comparison of the localization error of the two- and three-agent collaborative SLAM systems. The third column shows the results of the proposed C2VIR-SLAM with vision-based collaboration only, while the last column shows the results of the full C2VIR-SLAM. The percentages in parentheses give the trajectory positioning error relative to the trajectory length.

| Datasets | CVI-SLAM [6] RMSE [m] | C2VIR with Co-Visual Only RMSE [m] | Full C2VIR RMSE [m] |
|---|---|---|---|
| MH01 and MH02 | 0.139 | 0.0628 (0.0415%) | 0.0570 (0.0376%) |
| MH02 and MH03 | 0.256 | 0.0882 (0.0437%) | 0.0683 (0.0338%) |
| MH04 and MH05 | 0.34 | 0.1637 (0.0871%) | 0.1552 (0.0826%) |
| MH01, MH02 and MH03 | – | 0.0802 (0.0285%) | 0.0467 (0.0166%) |
| MH03, MH04, and MH05 | – | 0.1317 (0.0415%) | 0.1041 (0.0328%) |
Table 4. The experimental comparison results on the Testing Zone dataset.

| Testing Zone Dataset | Agent | VINS-Mono RMSE [m] | C2VIR with UWB Only RMSE [m] | C2VIR with Co-Visual Only RMSE [m] | Full C2VIR RMSE [m] |
|---|---|---|---|---|---|
| Flight 1 | Agent 1 | 2.2645 (0.8356%) | 1.4944 (0.5514%) | 1.5070 (0.5561%) | 1.1797 (0.4353%) |
| Flight 1 | Agent 2 | 2.5737 (0.8018%) | 1.6281 (0.5072%) | 1.6548 (0.5155%) | 1.2600 (0.3925%) |
| Flight 2 | Agent 1 | 3.1721 (0.7429%) | 2.7416 (0.5514%) | 2.8228 (0.6626%) | 2.1590 (0.5068%) |
| Flight 2 | Agent 2 | 3.3603 (0.9206%) | 2.0385 (0.5072%) | 2.6555 (0.5802%) | 1.9115 (0.4176%) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
