Article

Accurate and Robust Rotation-Invariant Estimation for High-Precision Outdoor AR Geo-Registration

1 SuperMap Software Co., Ltd., Beijing 100015, China
2 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(15), 3709; https://doi.org/10.3390/rs15153709
Submission received: 25 June 2023 / Revised: 18 July 2023 / Accepted: 23 July 2023 / Published: 25 July 2023
(This article belongs to the Special Issue Remote Sensing in Urban Positioning and Navigation)

Abstract

Geographic registration (geo-registration) is a crucial foundation for augmented reality (AR) map applications. However, existing methods encounter difficulties in aligning spatial data with the ground surface in complex outdoor scenarios. These challenges make it difficult to accurately estimate the geographic north orientation. Consequently, the accuracy and robustness of these methods are limited. To overcome these challenges, this paper proposes a rotation-invariant estimation method for high-precision geo-registration in AR maps. The method introduces several innovations. Firstly, it improves the accuracy of generating heading data from low-cost hardware by utilizing Real-Time Kinematic GPS and visual-inertial fusion. This improvement contributes to the increased stability and precise alignment of virtual objects in complex environments. Secondly, a fusion method combines the true-north direction vector and the gravity vector to eliminate alignment errors between geospatial data and the ground surface. Lastly, the proposed method dynamically combines the initial attitude relative to the geographic north direction with the motion-estimated attitude using visual-inertial fusion. This approach significantly reduces the requirements on sensor hardware quality and calibration accuracy, making it applicable to various AR precision systems such as smartphones and augmented reality glasses. The experimental results show that this method achieves AR geo-registration accuracy at the 0.1-degree level, which is about twice as high as traditional AR geo-registration methods. Additionally, it exhibits better robustness for AR applications in complex scenarios.

1. Introduction

The rapid development of augmented reality (AR) technology is transforming digital interactions by seamlessly integrating virtual information with the real world. In outdoor settings, AR technology has been widely applied in the field of geospatial information [1]. It connects map information and virtual graphical objects [2] with the real environment, providing more accurate, efficient, and intuitive interactive experiences. The application of AR technology in geospatial information holds significant value. Firstly, AR geospatial applications make unmanned aerial vehicle (UAV) photography [3] efficient and reliable [4], enabling accurate positioning and navigation. Secondly, AR geospatial applications can offer intuitive navigation experiences. Whether it is locating destinations in urban areas [5], navigating for tourism purposes [6] and emergencies [7], or investigating complex outdoor environments [8], AR maps enhance navigation with higher accuracy, real-time performance, and convenience. Furthermore, AR technology demonstrates immense potential in the field of underground engineering construction. In underground pipeline laying [9] and mining surveys [10], AR systems provide a visual perception of the underground space, reducing errors and risks and enhancing work efficiency.
However, achieving accurate AR map applications poses a key challenge, known as AR geographic registration (geo-registration). It involves precisely aligning virtual geographic information with real-world locations, forming the foundation for accurately displaying virtual elements of AR maps in the user’s real position.
As shown in Figure 1, the AR geo-registration system achieves accurate alignment and matching through the use of pose estimation and fusion. The system integrates input camera frame data, positioning data, orientation data, and motion-tracking data with GIS model data through multiple coordinate system transformations and updates. This integration enables the alignment of geospatial positions, surface registration on the terrain, and registration of AR map tracking. In AR geo-registration tasks, the camera or object viewpoint is determined based on pose estimation. This process enables the precise placement of virtual objects in the user’s real-world environment. Currently, there are three main methods for AR geo-registration: sensor-based, vision-based, and hybrid methods. These methods are classified based on the types of sensors used and the techniques used to fuse the data. However, existing methods have limitations when dealing with the complexity and large-scale movements of outdoor environments, which restricts their effectiveness in practical applications. They are further limited in accuracy and stability, susceptible to environmental interference, and costly, which hinders the achievement of desired outcomes. To overcome these challenges, it is necessary to conduct further research and explore novel approaches that can improve the precision, stability, and robustness of AR geo-registration. This will facilitate the broader application of AR in outdoor environments.
The previous work by the authors [11] investigated reliable AR applications within a small-scale range using local calibration techniques with inertial sensors and monocular cameras. However, this approach faces challenges in adapting to large-scale outdoor scenes that require continuous global calibration. It also fails to effectively eliminate alignment errors between geospatial data and the Earth’s surface or rectify sensor distortions. In this paper, a novel method is proposed for rotation-invariant estimation based on direction vectors and gravity vectors. By integrating highly accurate geospatial azimuth and surface estimation, a rotation matrix that is consistent with the georeference system is established. Additionally, RTK-GPS is used for high-precision positioning, and visual-inertial odometry is fused for pose estimation to create a translation matrix. This approach enables AR device cameras to align with the geographic north reference and the Earth’s surface, regardless of initial conditions or arbitrary motion directions. It also ensures consistent rotational relationships that are not affected by coordinate system transformations. Compared to existing methods, our approach offers the following innovations and advantages:
  • The accuracy of generating heading data from low-cost hardware is enhanced by utilizing RTK-GPS and visual-inertial fusion to calculate true-north direction vectors. This approach effectively improves the stability of geospatial azimuth estimation, enabling virtual objects to maintain accurate and stable orientation in complex environments. It enables the adaptation of virtual objects to the arbitrary motion of user devices in space.
  • A fusion method is employed to establish the initial attitude of the rigid body relative to the geographic north direction by combining the true-north direction vector and the gravity vector. By integrating the visual-inertial fusion method with the RTK-GPS multi-point positioning fusion method, the initialization error of the estimated rotation matrix in the visual-inertial fusion method is reduced. This effectively eliminates alignment errors between geospatial data and the ground surface, enabling accurate initial alignment of virtual objects with real-world features such as terrain and buildings.
  • The initial attitude relative to the geographic north direction is dynamically combined with the motion-estimated attitude using the visual-inertial fusion method. This allows for high-precision, multi-source coordinate system transformations, ensuring the spatial rotational invariance of the rigid body with respect to the geospatial reference frame. This approach effectively reduces the precision requirements and costs of the global pose for the visual-inertial fusion sensor and the high-frequency pose for RTK-GPS. As a result, it can be applied to various AR precision systems, including smartphones and AR glasses.
The remaining sections of this paper are organized as follows: Section 2 provides an overview of various attitude estimation methods for AR geo-registration. It analyzes the characteristics of existing techniques and highlights the key differences between this method and other approaches. Section 3 presents a detailed description of the proposed rotation-invariant estimation method based on direction vectors and gravity vectors. In Section 4, a series of quantitative and qualitative experiments are conducted to validate the geo-registration accuracy and robustness of the proposed method in different scenarios. The experimental results have been analyzed and discussed. Section 5 summarizes the key innovations and benefits of this paper, addresses the limitations and drawbacks of the proposed method, and outlines future research directions and potential improvements in geo-registration.

2. Related Work

In augmented reality applications, geo-registration refers to the process of aligning and matching virtual objects with the geographic location and orientation of the real-world scene. Currently, there are three common methods for pose estimation: sensor-based approaches, vision-based approaches, and hybrid approaches. These methods have been extensively applied in numerous projects and research endeavors. However, pose estimation in outdoor scenarios still faces numerous challenges. Factors such as signal interference, environmental variations, lighting changes, feature scarcity, occlusions, and dynamic objects severely impact the accurate determination of the geographic north orientation, alignment with the terrain surface, and precision in coordinate system transformations. These factors can lead to cumulative errors in pose estimation.

2.1. Sensor-Based Methods

Localization in outdoor industrial environments typically utilizes sensors such as GPS and magnetometers to obtain spatial coordinates and orientation [12]. Sensor-based approaches utilize non-visual sensors such as GPS, IMU, and magnetometers to acquire users’ positional and directional information [13]. Subsequently, virtual information is generated based on geospatial databases and aligned with the real environment. These methods primarily rely on built-in, non-visual sensors (e.g., accelerometers, gyroscopes, magnetometers) to obtain the azimuth and tilt angles of the device [14,15]. By fusing this information with GPS localization data, they collectively provide position and orientation estimation. These sensors provide information on linear acceleration, angular velocity, and magnetic field strength. By using filtering and attitude algorithms, the data can be used to infer the pose of the device. Sensor-based approaches are characterized by their low cost, low complexity, and reasonable continuity, making them suitable for simple AR application scenarios. Behzadan and Kamat [2] demonstrated GPS’s effective use for real-time virtual graphics registration in an outdoor AR setting. Accelerometers and gyroscopes can provide accurate and robust attitude initialization, as demonstrated by Tedaldi et al. [16]. However, these sensors also have limitations, such as drift and noise over time [17]. Moreover, the heading angle measured by a magnetometer, which is often integrated with accelerometers and gyroscopes, is susceptible to significant influence from ambient magnetic field noise [18].
An Inertial Measurement Unit (IMU) offers a fast response and update rate for capturing motion changes, but it is vulnerable to noise, drift, and measurement range limitations, affecting the accuracy and stability of AR applications [19]. This will cause deviations or oscillations in the orientation of virtual objects. Consequently, the accuracy and stability of aligning with the Earth’s surface are affected. Furthermore, cumulative errors may arise in spatial rotations, and the conversion between coordinate systems cannot guarantee the spatial rotational invariance of objects relative to the geographic reference frame. RTK-GPS offers centimeter-level positioning accuracy and works well during high-speed motion. However, it may encounter failures and initialization challenges in certain outdoor AR map scenarios [20]. Generally, the distance between the mobile station and the reference station should not exceed 10 km to 15 km, as it could impact the positioning accuracy or even lead to failure [21]. Thus, outdoor localization using consumer-grade sensors is a challenging problem [22] that requires a method that can combine the strengths and overcome the weaknesses of different sensors.
By combining the measurements from RTK-GPS, including the position, velocity, and heading angle, with the IMU data, improved outcomes can be achieved at a lower cost. Moreover, IMUs have a high output frequency and enable the measurement of the six degrees of freedom (6DOF) pose. This feature makes them ideal for short-term applications in environments with limited visual texture and rapid motion [23]. The distinctive characteristics of IMUs make them a valuable complement to RTK-GPS signals, leading to their widespread adoption and extensive investigation in conjunction with RTK-GPS integration [24,25]. However, one limitation that arises is the zero offset of the accelerometer and the gyroscope, which leads to a significant pose offset over time [26]. Moreover, when employing low-accuracy (consumer-grade) RTK-GPS in conjunction with a highly accurate IMU, the impact of north positioning errors becomes significant during substantial changes in altitude [27]. Furthermore, the pose estimation approach that combines RTK-GPS and IMU is limited to open areas because it relies on satellite availability [28].
Because such methods rely on sensor data such as magnetometer readings, potential deviations between sensors, environmental interference, or drift can destabilize their azimuth estimation. Sensor-based approaches are also prone to error accumulation: estimating position and pose from accelerometer and gyroscope data requires integration and filtering, which compounds errors over time. These issues leave sensor-based approaches with insufficient accuracy and stability in both position alignment and map motion tracking.

2.2. Vision-Based Methods

An alternative approach is the utilization of vision-based methods. These methodologies can be considered as a specific instance within the broader category of sensor-based approaches, where the camera functions as the primary sensor. Different from the conventional sensor method, vision-based approaches utilize cameras to capture real-world images. These images are then processed using techniques such as feature extraction, matching, and tracking to detect and match feature points within the environment. By analyzing the positional changes of these feature points, it is possible to estimate the device’s pose, the user’s location and orientation information, and the geographical positioning within the real environment. This allows for the alignment between virtual information and the real world [29]. With the advancement of spatial data acquisition technologies, recent studies have focused on geographic localization and registration using multi-source data, including satellite imagery and video data [30]. However, these methods are both complex and expensive. On the one hand, the accuracy of the heading angle in certain localization methods that rely on multi-source data may not be appropriate for outdoor, wide-range AR applications [31]. On the other hand, their wide application is challenging because they require a substantial number of pre-matched georeferenced images or a large database of point clouds captured from the physical world [22]. A more efficient approach is to employ automated computer vision techniques to generate three-dimensional point clouds for positioning [32]. However, this method requires pre-generation of point clouds, which primarily relies on visual features. Camera-based pose estimation methods involve capturing images of the environment using cameras and utilizing computer vision algorithms to infer the position and orientation of the device. These methods typically involve feature point detection, camera calibration, and visual geometric algorithms, which in turn require significant computational resources and algorithmic complexity. Even with map motion tracking employed, they still result in a slow system response due to the heavy computation load required for feature recognition, localization, and mapping.
Visual methods are better suited for pose estimation in local scenes because the camera is primarily used as a local sensor in these methods [33]. In outdoor global geo-registration and pose estimation, the camera is susceptible to environmental interferences, resulting in degraded image quality or the loss of feature points. The influence of blurred visual features on the approach becomes increasingly noticeable as the velocity of motion increases. As a consequence, the visual data may not be able to provide stable and reliable orientation estimation. The SIFT feature algorithm, for instance, is considered a relatively reliable visual feature algorithm [34]. However, it requires the computation of key point orientations, which can be influenced by noise and lighting conditions. Furthermore, the SIFT feature algorithm is limited to handling small-angle rotations and may fail when dealing with large-angle rotations. Neural network-based methods exhibit stronger capabilities in extracting image features compared to traditional visual feature algorithms. However, they require a large amount of training data, and in the case of rare or novel objects, there may not be enough data to achieve satisfactory generalization capabilities [31]. In summary, visual methods have limitations in global position alignment and require abundant feature dependencies and motion constraints from mobile augmented reality (MAR) devices for surface alignment.

2.3. Hybrid Methods

Single sensors and vision alone are insufficient to achieve robust, accurate, and large-scale 6-DOF localization in complex real-world environments. Hybrid methods for pose estimation integrate non-visual sensor and visual information to obtain more accurate and stable results in estimating pose. To achieve a robust and accurate outdoor geo-registration system, it is necessary to fuse multiple complementary sensors. By leveraging the complementary advantages of sensors and visual data, the limitations of individual methods can be overcome, particularly in demanding outdoor applications such as urban and highway environments [35].
The visual-inertial fusion method has become popular for positioning and navigation, especially in GPS-denied environments [36], due to the complementary nature of vision and IMU sensors [37]. By integrating vision and IMU sensors [38], this method overcomes the limitations of using either vision or IMU sensors alone [39,40]. Vision sensors can be affected by factors such as lighting, occlusion, and feature matching, while visual localization methods rely on a large amount of georeferenced or pre-registered data. IMU, on the other hand, suffers from issues such as accumulated errors and zero drift. By comparing and calibrating the attitude estimation of the IMU with visual data, it is possible to achieve more accurate pose estimation. Common methods for this fusion include Extended Kalman Filtering (EKF) and Tightly Coupled Filtering. The fusion of the camera and the IMU sensor can solve the problems of low output frequency and the accuracy of visual pose estimation, enhancing the robustness of the positioning results.
However, current multi-sensor fusion methods have limitations. The advantage of multi-sensor fusion is its adaptability and stability, but its implementation is also more complex. These methods fail to simultaneously meet the requirements of high-precision initial and motion pose estimation with a low-cost solution. Although these methods may provide cost advantages, they have limitations in achieving both accurate pose estimation and cost-effectiveness. There is a trade-off and inherent constraints between low-cost solutions and high-precision requirements. Ren et al. [3] achieved geo-registration in low-cost UAVs by fusing RTK-GPS and the IMU sensor. However, the limited accuracy of the IMU sensor in these UAVs led to imprecise attitude fusion. Their method heavily relies on the stability of RTK-GPS data. Burkard and Fuchs-Kittowski [22] estimated the gravity vector and geographic north in visual-inertial fusion registration through user gesture calibration. However, the accuracy of the registration relies on manually input information. Oskiper et al. [41] utilized road segmentation direction information and annotated remote sensing image data in their visual-inertial method to achieve accurate global initial registration. However, their performance in pose matching may degrade under outdoor continuous motion and spatial rotation. Hansen et al. [42] proposed a precise method for estimating positioning and orientation using LiDAR, an intelligent IMU with high accuracy, and a pressure sensor that can measure altitude. However, the high cost of these devices prevents widespread adoption. Multi-sensor fusion is primarily hindered by the lack of precision in surface alignment.
In summary, existing methods for outdoor position alignment, surface registration, and motion tracking have several issues and shortcomings. To address these concerns, minimizing rotation matrix errors can reduce the accumulation of position errors and improve the precision of surface registration and motion tracking. Lowering rotation matrix errors requires considering multiple factors, including sensors, feature-matching algorithms, and optimization methods. These efforts aim to improve the effectiveness of position registration, surface registration, and map motion tracking. As a result, augmented reality maps have experienced notable advancements in terms of realism and interactivity. The significance of the spatial rotation invariance of the geographic reference in outdoor AR geo-registration can be summarized as follows:
  • Matching of actual geographic surface: Maintaining the orientation and angles of virtual objects relative to real terrain features, enabling high-precision alignment between the two, and enhancing the realism and immersion of AR maps.
  • Stability of geo-registration for virtual spatial objects: Ensuring that virtual objects remain stable in their orientation and rotational state when observed by users, avoiding jitter, drift, and other phenomena, thereby enhancing the user experience.
  • Accuracy of geographic orientation estimation: Leveraging information such as true-north direction vectors and gravity vectors, along with the rotational invariance of the geographic reference, to improve the estimation process of pose initialization and update fusion, reducing initialization errors and accumulated errors in attitude estimation, and improving the accuracy and stability of the results.
  • Precision of coordinate transformation: By combining high-precision position data from RTK-GPS and the pose estimation from visual-inertial fusion methods, constructing translation matrices and rotation-invariant matrices, achieving high-precision coordinate transformation, and reducing errors in coordinate system conversions in AR maps.
Previously, the authors’ research achieved reliable AR applications within a small range through the local calibration technique of inertial sensors and monocular cameras [11]. Building upon this previous work, this paper proposes a novel rotation estimation method based on direction vectors and gravity vectors, aiming to achieve high-precision geo-registration in AR map applications. This method effectively addresses the alignment errors between geographic data and the Earth’s surface, as well as the instability of geographic north estimation faced by existing methods in complex outdoor scenarios with complex terrain and high-frequency motion updates. It improves the accuracy and robustness of AR geo-registration. The proposed method not only fills the gaps in existing approaches for addressing the research problem in this paper but also presents a novel and effective solution. The superiority of the proposed method is demonstrated through both theoretical analysis and experimental evaluations.

3. Methods

3.1. Overview of the Rotation-Invariant Estimation Approach

In geospatial registration of augmented reality, it is necessary to convert between the camera coordinate system and the AR world coordinate system. This involves using matrices to perform rotation and translation operations that describe the orientation and position of the AR camera in space. The rotation matrix and translation vector represent changes in the camera’s motion coordinates. The rotation matrix preserves the length and angle of vectors within a coordinate system, enabling rotation transformations in space. The translation vector describes the camera’s position offset in space, ensuring accurate positioning in geographic space. Transformation matrices enable the camera to quickly respond to user actions and shifts in perspective for precise alignment between virtual information and the physical environment. In order to improve the efficiency and accuracy of calculating transformation matrices, it is important to investigate new methods and optimization techniques. This will enable more stable and accurate integration of virtual content into the real environment. Maintaining rotational invariance across multiple coordinate systems is a research challenge. This paper explores the use of real-time computed rotation-invariant matrices to maintain the consistency and stability of virtual objects across various coordinate systems.
In this study, a novel rotation estimation method based on the geographic north vector and the gravity vector is proposed, utilizing a multi-sensor fusion approach to achieve high-precision AR geo-registration. The method, by preserving direction and position consistency, effectively enhances the accuracy and stability of surface alignment and true-north direction alignment, ensuring precise alignment and positioning of virtual content with the real world in augmented reality applications. To accomplish this, a combination of RTK-GPS data, visual data, and IMU data was employed to estimate the true-north heading, planar heading, and pose heading angles of the AR device in the geographic coordinate system. Based on these estimations, a rotation-invariant matrix was constructed that enables accurate rotation transformation and coordinate conversion between the geographic coordinate system and the camera coordinate system of the MAR device. This method establishes a fundamental basis for AR-GIS scene reconstruction and augmentation. As depicted in Figure 2, the overall framework of this method consists of four main steps, outlined below.
  • Estimation of geographic north orientation: The true-north heading angle was calculated using the RTK-GPS and visual-inertial fusion. A true-north direction vector was then constructed as the reference for the geographic coordinate system.
  • Extraction of gravity vector: The RANSAC algorithm was employed to extract the gravity vector from the plane equation of the Earth’s surface. This gravity vector serves as an initial constraint for the visual-inertial fusion method.
  • Attitude construction and fusion: The initial attitude was constructed by combining the true-north direction vector and the gravity vector. This enables the fusion of high-frequency, high-precision relative attitude heading angles from the visual-inertial fusion method with the high stability and accuracy of the global attitude heading angle from the RTK-GPS. Furthermore, a continuous rotation-invariant matrix was constructed to reduce the initialization error and the cumulative error in estimating the rotation matrix of the visual-inertial fusion method.
  • Coordinate system transformation: A translation matrix was constructed by fusing the high-precision position from the RTK-GPS with the pose estimation of the visual-inertial fusion method. This, combined with the rotation-invariant matrix, enables high-precision coordinate conversion and reduces errors in complex coordinate transformations for AR mapping.
As shown in Figure 3, this study focuses on three primary coordinate systems [11]:
  • The AR world coordinate system is responsible for determining the camera’s position on the AR device in the real world. Each point in this system corresponds to a specific location in the physical world. It is introduced by the camera and can represent various objects, with units typically in meters (m).
  • The AR camera coordinate system is defined with the optical center of the camera as the origin. The z-axis aligns with the optical axis, which points forward from the camera and is perpendicular to the imaging plane. The positive directions of the x- and y-axes are parallel to the object’s coordinate system. The units are typically in meters (m).
  • The AR geographic coordinate system is a coordinate system that maps the entire Earth, with the z-axis pointing towards the Earth’s center and aligned with the gravity vector.
In this study, the geographic coordinate system was first transformed into a plane projection coordinate system and then further transformed into the local tangent plane (LTP) coordinate system [43]. The origin of the LTP coordinate system represents real-world locations and can be scaled to match the real world, enabling flexible transformation between the LTP and the AR world coordinate systems. The AR world coordinate system and the AR camera coordinate system were transformed in real time using the pinhole camera model. This transformation was achieved by combining the rotation matrix and the translation vector with the calibration parameters of the AR camera.
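As a minimal sketch of the world-to-camera step described above, the snippet below applies a rotation, a translation, and a pinhole projection. The function name, intrinsic values, and conventions are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def world_to_pixel(p_world, R_wc, t_wc, K):
    """Project a point from the AR world frame into the camera image.

    R_wc, t_wc : rotation matrix and translation vector taking world
                 coordinates into the camera frame (extrinsics).
    K          : 3x3 pinhole intrinsic matrix from camera calibration.
    Names and values here are illustrative, not from the paper.
    """
    p_cam = R_wc @ np.asarray(p_world, dtype=float) + t_wc   # world -> camera frame
    u, v, w = K @ p_cam                                      # camera frame -> image plane
    return np.array([u / w, v / w])                          # pixel coordinates

# Example: a point 10 m in front of a camera at the world origin
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
print(world_to_pixel([0.0, 0.0, 10.0], np.eye(3), np.zeros(3), K))
```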

3.2. Calculation of True-North Direction Vector Based on RTK-GPS and Visual-Inertial Fusion

The true-north heading angle refers to the rotation angle in the horizontal plane of a device relative to the Earth’s coordinate system. Typically, this angle is determined using sensors such as magnetometers or GPS. However, variations in the magnetic field or the unavailability of GPS can result in an inaccurate perception of the true-north heading angle. In this study, a method is proposed for constructing the true-north heading angle based on multi-point orientation using a low-cost RTK-GPS device.
We assumed that the coordinates of two points in the geographic coordinate system were A($x_1$, $y_1$) and B($x_2$, $y_2$). The horizontal and vertical components of the vector between these two points were ($x_2 - x_1$, $y_2 - y_1$). By applying vector conversion, the magnitude of the vector, denoted as $d$, and the geographic bearing angle between the two points in the geographic coordinate system can be calculated. The geographic bearing angle represents the direction from A to B, with the reference line being the true north (i.e., the Y-axis). The angle was computed in a clockwise direction, ranging from 0 to 360 degrees. The arc length was calculated based on the high-precision latitude and longitude of the two points using geodetic positioning. The curvature radius was assumed to be the Earth’s radius, and the radians were obtained by inverse calculation based on the arc length. The formula for computing the true-north heading angle using multi-point orientation with RTK-GPS is as follows:
$$\cos\alpha_{GEO} = \frac{x_2 - x_1}{d}, \qquad \sin\alpha_{GEO} = \frac{y_2 - y_1}{d}, \qquad \alpha_{GEO} = \tan^{-1}\frac{\sin\alpha_{GEO}}{\cos\alpha_{GEO}} \tag{1}$$
The geographic bearing angle between A and B in the AR scene was calculated by visual-inertial fusion. This bearing angle refers to the angle of the line segment connecting A and B with respect to the azimuth at the initiation of the MAR device. The calculation method is as follows.
As shown in Figure 4, the fusion of the two coordinate systems was accomplished. The AR scene of the MAR device was constrained to the geospatial scene based on RTK-GPS within the same camera frame during initialization, which can be achieved in the AR software (ARCore version 1.2) system using the same timestamp. The difference in azimuth angles calculated from the two coordinate systems represents the angle θ between the MAR device’s orientation at startup and the geographic true north. The calculation of θ is as follows:
$$\theta = \alpha_{GEO} - \alpha_{AR} = \left(90^{\circ} - \arctan\frac{\Delta y}{\Delta x}\right) - \left(90^{\circ} - \arctan\frac{\Delta y'}{\Delta x'}\right) = \arctan\frac{\Delta y'}{\Delta x'} - \arctan\frac{\Delta y}{\Delta x} \tag{2}$$
where:
$$\alpha_{AR} = 90^{\circ} - \arctan\frac{\Delta y'}{\Delta x'} \tag{3}$$
where $\Delta x$ and $\Delta y$ represent the differences in the x- and y-directions of the geographic coordinates between A and B, while $\Delta x'$ and $\Delta y'$ represent the differences in the x- and y-directions, respectively, between the coordinates of A and B in the AR system. Moreover, the value of $\theta$ remained constant after the completion of initialization in the MAR device and was not influenced by real-time motion tracking.
Utilizing Equation (2), it was possible to accurately calculate the geographic bearing angles between multiple points on various low-cost devices. When points A and B were relatively close to each other, the angle θ exhibited significant errors compared to the angle between line segment AB and the true geographic north. Conversely, when points A and B were farther apart, the two angles were approximately equal. Therefore, the offset angle θ was influenced by the distance between the two points. According to Equation (1), when the positioning accuracy of RTK-GPS reached 2 cm (a common accuracy level for consumer-grade RTK-GPS), the azimuthal error of the MAR device in the geographic coordinate system did not exceed 0.23 degrees for any two positioning combinations at distances exceeding 10 m. Similarly, for distances exceeding 20 m, the azimuthal error did not exceed 0.11 degrees. This methodology is suitable for common applications involving the calculation of AR map direction vectors.
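These bounds can be recovered with a short worked calculation, assuming the worst case in which the 2 cm error of each fix acts perpendicular to the baseline and in opposite directions:
$$\Delta\alpha_{\max} \approx \arctan\frac{2\varepsilon}{d} = \arctan\frac{0.04\ \text{m}}{10\ \text{m}} \approx 0.23^{\circ}, \qquad \arctan\frac{0.04\ \text{m}}{20\ \text{m}} \approx 0.11^{\circ}$$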
Based on the offset angle $\theta$ obtained from Equation (2) and the rotation axis vector N, the quaternion $Q_{\text{north}}$ was constructed. The calculation formula is as follows:
$$Q_{\text{north}} = \cos\frac{\theta}{2} + \left(\sin\frac{\theta}{2} \cdot N_1\right) i + \left(\sin\frac{\theta}{2} \cdot N_2\right) j + \left(\sin\frac{\theta}{2} \cdot N_3\right) k \tag{4}$$
where i, j, and k are the imaginary unit vectors of the quaternion, and $N_1$, $N_2$, and $N_3$ represent the components of the rotation axis vector N along the x-, y-, and z-axes, respectively.
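A minimal sketch of this step, assuming planar (projected) RTK-GPS coordinates with x pointing east and y pointing north; the function names, the rotation axis, and the example values are illustrative, and the AR-frame bearing would come from the visual-inertial track in practice:

```python
import numpy as np

def bearing_from_points(x1, y1, x2, y2):
    """Clockwise bearing of B relative to A in degrees (0-360), true north = +Y, as in Eq. (1)."""
    d = np.hypot(x2 - x1, y2 - y1)
    # atan2(east component, north component) gives the clockwise angle from north
    return np.degrees(np.arctan2((x2 - x1) / d, (y2 - y1) / d)) % 360.0

def quaternion_from_axis_angle(axis, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about axis N, as in Eq. (4)."""
    n = np.asarray(axis, dtype=float)
    n /= np.linalg.norm(n)
    half = np.radians(angle_deg) / 2.0
    return np.concatenate(([np.cos(half)], np.sin(half) * n))

# Hypothetical values: two RTK fixes in a local planar grid and a vertical rotation axis
alpha_geo = bearing_from_points(0.0, 0.0, 12.0, 18.0)   # geographic bearing A -> B
alpha_ar  = 37.5                                        # bearing of A -> B in the AR frame (assumed)
theta     = alpha_geo - alpha_ar                        # Eq. (2): offset to true north
Q_north   = quaternion_from_axis_angle([0.0, 1.0, 0.0], theta)
```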

3.3. Extraction of Gravity Vector by Solving the Ground Surface Plane Equation Using the RANSAC Algorithm

Plane estimation refers to the determination of the position and orientation of a device relative to a specific plane in the outdoor environment, such as the ground or a wall. This information is typically provided by sensors such as visual trackers or LiDAR. However, due to the irregularity of outdoor ground structures, plane estimation often suffers from instability and imprecision. In this paper, a method for computing the gravity vector is proposed, which utilizes RANSAC-based plane recognition and quaternion computation techniques. Our approach does not rely on easily disturbed devices such as magnetometers and does not require pre-deployment of image markers such as QR codes. As a result, it improves the accuracy and robustness of gravity vector estimation. The RANSAC algorithm [44], a statistical parameter estimation method, was employed to identify the optimal model parameters from a set of noisy data. In our surface-scanning and recognition process, the RANSAC algorithm was used to randomly sample feature point cloud data points and compute the parameters of the plane model for plane identification.
Firstly, the ground plane of the target area was obtained using the aforementioned RANSAC algorithm. Assuming that a set of points on the plane was identified through the RANSAC algorithm applied to randomly sampled feature point cloud data, the normal vector F(A, B, C) of the plane was computed as the cross-product of two vectors formed by three points on the plane. The least squares method was then used to fit the plane equation:
$$Ax + By + Cz + D = 0 \tag{5}$$
where A, B, and C are the components of the plane’s normal vector F, and D is the term that fixes the plane’s offset from the origin. D can be obtained by substituting any point (x, y, z) on the plane into the plane equation and solving:
$$D = -(Ax + By + Cz) \tag{6}$$
This expression can be used to determine whether the points generated by the MAR device’s visual-inertial navigation system during subsequent feature point cloud extraction lie on the plane. Substituting a point’s coordinates into the plane equation yields a residual; if its absolute value is zero or smaller than the tolerance of the plane estimation, the point is taken to lie on the plane. If the residual is greater than the tolerance, another RANSAC plane recognition process needs to be performed.
Next, the direction of the normal vector F was taken as the negative direction of the Y-axis of the ground plane. Based on the normal vector F, a quaternion Q gravity was constructed, which represents a rotation matrix aligning the detected plane with the negative direction of the Y-axis. This quaternion represents the rotation around the normal vector and can be expressed as follows:
Q gravity = a + b i + c j + d k
where a represents the real part, and b, c, and d represent the imaginary parts. With the constructed gravity vector, the geographical direction obtained from the previous section through RTK-GPS multi-point positioning was used as the reference baseline to construct a continuous rotation matrix for the Earth reference coordinate system. This enabled the continuous alignment between the viewport display of the AR device and the prior map data.
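The sketch below illustrates this pipeline under stated assumptions: a basic three-point RANSAC plane fit followed by a shortest-arc quaternion that aligns the fitted normal with the negative Y-axis. The iteration count, inlier tolerance, and helper names are illustrative; the paper specifies the Q_gravity construction only generically in Equation (7):

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.02, rng=np.random.default_rng(0)):
    """Fit A*x + B*y + C*z + D = 0 to noisy points (Eq. 5); returns (unit normal, D)."""
    best_inliers, best = -1, None
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)                  # normal from three sampled points
        if np.linalg.norm(n) < 1e-9:
            continue                                    # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -np.dot(n, p1)                              # Eq. (6): D = -(A*x + B*y + C*z)
        inliers = np.sum(np.abs(points @ n + d) < tol)  # point-plane residual test
        if inliers > best_inliers:
            best_inliers, best = inliers, (n, d)
    return best

def quat_aligning(n, target=np.array([0.0, -1.0, 0.0])):
    """Shortest-arc unit quaternion (w, x, y, z) rotating plane normal n onto the -Y axis."""
    n = n / np.linalg.norm(n)
    w = 1.0 + np.dot(n, target)
    if w < 1e-9:                                        # n anti-parallel to the target axis
        return np.array([0.0, 0.0, 0.0, 1.0])           # 180-degree fallback rotation about Z
    q = np.concatenate(([w], np.cross(n, target)))
    return q / np.linalg.norm(q)

# Example: noisy samples of a nearly horizontal ground plane (y ~ 0)
pts = np.random.default_rng(1).normal(scale=[5.0, 0.01, 5.0], size=(500, 3))
normal, D = ransac_plane(pts)
Q_gravity = quat_aligning(normal)
```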
In some cases, the gravity vector may not be strictly perpendicular to the ground. This issue was addressed using the following approach: The gravity vector was extracted from the smartphone’s accelerometer sensor data to participate in the computation of the rotational invariance matrix. In the subsequent experiments, a test environment with surface normal vectors close to the gravity vector was also selected.

3.4. Attitude Construction and Fusion

3.4.1. Initialization of Quaternion Attitude Based on True-North Vector and Gravity Vector

As mentioned earlier, this study is innovative in two aspects: eliminating alignment errors between geographical data and the surface and improving the stability of estimating the geographic north direction, thereby enhancing the accuracy and robustness of AR map registration. However, to achieve this geographical registration, it was necessary to address the problem of initializing the initial rotation matrix from the true-north vector and the gravity vector, which is known as the attitude initialization problem.
As shown in Equations (4) and (7), both quaternions $Q_{\text{gravity}}$ and $Q_{\text{north}}$ represent rotations around the surface normal vector. $Q_{\text{gravity}}$ is a quaternion constructed using the plane equation derived from the RANSAC algorithm, which represents the direction of the gravity vector of the MAR device on the surface. $Q_{\text{north}}$ was obtained through the RTK-GPS multi-point orientation method, converting the heading angle to a geographic direction vector. The multiplication of these two quaternions constructed the orientation of the MAR device on the Earth’s surface, rotated around its normal vector by an angle $\theta$. The product was computed according to the definition of quaternion multiplication (the Hamilton product):
$$Q_{\text{gravitynorth}} = Q_{\text{gravity}} \times Q_{\text{north}} = (w_1 w_2 - x_1 x_2 - y_1 y_2 - z_1 z_2) + (w_1 x_2 + w_2 x_1 + y_1 z_2 - y_2 z_1)\,i + (w_1 y_2 + w_2 y_1 + z_1 x_2 - z_2 x_1)\,j + (w_1 z_2 + w_2 z_1 + x_1 y_2 - x_2 y_1)\,k \tag{8}$$
where the quaternion $Q_{\text{gravitynorth}}$ represents the direction quaternion obtained by rotating around the normal vector F(A, B, C) on the Earth’s surface by an angle $\theta$. It represents the orientation of the MAR device, where the XOY plane was perpendicular to the surface and the orientation was rotated 0 degrees relative to the geographic north. Using this quaternion, the plane can be aligned with any direction on the Earth’s surface.
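For reference, a direct transcription of the Hamilton product in Equation (8), with quaternions stored as (w, x, y, z); the example inputs are placeholders, not values from the paper:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product q1 * q2 for quaternions stored as (w, x, y, z), per Eq. (8)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,   # real part
        w1*x2 + w2*x1 + y1*z2 - y2*z1,   # i component
        w1*y2 + w2*y1 + z1*x2 - z2*x1,   # j component
        w1*z2 + w2*z1 + x1*y2 - x2*y1,   # k component
    ])

# Illustrative inputs: identity gravity alignment and a 30-degree rotation about +Y
Q_gravity = np.array([1.0, 0.0, 0.0, 0.0])
Q_north = np.array([np.cos(np.radians(15)), 0.0, np.sin(np.radians(15)), 0.0])
Q_gravitynorth = quat_mul(Q_gravity, Q_north)
```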
The above equations encompass the main calculations for the entire process of attitude initialization, which involves constructing the initial attitude using the true-north direction vector and the gravity vector. However, the initialization process relies on the quality of the RTK-GPS signal, and if the signal is unstable or lost [20], it can impact the accuracy and reliability of AR geolocation.
Therefore, this paper proposes a more robust initialization process (as shown in Figure 5) that can utilize magnetic sensors to obtain the true-north heading angle and calculate the corresponding quaternion Q north in case of poor or interrupted RTK-GPS signals.

3.4.2. Continuous Rotation-Invariant Matrix Based on True-North Direction Vector and Gravity Vector

The previous sections described the acquisition of quaternions based on the direction of the gravity vector and the true-north direction on the surface of the Earth, and the construction of the initial orientation based on these quaternions. This section focuses on the construction method of a continuous rotation-invariant matrix based on the true-north direction vector and the gravity vector. The purpose of this method is to establish precise rotation transformations and coordinate conversions between the geographic coordinate system and the camera coordinate system for the MAR device.
Here, a method is proposed based on the composition of three-dimensional rotation matrices to construct a rotation matrix whose rotation axis is parallel to the gravity vector and whose initial rotation angle is parallel to the true-north direction (referred to as $R_{\text{realtimegn}}$). First, the quaternion $Q_{\text{gravitynorth}}$ was obtained using Equation (8) and converted into a rotation matrix (referred to as $R_{\text{gravitynorth}}$) that represents the orientation of the coordinate system at the initialization of the MAR device. By multiplying the two coordinate system orientations, $R_{\text{realtimegn}}$ and $R_{\text{gravitynorth}}$, a new coordinate system orientation was obtained, denoted as $R_{\text{vio}}$. This orientation directly transformed the coordinate system orientation of the MAR device from the true-north direction at 0 degrees, with XOY perpendicular to the ground surface, to the coordinate system orientation estimated online by the visual-inertial odometry fusion method during the motion of the MAR device. This computation enabled the calculation of the final coordinate system orientation estimated online during the motion of the MAR device:
$$R_{\text{vio}} = R_{\text{realtimegn}} \times R_{\text{gravitynorth}} \tag{9}$$
where the rotation matrix $R_{\text{gravitynorth}}$ was obtained through the fusion of high-precision RTK-GPS with the RANSAC algorithm, while $R_{\text{vio}}$ was computed through real-time motion tracking and registration using the visual-inertial odometry fusion method. Since all three matrices are rotation matrices, the inverse of $R_{\text{gravitynorth}}$ is equal to its transpose, i.e., $R_{\text{gravitynorth}}^{-1} = R_{\text{gravitynorth}}^{T}$. Multiplying both sides by the inverse of $R_{\text{gravitynorth}}$ yielded the following computation formula:
$$R_{\text{vio}} \times R_{\text{gravitynorth}}^{-1} = R_{\text{realtimegn}} \times R_{\text{gravitynorth}} \times R_{\text{gravitynorth}}^{-1} \;\Longrightarrow\; R_{\text{realtimegn}} = R_{\text{vio}} \times R_{\text{gravitynorth}}^{-1} \tag{10}$$
To achieve coordinate transformation between the two coordinate systems, a matrix representing a three-dimensional vector was constructed in this study, from which new coordinate values were extracted. We assumed that vector V had coordinates $[X_w, Y_w, Z_w]$ in the MAR device’s initial coordinate system and needed to be rotated from the initial orientation to the final orientation, where the geographic north was at 0 degrees and the XOY plane was perpendicular to the Earth’s surface. First, the vector was represented as a matrix $[X_w, Y_w, Z_w, 1]$. Then, this matrix was multiplied successively by the rotation matrices $R_{\text{gravitynorth}}$ and $R_{\text{realtimegn}}$, resulting in an intermediate vector M and the final vector E. The computation formulas for these transformations are as follows:
$$M = R_{\text{gravitynorth}} \times [X_w, Y_w, Z_w, 1]^{T}, \qquad E = R_{\text{realtimegn}} \times M \tag{11}$$
The first three elements of the final vector E represent the coordinates of the transformed vector. These coordinates can represent the AR world coordinate system of the MAR device, which, based on the continuous and rotation-invariant matrices, was transformed to the coordinates in the rectangular coordinate system corresponding to the AR map.
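A compact sketch of Equations (9)–(11), working with 3x3 rotation matrices only (the homogeneous fourth component carries no information here, since there is no translation in this step); the quaternion values and the helper quat_to_rot are illustrative:

```python
import numpy as np

def quat_to_rot(q):
    """3x3 rotation matrix from a unit quaternion stored as (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# R_gravitynorth: initial orientation from Q_gravitynorth; R_vio: current orientation
# reported by the visual-inertial odometry (both quaternions here are example values)
R_gravitynorth = quat_to_rot(np.array([np.cos(np.radians(15)), 0.0, np.sin(np.radians(15)), 0.0]))
R_vio = quat_to_rot(np.array([np.cos(np.radians(40)), 0.0, np.sin(np.radians(40)), 0.0]))

# Eq. (10): the inverse of an orthonormal rotation matrix is its transpose
R_realtimegn = R_vio @ R_gravitynorth.T

# Eq. (11): rotate a vector from the device's initial frame into the final orientation
V = np.array([1.0, 0.0, 2.0])
M = R_gravitynorth @ V
E = R_realtimegn @ M          # the components of E give the transformed coordinates
```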

3.5. Coordinate Transformation Based on Rotation-Invariant Matrices

3.5.1. Transformation from AR Geographic Coordinate System to AR World Coordinate System

This section primarily discusses the conversion between the geographic coordinate system and the AR world coordinate system in AR georeferencing. The AR world coordinate system is a virtual coordinate system used to describe the position and orientation of AR devices and virtual content. Its origin and orientation are typically determined during device initialization and do not have a fixed correspondence with the real geographic coordinate system. In AR georeferencing, coordinate system conversion and rotation transformations are required to align virtual content with the geographic location and orientation in the real world.
Traditional methods often employ planar projection techniques, selecting suitable zone projection transformations for the target area to project the coordinates of the geographic coordinate system onto a planar rectangular coordinate system, and then converting that planar rectangular coordinate system into the AR world coordinate system. However, this approach presents a challenge: the origin of the planar projection coordinate system is often far from the origin of the MAR device’s world coordinate system. This leads to significant errors in displacement vector and rotation matrix calculations on the actual Earth’s surface, thereby failing to meet the requirements of high-precision AR georeferencing.
To address this issue, this paper proposes a method for transforming the geographic coordinate system to the AR world coordinate system based on an invariant translation transformation matrix.
First, after successful initialization of the MAR device, high-precision RTK-GPS coordinates (Xg, Yg, Zg) were obtained. These coordinates were then transformed into the coordinates (Xl, Yl, Zl) in a planar rectangular coordinate system, facilitating the conversion of the geographic coordinate system, Cg, of the AR map to the planar rectangular coordinate system, Cp. Since the AR world coordinate system is also a planar rectangular coordinate system, the transformation between Cp and the AR world coordinate system follows rigid-body motion principles, with the coordinates measured in meters. The coordinates (Xl, Yl, Zl) serve as the initialization origin for displaying the AR device scene.
Furthermore, considering the high relative attitude accuracy and frequent pose estimation of the visual-inertial fusion method, our objective was to obtain the relative displacement of geographic data aligned with the Earth’s surface. In this study, precise displacement values were calculated by solving the anchor points of RTK-GPS coordinates. The surface plane equation was extracted using the RANSAC algorithm, and spatial anchor points for this plane equation were generated using the visual-inertial fusion method. These spatial anchor points are represented as $P_{\text{anr}} = (X_{\text{anr}}, Y_{\text{anr}}, Z_{\text{anr}}, \psi_{\text{anr}}, \theta_{\text{anr}}, \varphi_{\text{anr}})$. With these spatial anchor points, a mapping relationship can be established between the Earth’s surface, the MAR device’s initial position, and the MAR device’s real-time online estimated position.
By utilizing the visual-inertial fusion method, the coordinate offset between the real-time online estimated position of the MAR device and its initial position was obtained to construct a translation transformation matrix, denoted as $T_{\text{cp2wd}}$. The initial position of the visual-inertial fusion method is represented as ($X_{\text{vinit}}$, $Y_{\text{vinit}}$, $Z_{\text{vinit}}$), and the calculation formula for this transformation matrix is given below:
$$T_{\text{cp2vio}} = \begin{bmatrix} X_{\text{anr}} - X_l \\ Y_{\text{anr}} - Y_l \\ Z_{\text{anr}} - 0 \end{bmatrix}, \qquad T_{\text{cp2wd}} = \begin{bmatrix} X_{\text{vrealtime}} - X_{\text{anr}} \\ Y_{\text{vrealtime}} - Y_{\text{anr}} \\ Z_{\text{vrealtime}} - Z_{\text{anr}} \end{bmatrix} + T_{\text{cp2vio}} \tag{12}$$
where ($X_{\text{vrealtime}}$, $Y_{\text{vrealtime}}$, $Z_{\text{vrealtime}}$) represents the coordinates estimated and computed in real time by the visual-inertial fusion method. $T_{\text{cp2vio}}$ is the fixed distance between the spatial anchor point $P_{\text{anr}}$ on the ground surface and the coordinates ($X_l$, $Y_l$, $Z_l$) at the initial position in the AR world coordinate system. To align the plane rectangular coordinate system of the AR map with the ground surface, it was necessary to adjust the Z-axis in $T_{\text{cp2vio}}$ to match the position of $P_{\text{anr}}$.
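A minimal numeric sketch of these two translation vectors; the function signature and the example values (in metres) are assumptions for illustration only:

```python
import numpy as np

def translation_to_world(anchor_xyz, init_xyz, vio_realtime_xyz):
    """Compute T_cp2vio and then T_cp2wd as in the formulas above.

    anchor_xyz       : ground-plane anchor P_anr from the visual-inertial method
    init_xyz         : (Xl, Yl, Zl), the RTK-GPS start point in the planar grid
    vio_realtime_xyz : current position estimated online by visual-inertial fusion
    """
    X_anr, Y_anr, Z_anr = anchor_xyz
    X_l, Y_l, _ = init_xyz
    T_cp2vio = np.array([X_anr - X_l, Y_anr - Y_l, Z_anr - 0.0])
    T_cp2wd = (np.asarray(vio_realtime_xyz) - np.asarray(anchor_xyz)) + T_cp2vio
    return T_cp2wd

# Hypothetical values, in metres
print(translation_to_world([1.2, 0.8, -1.5], [0.0, 0.0, 0.0], [3.0, 2.5, -1.4]))
```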
Next, the visual-inertial fusion method and a direction vector with geographic north were utilized to obtain the rotation matrix, representing the real-time motion attitude of the MAR device with respect to the initial attitude of the AR device. This rotation matrix was used to align the real-world attitude corresponding to the AR world coordinate system with the virtual-world attitude of the AR map.
An LTP coordinate system, denoted as L1, was constructed by combining the direction vector with geographic north and the central origin of the ground surface plane equation. This coordinate system is locally defined with a north angle of 0 degrees and the XOY plane perpendicular to the horizontal ground surface. The transformation involved in this part included converting the attitude matrix of L1 to the attitude matrix at the initialization of the visual-inertial fusion method, and then further transforming it to the attitude matrix in the AR world coordinate system in real time for the MAR device.
First, the attitude matrix of L1 at the initialization position was constructed as a quaternion denoted as QLTPNorth, while the attitude matrix of the visual-inertial fusion method at the initialization position was constructed as a quaternion denoted as QMARInit. Assuming the rotational transformation between the quaternions QLTPNorth and QMARInit is represented by qLTPToM, the attitude matrix in the AR world coordinate system in real time for the MAR device was constructed as a quaternion, denoted as QMARDevice. The rotational transformation between the quaternions QMARInit and QMARDevice is represented by qMToD. Therefore, the relative rotational transformation between the quaternions QLTPNorth and QMARDevice is represented by qDToLTP, which is computed as qMToD multiplied by the inverse of qLTPToM. By applying the Euclidean group transformation to these two quaternions, the final quaternion representing the six degrees of freedom in the MAR device world coordinate system was obtained. Specifically, the quaternions QLTPNorth, QMARInit, and QMARDevice are defined as shown in the formula:
$$Q_{\text{LTPNorth}} = a_1 + b_1 i + c_1 j + d_1 k, \qquad Q_{\text{MARInit}} = a_2 + b_2 i + c_2 j + d_2 k, \qquad Q_{\text{MARDevice}} = a_3 + b_3 i + c_3 j + d_3 k \tag{13}$$
The method for calculating the rotational transformation qLTPToM between the quaternions QMARInit and QLTPNorth is as follows:
$$q_{\text{LTPToM}} = (a_2 a_1 + b_2 b_1 + c_2 c_1 + d_2 d_1) + (a_2 b_1 - b_2 a_1 - c_2 d_1 + d_2 c_1)\,i + (a_2 c_1 + b_2 d_1 - c_2 a_1 - d_2 b_1)\,j + (a_2 d_1 - b_2 c_1 + c_2 b_1 - d_2 a_1)\,k \tag{14}$$
The formula for calculating the rotational transformation qMToD between the quaternions QMARDevice and QMARInit is as follows:
$$q_{\text{MToD}} = (a_3 a_2 + b_3 b_2 + c_3 c_2 + d_3 d_2) + (a_3 b_2 - b_3 a_2 - c_3 d_2 + d_3 c_2)\,i + (a_3 c_2 + b_3 d_2 - c_3 a_2 - d_3 b_2)\,j + (a_3 d_2 - b_3 c_2 + c_3 b_2 - d_3 a_2)\,k \tag{15}$$
The formula for calculating the rotational transformation between the quaternions QMARDevice and QLTPNorth, specifically the inverse of QMARInit under the rotational transformation qLTPToM, is as follows:
$$q_{\text{DToLTP}} = q_{\text{MToD}} \times q_{\text{LTPToM}}^{-1}, \qquad q_{\text{LTPToM}}^{-1} = \frac{q_{\text{LTPToM}}^{*}}{|q_{\text{LTPToM}}|^{2}} \tag{16}$$
where $q_{\text{LTPToM}}^{-1}$ denotes the inverse of the quaternion $q_{\text{LTPToM}}$, and $q_{\text{LTPToM}}^{*}$ and $|q_{\text{LTPToM}}|^{2}$ represent the conjugate and squared norm (magnitude) of the quaternion $q_{\text{LTPToM}}$, respectively [45]. The calculations above solve the rotational transformation that aligns the pose of the AR world coordinate system with the pose of the AR map’s georeferenced coordinate system. This method ensures a perfect overlap between the real-time pose of the AR device’s scene display and the pose of the spatial data in the AR map.
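One consistent reading of Equations (13)–(16) is sketched below: each relative rotation is formed as the conjugate of the later attitude quaternion multiplied by the earlier one, and q_DToLTP then chains them as in Equation (16). Function names and the identity-quaternion example are illustrative:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product for quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + w2*x1 + y1*z2 - y2*z1,
                     w1*y2 + w2*y1 + z1*x2 - z2*x1,
                     w1*z2 + w2*z1 + x1*y2 - x2*y1])

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quat_inv(q):
    return quat_conj(q) / np.dot(q, q)                        # q* / |q|^2, as in Eq. (16)

def relative_rotations(Q_LTPNorth, Q_MARInit, Q_MARDevice):
    q_LTPToM = quat_mul(quat_conj(Q_MARInit), Q_LTPNorth)     # Eq. (14): LTP -> init frame
    q_MToD = quat_mul(quat_conj(Q_MARDevice), Q_MARInit)      # Eq. (15): init -> device frame
    return quat_mul(q_MToD, quat_inv(q_LTPToM))               # Eq. (16): q_DToLTP

# Example: identity initial attitudes and a 20-degree device yaw about +Y
qI = np.array([1.0, 0.0, 0.0, 0.0])
qDev = np.array([np.cos(np.radians(10)), 0.0, np.sin(np.radians(10)), 0.0])
print(relative_rotations(qI, qI, qDev))
```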
In summary, we have established an accurate mapping relationship between the georeferenced coordinate system of the AR map and the AR world coordinate system by projecting the geographic coordinate system onto a planar Cartesian coordinate system. This was achieved through the computation of the rotation-invariant matrix using the quaternion representation of the north direction and the calculation of the invariant translation matrix using the visual-inertial fusion method. As a result, after the initialization of the AR device, the virtual world of the AR map and the real world in the AR scene were precisely aligned in terms of coordinate system units, orientation, and axial displacement. Consequently, various spatial objects in the AR map can be accurately displayed and aligned with the real environment after a successful initialization.

3.5.2. Transformation from the AR World Coordinate System to the AR Camera Coordinate System

The above process involves the conversion of coordinates from the geographical coordinate system to the projected plane coordinate system, followed by the transformation to the LTP coordinate system, and finally to the AR world coordinate system. This process establishes the mapping relationship between coordinates in the geographical coordinate system and the physical world in the MAR device. However, in order to correctly render spatial objects in the AR map from the camera perspective, it was necessary to further transform the AR world coordinate system to the AR camera coordinate system. This section will describe how to achieve this transformation.
The AR world coordinate system was used to describe the camera’s coordinates in the real physical world, with the AR camera serving as the origin at initialization. This origin can represent any object, and the unit of measurement is typically in meters. On the other hand, the AR camera coordinate system was used to describe coordinates in the pinhole model, with the camera optical center as the origin. The z-axis coincides with the optical axis, pointing towards the front of the camera, while the positive directions of the x-axis and y-axis are parallel to the AR world coordinate system. The unit of measurement is also typically in meters.
The transformation between these two coordinate systems is a rigid transformation, meaning that objects do not undergo deformation; only their position and orientation change. This transformation can be represented by translation and rotation transformations of the coordinate systems. Let the AR camera coordinate system be denoted as $X_{ac} Y_{ac} Z_{ac}$, and the AR world coordinate system be denoted as $X_{aw} Y_{aw} Z_{aw}$. Then, the transformation between the two can be represented as follows:
$$\begin{bmatrix} X_{ac} \\ Y_{ac} \\ Z_{ac} \end{bmatrix} = R \begin{bmatrix} X_{aw} \\ Y_{aw} \\ Z_{aw} \end{bmatrix} + T \tag{17}$$
In the transformation Equation (17), R represents the rotation matrix obtained by rotating the AR world coordinate system in different angles around different axes, and T represents the position offset vector based on the rotation. This section focuses on the calculation of R and T for the transformation from the AR world coordinate system to the AR camera coordinate system.
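As a concrete reading of Equation (17), the short sketch below applies a rotation R and a translation T to a point expressed in the AR world frame; it is a generic rigid-body transform with placeholder values, not code from the described system.

```python
import numpy as np

def world_to_camera(p_world, R, T):
    """Map a point from the AR world frame to the AR camera frame
    via the rigid transform of Equation (17): p_cam = R @ p_world + T."""
    return R @ p_world + T

# Example: a 90-degree rotation about the z-axis plus a 1 m offset along x.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([1.0, 0.0, 0.0])
p_cam = world_to_camera(np.array([2.0, 0.0, 0.0]), R, T)  # -> [1.0, 2.0, 0.0]
```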
We obtained the initial pose from the visual-inertial fusion method, which was based on the highly accurate intrinsic and extrinsic parameters obtained through camera calibration of the MAR device [46]. Furthermore, the visual-inertial fusion method continuously estimates optimized extrinsic parameters during motion. To evaluate the accuracy of the yaw angle calculated by the visual-inertial sensor fusion and the stability of the attitude over time (using the yaw angle as an example), a test was conducted near a large building in the experimental area. The test involved three rounds of circumnavigating the building, starting and ending at the same location, while the device's attitude was observed throughout. In the experiment, the visual sensor and inertial sensor were fixed on a bracket to record the continuity of the yaw angle values during walking and the yaw angle values at each turning point in the three rounds of testing. The duration of each measurement was 70 min. As shown in Figure 6, under walking conditions with a handheld MAR device, the visual-inertial fusion method demonstrated good continuity in calculating the yaw angle; walking along the building for 150 m, the changes in the yaw angle values at each turning point were relatively smooth.
The pose data during the motion of the MAR device were represented as $P = [x, y, z, \psi, \theta, \varphi]$, where $[x, y, z]$ denotes the position in the AR world coordinate system and $\psi$, $\theta$, and $\varphi$ represent the rotations of the coordinate system expressed as Euler angles: $\varphi$ is the rotation around the X-axis, $\theta$ the rotation around the Y-axis, and $\psi$ the rotation around the Z-axis. We obtained the rotation matrix from the attitude data of P. The rotation transformation matrix between the AR world coordinate system and the AR camera coordinate system, denoted as $R_{realtime\_gn}$, was calculated as outlined in Equation (10). As stated in Equation (17), the camera motion of the MAR device is a rigid-body motion, in which the lengths of and angles between vectors remain unchanged across coordinate systems. To fully describe the transformation between the AR world coordinate system and the AR camera coordinate system, both the rotation matrix R and the translation matrix T are needed. Therefore, in addition to the rotation-invariant matrix for transforming the AR world coordinate system to the AR camera coordinate system, it was also necessary to determine the offset vector and the invariant translation matrix that describe the coordinate changes during the motion of the MAR device. Furthermore, to eliminate the influence of scale factors, we also transformed the translation matrix to match the real physical scale.
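For illustration, the sketch below builds a rotation matrix from the Euler angles in P using a Z-Y-X composition; the rotation order is an assumption made here, since the exact convention used by the tracking framework is not restated in this section.

```python
import numpy as np

def euler_zyx_to_matrix(psi, theta, phi):
    """Rotation matrix from Euler angles: psi about Z, theta about Y, phi about X,
    composed as R = Rz(psi) @ Ry(theta) @ Rx(phi)."""
    Rz = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                   [np.sin(psi),  np.cos(psi), 0.0],
                   [0.0,          0.0,         1.0]])
    Ry = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                   [ 0.0,           1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
    Rx = np.array([[1.0, 0.0,          0.0],
                   [0.0, np.cos(phi), -np.sin(phi)],
                   [0.0, np.sin(phi),  np.cos(phi)]])
    return Rz @ Ry @ Rx

# Example pose P = [x, y, z, psi, theta, phi], angles in radians.
x, y, z = 1.0, 0.5, 0.0
psi, theta, phi = np.radians(30.0), 0.0, 0.0
R = euler_zyx_to_matrix(psi, theta, phi)
```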
Existing works primarily utilize hardware such as RTK-GPS and IMU to extract the coordinate variations of the MAR device’s motion as displacement between the AR world coordinate system and the AR camera coordinate system [47,48]. These methods are generally applicable in small-scale scenarios. However, when applying AR mapping on a large scale over complex terrain surfaces, the accuracy of displacement measurements is compromised by various interferences, such as ground reflections affecting the variables of any single sensor, cumulative errors in the IMU due to rotation, and the impact of battery heating on the update frequency of the tracker for continuous pose estimation.
In this paper, a novel method is proposed to compute the offset vector T between the AR world coordinate system and the AR camera coordinate system during the motion of the MAR device. Our method leverages high-precision RTK-GPS measurements and employs geodesic algorithms on the Earth’s surface to transform the complex transformations in three-dimensional, rigid-body motion into a measurement problem in the real physical scale. Consequently, the offset vector of the rigid body transformation can be determined.
The coordinates obtained from RTK-GPS are in latitude and longitude units, with the reference plane of the coordinate system being the ellipsoidal surface. On the other hand, the AR local reference coordinate system constructed using the visual-inertial fusion method for the MAR device operates in meters as the unit of the actual physical length. Therefore, to perform the transformation from the world coordinate system to the camera coordinate system, it was necessary to establish the conversion relationship between the RTK-GPS coordinates and the AR camera coordinates. This involved converting the RTK-GPS coordinates in the geographic coordinate system to the AR coordinates in the AR local reference coordinate system.
Let the RTK-GPS coordinates of the MAR device during initialization be denoted as (X0, Y0, Z0), and the real-time coordinates during motion as (Xr, Yr, Zr). The calculation formula for the translation matrix between the AR world coordinate system and the AR camera coordinate system for this device is as follows:
$$ T = \begin{bmatrix} \Delta_{longitude} \\ \Delta_{latitude} \\ \Delta_{altitude} \end{bmatrix} = \begin{bmatrix} Long\_CC \times (X_r - X_0) \\ Lat\_CC \times (Y_r - Y_0) \\ Z_r - Z_0 \end{bmatrix} $$

where T represents the translation matrix that accounts for the offset between the AR world coordinate system and the AR camera coordinate system and describes the real-time mapping between the RTK-GPS coordinates and the AR coordinates in the actual surface environment. $\Delta_{longitude}$, $\Delta_{latitude}$, and $\Delta_{altitude}$ are the variations in longitude, latitude, and altitude between the MAR device's current valid RTK-GPS coordinates and the coordinates recorded during initialization. $Long\_CC$ and $Lat\_CC$ are unit conversion coefficients, expressed in meters/degree, obtained through the Vincenty-formula-based geodesic method [49] by computing the length of one degree of longitude and latitude in the target research area [50]. By incorporating T into the transformation of Equation (17), the coordinates $(X_{aw}, Y_{aw}, Z_{aw})$ of the MAR device during motion can be calculated in the AR world coordinate system. The mapping represented by T adapts to the longitude and latitude of different geographical regions on Earth.
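The conversion coefficients Long_CC and Lat_CC can be obtained from any geodesic implementation. The sketch below is an assumed illustration that uses pyproj's Geod class (which implements Karney's geodesic algorithm rather than the Vincenty formulation cited above) to estimate meters per degree near the initialization point and then assembles T as in the formula above; the coordinate values are placeholders.

```python
import numpy as np
from pyproj import Geod  # assumed dependency; any geodesic library could be substituted

geod = Geod(ellps="WGS84")

def meters_per_degree(lon0, lat0):
    """Approximate Long_CC and Lat_CC (meters per degree) near (lon0, lat0)
    as the geodesic length of a one-degree step in longitude and latitude."""
    _, _, long_cc = geod.inv(lon0, lat0, lon0 + 1.0, lat0)  # east-west step
    _, _, lat_cc = geod.inv(lon0, lat0, lon0, lat0 + 1.0)   # north-south step
    return long_cc, lat_cc

def rtk_offset(init_fix, current_fix):
    """Translation vector T between the initialization fix (X0, Y0, Z0) and the
    current fix (Xr, Yr, Zr); coordinates are (longitude_deg, latitude_deg, altitude_m)."""
    (x0, y0, z0), (xr, yr, zr) = init_fix, current_fix
    long_cc, lat_cc = meters_per_degree(x0, y0)
    return np.array([long_cc * (xr - x0),
                     lat_cc * (yr - y0),
                     zr - z0])

# Placeholder RTK-GPS fixes a few tens of meters apart.
T = rtk_offset((116.3913, 39.9075, 45.0), (116.3921, 39.9079, 45.3))
```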

4. Experiments

4.1. Experimental Platform

A geo-registration system was constructed for the experiment using a series of low-cost devices. Table 1 shows the detailed parameters of the system. The smartphone was equipped with SLAM, IMU, and RTK-GPS capabilities, with a total cost of no more than USD 500.
To assess the performance differences between this method and traditional methods in AR geo-registration, the evaluation of the proposed method was conducted from several perspectives. Firstly, the results of location registration in AR geographic registration were analyzed to demonstrate the effectiveness of the method in various practical application scenarios, such as virtual building previews and trajectory collection. Secondly, the surface alignment results in AR geographic registration were compared with several popular methods. Lastly, the stability of AR map motion tracking was demonstrated through a series of newly conducted outdoor scene tests.
As shown in Table 2, a comprehensive park-level map dataset was selected for performance testing to simulate the typical user behavior of AR-GIS applications. The entire testing process was conducted in real park scenes using handheld smartphones.

4.2. Testing of Positional Alignment Results in AR Geo-Registration

For the comparison of building geographic north alignment, the experiment compared the visual tracker-based method with the proposed method. To obtain quantitative results, two different feature angles (building front and building side) were selected during the initialization and motion processes to establish visual reference points. To reflect real-world AR map applications, the selected visual reference points were more than 30 m away from the virtual buildings. Visual rendering of the designated buildings was performed for each selected frame, and the average difference (in pixels) between the back-projected image position of the virtual building and the observed, visually tracked feature position was used as the per-frame measure of registration accuracy.
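The per-frame metric described above is essentially a back-projection (reprojection) error. The sketch below illustrates such a metric under a pinhole camera model; the intrinsic matrix and point sets are placeholders, and the actual evaluation pipeline used in the experiment may differ in detail.

```python
import numpy as np

def project(K, R, T, p_world):
    """Project a 3D point given in the AR world frame to pixel coordinates
    with a pinhole model: p_cam = R @ p_world + T, then apply the intrinsics K."""
    p_cam = R @ p_world + T
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

def mean_pixel_error(K, R, T, points_world, points_observed):
    """Average distance in pixels between back-projected virtual-building points
    and their observed (visually tracked) image positions."""
    errors = [np.linalg.norm(project(K, R, T, p) - q)
              for p, q in zip(points_world, points_observed)]
    return float(np.mean(errors))

# Placeholder intrinsics for a 1920 x 1080 image (fx, fy, cx, cy are assumed values).
K = np.array([[1500.0,    0.0, 960.0],
              [   0.0, 1500.0, 540.0],
              [   0.0,    0.0,   1.0]])
```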
Figure 7 illustrates the geo-registration results obtained using the two methods. In the approach solely based on visual-inertial fusion, the average error between the observed feature positions and the actual buildings in the video background reached 13.1 pixels. The red area in Figure 7a represents the offset of 3D virtual buildings relative to the real buildings. Conversely, with the proposed method, the average error between the observed feature positions and the actual buildings in the video background was reduced to 2.5 pixels. This level of precision is nearly imperceptible to the human eye, highlighting its high practical value.
To compare the trajectory acquisition accuracy of different methods, the experiment conducted a comparative test between the method based on visual-inertial fusion, the RTK-GPS+IMU method, and the proposed method. To verify the effectiveness of different methods in various typical AR-GIS application scenarios, the experiment selected a use case involving tracking arbitrary geographical objects and compared the performance for four different mobile distance ranges: 10 m, 50 m, 100 m, and 150 m. Each test scenario was repeated five times, and the average values were calculated. Table 3 presents the quantitative comparison results of the three methods for different distance ranges. From Table 3, it is evident that the proposed method exhibited significantly lower errors in AR geolocation compared to the other methods.
Owing to the initial heading angle error [54] and the cumulative drift of visual-inertial fusion, a plain combination of RTK-GPS with visual-inertial fusion exhibits significant errors in practical applications and is difficult to apply directly; the approach proposed in this paper was designed to address this issue. The RTK-GPS+IMU method achieves accurate pose estimation by fusing high-precision localization and sensor data, enabling automatic correction during motion, and is well suited to long-term geo-registration scenarios. The visual-inertial fusion method, in turn, provides high-frequency and high-precision pose estimation and has been widely adopted in recent AR map applications. The comparative analysis based on the results in Table 3 can be summarized as follows.
Compared to the hybrid method based on visual-inertial fusion, our proposed method exhibited a greater reduction in error as the distance increased. When the distance exceeded 150 m, our method achieved nearly a five-fold reduction in error compared to the visual-inertial fusion method. This is attributed to the larger error in the orientation data obtained from the IMU sensor of the mobile device compared to the high-precision RTK-GPS multi-point orientation method. As a result, there was a significant error when converting AR map data from the geographical coordinate system to the AR world coordinate system. Furthermore, the cumulative error of the visual-inertial fusion method increased with distance, leading to larger coordinate errors in AR geolocation for greater distances. Conversely, our method, which combines rotation-invariant matrices with RTK-GPS, provided high-precision and real-time-corrected positioning information.
Compared to the multi-sensor method based on RTK-GPS+IMU, our proposed method exhibited significantly lower errors across different distance ranges, with an average reduction in error of approximately four-fold. This improvement was achieved by incorporating ARCore [55], a visual-inertial fusion method, which provides high-precision pose yaw angles. These yaw angles were then fused with high-precision RTK-GPS measurements. Consequently, during the conversion from the geographical coordinate system to the AR world coordinate system, the coordinate accuracy of AR geolocation was greatly enhanced. In addition, obtaining the true-north direction estimation of MAR devices based on IMU is limited by hardware, resulting in significant initialization geographic direction errors. On the contrary, our method combined RTK-GPS with visual-inertial fusion methods to provide high-precision true-north direction estimation information on the same low-cost MAR device.
For short distances (within 10 m), the geolocation accuracy of our method was slightly lower than that of the visual-inertial fusion method, with an error of 5.1 cm. This is due to the fact that the visual-inertial fusion method exhibits minimal cumulative errors and achieves a high relative accuracy at the centimeter level in small-scale scenarios [56]. In order to obtain high-precision geolocation and orientation, our method also incorporated RTK-GPS in local small-scale areas. However, due to the instability of RTK-GPS during the initialization process, the coordinate accuracy of AR geolocation within the 10 m range was slightly lower than that of the visual-inertial fusion method.

4.3. Testing of Surface Registration Results in AR Geo-Registration

For the comparative experiment on surface plane alignment, we selected the robust, high-precision RTK-GPS+IMU multi-sensor method as the baseline. The experiment compared, for the same plane, the angle between the surface normal derived from each method's estimated gravity vector and a reference normal. To obtain quantitative results, the MAR device was held at the same reference position, and the surface gravity vector was estimated separately with each of the two methods. The angle (in degrees) between the surface normal reported by the MAR device after a successful construction of the gravity vector and the reference normal was recorded. The experiment was repeated 50 times at the selected reference position, and the estimation results were collected. Finally, the average angle and standard deviation were calculated over the 50 successful constructions of the normal vector.
The reference normal vectors in Table 4 were obtained using the gravimeter method [57], which provides high sensitivity and stability. During these measurements, precautions were taken to avoid vibrations and other sources of interference, thereby excluding external factors (such as geological variations or changes in elevation). The normal vectors for the RTK-GPS+IMU method (by default, the experimental surface normal is taken to be aligned with the gravity vector; specific exceptions are discussed in the Conclusions) were obtained as follows. First, the RTK-GPS provided positioning and velocity measurements for the MAR device, while the IMU provided high-precision measurements of acceleration, angular velocity, and other attitude quantities. Next, the relative measurements from the IMU were combined with the absolute motion information from RTK-GPS to estimate the raw acceleration of the MAR device. The device's motion acceleration was then separated from this raw estimate, retaining only the gravitational acceleration. Finally, the gravitational acceleration of the MAR device was transformed so that the z-axis pointed towards the Earth's center, while the x- and y-axes corresponded to the longitude and latitude directions, respectively.
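The angle reported in Table 4 is the angle between the estimated gravity (surface normal) vector and the reference normal. The schematic sketch below, with placeholder accelerometer and RTK-GPS-derived motion accelerations, illustrates the separation and angle computation described above; it is not the filter actually used in the experiments.

```python
import numpy as np

def estimate_gravity(accel_imu, accel_motion):
    """Subtract the device's motion acceleration (derived from RTK-GPS velocities)
    from the raw IMU specific force to retain the gravity component."""
    return accel_imu - accel_motion

def angle_to_reference(gravity, reference_normal):
    """Angle in degrees between the estimated gravity direction and the
    reference surface normal (e.g., from the gravimeter measurement)."""
    g = gravity / np.linalg.norm(gravity)
    n = reference_normal / np.linalg.norm(reference_normal)
    return float(np.degrees(np.arccos(np.clip(np.dot(g, n), -1.0, 1.0))))

# Placeholder samples: raw IMU reading and motion acceleration, both in m/s^2.
accel_imu = np.array([0.30, -0.10, 9.90])
accel_motion = np.array([0.25, -0.05, 0.10])
g_est = estimate_gravity(accel_imu, accel_motion)
angle_deg = angle_to_reference(g_est, np.array([0.0, 0.0, 1.0]))
```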
As shown in Table 4, the surface normal estimation results of the two methods demonstrate that the proposed method achieved higher accuracy than the RTK-GPS+IMU method: its estimates were closer to the reference normal and more tightly clustered across repetitions. These findings indicate that the proposed method aligns the surface normal more precisely for AR geo-registration. The average angular error of the gravity vector estimated with RTK-GPS+IMU was 4.722 degrees; the comparatively unstable acceleration and angular velocity measurements of the IMU during attitude estimation can introduce significant errors when fused with the absolute motion information from RTK-GPS to compute the gravitational acceleration. The standard deviation of this method was 2.314 degrees, which is not excessively large, because RTK-GPS provides highly accurate absolute positioning and velocity information and the Kalman filter allows real-time alignment and fusion of the IMU measurements, keeping the spread of the gravity vector estimates bounded. The proposed approach, which combines visual-inertial tracking and RTK-GPS through the rotation-invariant matrix, achieved an average gravity vector accuracy nearly 2.5 times better than that of the RTK-GPS+IMU method.
To compare the surface alignment accuracy achieved by constructing invariant translation matrices based on RANSAC and RTK-GPS, comparative tests were conducted between the RTK-GPS+IMU method and the proposed method. Figure 8 shows the elevation estimation results for the RTK-GPS+IMU method (left) and the proposed method (right). Compared with Figure 8b, the position indicated by the red arrow in Figure 8a deviates noticeably from the ground surface. To obtain these quantitative results, the same experimental procedure as described for Figure 7 was employed. After successful initialization of the MAR device, the elevation estimation results were recorded while the MAR device was moved towards the buildings for a continuous duration of 2 min. Subsequently, the AR visualization results of virtual 3D buildings obtained by both methods were compared after acquiring the elevation information.
Table 5 presents the experimental results for evaluating the accuracy of surface alignment after MAR device georeferencing. During the testing process, the camera of the Mi 9 smartphone was fixed on a mount, and measurements of the surface alignment error were conducted every 50 m, starting from the origin. Each measurement lasted for 10 min, and the alignment results were assessed. As shown in Table 5, the proposed method achieved an average alignment error of 3.1 cm in aligning virtual 3D buildings with the surface, significantly smaller than the average error of 14.9 cm obtained by the RTK-GPS+IMU method. The alignment error of the proposed method accumulated with the increasing movement distance, but it was promptly corrected once it exceeded the threshold set by the RTK-GPS.

4.4. Stability Testing of Motion Tracking in AR Geo-Registration

Figure 9 demonstrates that the proposed method based on visual-inertial fusion can accurately localize GIS virtual objects. Regardless of the orientation of the virtual object or its distance from the visual sensor of the mobile AR device, the virtual object consistently aligned with its corresponding position in the real world. This provides strong evidence for the advantages of the proposed method in terms of frequency, accuracy, and stability in AR geolocation. In contrast, the registration results of the RTK-GPS+IMU method showed significant deviations of virtual objects at different locations. For instance, the positioning results of the same virtual object at distant locations (Figure 9c) and nearby locations (Figure 9a) did not align with the same position in the real scene. Additionally, the directional rendering of virtual objects from different perspectives was inconsistent with the real world. Such issues can cause visual cognitive difficulties in applications, making it challenging for navigation software to provide accurate guidance routes.
The experiment also included a comparative test of the real-time motion direction and attitude fusion results. The camera of the MAR device was fixed on a bracket to eliminate human error and improve test repeatability. The precision of motion direction and attitude was measured by starting the MAR device from a fixed direction at a reference point and testing continuously without resetting the visual-inertial fusion method or the RTK-GPS. The test started from the origin, traversed a straight road, then a circular road segment, and finally another straight road. Both methods employed filters to smooth the recorded results, and the attitude results were downsampled for deviation comparison.
Figure 10 presents the fusion results of motion direction and attitude at the same location during real-time motion using the handheld MAR device. The experiment utilized external sensor-based ground truth, specifically the Microsoft Azure Kinect 3 depth camera. The Azure Kinect 3 provides accurate yaw angle information during real-time walking motion, which was used as the reference ground truth. The proposed method exhibited closer proximity to the reference ground truth compared to the RTK-GPS+IMU method. As the continuous motion distance increased, the fusion results of motion direction and attitude maintained their stability. Although there was an initial error (1.9 degrees) in the attitude estimation during initialization, it was subsequently corrected. Cumulative errors occurred with the increasing distance, but they could be corrected once the motion direction and attitude thresholds were triggered. In contrast, the RTK-GPS+IMU method exhibited large errors in the attitude estimation during initialization (4.3 degrees), and the errors did not converge during subsequent motion. Although the overall motion direction and attitude were close to the reference ground truth, the average error of 5.5 degrees made it challenging to meet the accuracy requirements of AR mapping applications in large-scale geographical environments.
The total testing time for the proposed method, which includes AR geolocation computation and visual rendering for each frame of the image, was approximately 30 ms. The visual rendering part was integrated and computed on the GPU, taking approximately 6 ms, while the remaining computations were performed on the CPU (Snapdragon 870 processor), accounting for approximately 24 ms. However, when the data load of the AR map is large (e.g., individual model files exceeding 300 MB), the total testing time per frame will significantly increase. With the rapid development of GPU capabilities on smart mobile devices, future research can explore completing all computations for AR geolocation on the GPU, thereby further improving the performance of AR geolocation. This work holds great potential for applications in three-dimensional reconstruction of large-scale spatial data, urban design, real-time navigation, and other AR applications.

5. Conclusions

This paper presented a multi-sensor AR geolocation method based on rotation-invariant matrices, achieving high-precision AR geo-registration using low-cost hardware. By leveraging affordable devices such as smartphones, the proposed method offers a cost-effective and widely applicable solution, with minimal sensor hardware quality and calibration requirements. This innovation not only enhances the feasibility of AR geo-registration technology, but also establishes a foundation for broader applications. The method utilizes rotation-invariant matrices to fuse high-precision RTK-GPS and visual-inertial fusion methods for heading angle information, achieving high-accuracy and real-time AR geolocation. By conducting experiments at different distance ranges, the effectiveness and superiority of the proposed method were validated. The experimental results demonstrated that the proposed method exhibited lower coordinate errors compared to hybrid methods based on visual-inertial fusion and multi-sensor methods based on RTK-GPS+IMU, with an average error reduction of approximately four times. The proposed method achieved a nearly five-fold error reduction at long distances (greater than 150 m). Although the proposed method slightly increased the error within short distances (within 10 m) compared to visual-inertial fusion methods, it still maintained a high level of precision. The proposed method fully leverages the advantages of multi-sensor integration, improving the accuracy and stability of AR geolocation.
Despite the favorable performance of the proposed method, there are still some limitations that need to be addressed and improved in future work. These limitations include:
  • The proposed method exhibited slightly higher errors within short distances (within 10 m) compared to visual-inertial fusion methods, mainly due to the lower accuracy of RTK-GPS within a small range. Therefore, a more flexible sensor-switching mechanism should be designed to select the most suitable sensor combination for different distance ranges, aiming to achieve optimal AR geolocation results.
  • In complex and varying outdoor terrain scenarios, high-precision and markerless pose tracking remains challenging. Employing recursive filtering methods may improve the final orientation accuracy of the rotation-invariant matrices and enable dynamic calibration during motion. Combining filtering methods with smoother coordinate transformation techniques can also enhance the visual quality of AR geolocation results, yielding smoother visualization in AR geolocation alignment. Additionally, the proposed method primarily targets AR application scenarios in which the gravity direction is perpendicular to the ground; relaxing this assumption will be the subject of future research.

Author Contributions

Conceptualization, K.H. and C.W.; data curation, K.H.; formal analysis, K.H.; funding acquisition, W.S.; investigation, C.W.; methodology, K.H. and C.W.; project administration, K.H.; resources, W.S.; supervision, C.W. and W.S.; validation, K.H.; visualization, K.H.; writing—original draft, C.W.; writing—review and editing, K.H., C.W., and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (No. 2022YFF1301101).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors express thanks to the anonymous reviewers for their constructive comments and advice.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cheng, Y.; Zhu, G.; Yang, C.; Miao, G.; Ge, W. Characteristics of augmented map research from a cartographic perspective. Cartogr. Geogr. Inf. Sci. 2022, 49, 426–442. [Google Scholar] [CrossRef]
  2. Behzadan, A.H.; Kamat, V.R. Georeferenced Registration of Construction Graphics in Mobile Outdoor Augmented Reality. J. Comput. Civ. Eng. 2007, 21, 247–258. [Google Scholar] [CrossRef]
  3. Ren, X.; Sun, M.; Jiang, C.; Liu, L.; Huang, W. An Augmented Reality Geo-Registration Method for Ground Target Localization from a Low-Cost UAV Platform. Sensors 2018, 18, 3739. [Google Scholar] [CrossRef] [Green Version]
  4. Liu, D.; Chen, J.; Hu, D.; Zhang, Z. Dynamic BIM-augmented UAV safety inspection for water diversion project. Comput. Ind. 2019, 108, 163–177. [Google Scholar] [CrossRef]
  5. Portalés, C.; Lerma, J.L.; Navarro, S. Augmented reality and photogrammetry: A synergy to visualize physical and virtual city environments. ISPRS J. Photogramm. Remote Sens. 2010, 65, 134–142. [Google Scholar] [CrossRef]
  6. Xiao, W.; Mills, J.; Guidi, G.; Rodríguez-Gonzálvez, P.; Gonizzi Barsanti, S.; González-Aguilera, D. Geoinformatics for the conservation and promotion of cultural heritage in support of the UN Sustainable Development Goals. ISPRS J. Photogramm. Remote Sens. 2018, 142, 389–406. [Google Scholar] [CrossRef]
  7. Ma, X.; Sun, J.; Zhang, G.; Ma, M.; Gong, J. Enhanced Expression and Interaction of Paper Tourism Maps Based on Augmented Reality for Emergency Response. In Proceedings of the 2018 2nd International Conference on Big Data and Internet of Things—BDIOT 2018, Beijing, China, 24–26 October 2018; ACM Press: New York, NY, USA, 2018; pp. 105–109. [Google Scholar]
  8. Gazcón, N.F.; Trippel Nagel, J.M.; Bjerg, E.A.; Castro, S.M. Fieldwork in Geosciences assisted by ARGeo: A mobile Augmented Reality system. Comput. Geosci. 2018, 121, 30–38. [Google Scholar] [CrossRef]
  9. Li, W.; Han, Y.; Liu, Y.; Zhu, C.; Ren, Y.; Wang, Y.; Chen, G. Real-time location-based rendering of urban underground pipelines. ISPRS Int. J. Geo-Inf. 2018, 7, 32. [Google Scholar] [CrossRef] [Green Version]
  10. Suh, J.; Lee, S.; Choi, Y. UMineAR: Mobile-tablet-based abandoned mine hazard site investigation support system using augmented reality. Minerals 2017, 7, 198. [Google Scholar] [CrossRef] [Green Version]
  11. Huang, K.; Wang, C.; Wang, S.; Liu, R.; Chen, G.; Li, X. An Efficient, Platform-Independent Map Rendering Framework for Mobile Augmented Reality. ISPRS Int. J. Geo-Inf. 2021, 10, 593. [Google Scholar] [CrossRef]
  12. Li, P.; Qin, T.; Hu, B.; Zhu, F.; Shen, S. Monocular Visual-Inertial State Estimation for Mobile Augmented Reality. In Proceedings of the 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Nantes, France, 9–13 October 2017; pp. 11–21. [Google Scholar]
  13. Von Stumberg, L.; Usenko, V.; Cremers, D. Direct Sparse Visual-Inertial Odometry Using Dynamic Marginalization. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 2510–2517. [Google Scholar]
  14. Trimpe, S.; D’Andrea, R. Accelerometer-based tilt estimation of a rigid body with only rotational degrees of freedom. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 2630–2636. [Google Scholar]
  15. Zhang, Z.-Q.; Yang, G.-Z. Calibration of Miniature Inertial and Magnetic Sensor Units for Robust Attitude Estimation. IEEE Trans. Instrum. Meas. 2014, 63, 711–718. [Google Scholar] [CrossRef] [Green Version]
  16. Tedaldi, D.; Pretto, A.; Menegatti, E. A robust and easy to implement method for IMU calibration without external equipments. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 3042–3049. [Google Scholar]
  17. Thong, Y.K.; Woolfson, M.S.; Crowe, J.A.; Hayes-Gill, B.R.; Challis, R.E. Dependence of inertial measurements of distance on accelerometer noise. Meas. Sci. Technol. 2002, 13, 1163–1172. [Google Scholar] [CrossRef] [Green Version]
  18. Ryohei, H.; Michael, C. Outdoor Navigation System by AR. SHS Web Conf. 2021, 102, 04002. [Google Scholar] [CrossRef]
  19. Wang, Y.J.; Gao, J.Q.; Li, M.H.; Shen, Y.; Hasanyan, D.; Li, J.F.; Viehland, D. A review on equivalent magnetic noise of magnetoelectric laminate sensors. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 2014, 372, 20120455. [Google Scholar] [CrossRef] [Green Version]
  20. Morales, Y.; Tsubouchi, T. DGPS, RTK-GPS and StarFire DGPS Performance Under Tree Shading Environments. In Proceedings of the 2007 IEEE International Conference on Integration Technology, Shenzhen, China, 20–24 March 2007; pp. 519–524. [Google Scholar]
  21. Kim, M.G.; Park, J.K. Accuracy Evaluation of Internet RTK GPS by Satellite Signal Reception Environment. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2013, 31, 277–283. [Google Scholar] [CrossRef]
  22. Burkard, S.; Fuchs-Kittowski, F. User-Aided Global Registration Method using Geospatial 3D Data for Large-Scale Mobile Outdoor Augmented Reality. In Proceedings of the 2020 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Recife, Brazil, 9–13 November 2020; pp. 104–109. [Google Scholar]
  23. Randeniya, D.I.B.; Sarkar, S.; Gunaratne, M. Vision–IMU Integration Using a Slow-Frame-Rate Monocular Vision System in an Actual Roadway Setting. IEEE Trans. Intell. Transp. Syst. 2010, 11, 256–266. [Google Scholar] [CrossRef]
  24. Suwandi, B.; Kitasuka, T.; Aritsugi, M. Low-cost IMU and GPS fusion strategy for apron vehicle positioning. In Proceedings of the TENCON 2017—2017 IEEE Region 10 Conference, Penang, Malaysia, 5–8 November 2017; pp. 449–454. [Google Scholar]
  25. Wang, S.; Deng, Z.; Yin, G. An Accurate GPS-IMU/DR Data Fusion Method for Driverless Car Based on a Set of Predictive Models and Grid Constraints. Sensors 2016, 16, 280. [Google Scholar] [CrossRef] [Green Version]
  26. Mahdi, A.E.; Azouz, A.; Abdalla, A.; Abosekeen, A. IMU-Error Estimation and Cancellation Using ANFIS for Improved UAV Navigation. In Proceedings of the 2022 13th International Conference on Electrical Engineering (ICEENG), Cairo, Egypt, 29–31 March 2022; pp. 120–124. [Google Scholar]
  27. Huang, W.; Sun, M.; Li, S. A 3D GIS-based interactive registration mechanism for outdoor augmented reality system. Expert Syst. Appl. 2016, 55, 48–58. [Google Scholar] [CrossRef]
  28. Qimin, X.; Bin, C.; Xu, L.; Xixiang, L.; Yuan, T. Vision-IMU Integrated Vehicle Pose Estimation based on Hybrid Multi-Feature Deep Neural Network and Federated Filter. In Proceedings of the 2021 28th Saint Petersburg International Conference on Integrated Navigation Systems (ICINS), Saint Petersburg, Russia, 31 May–2 June 2021; pp. 1–5. [Google Scholar]
  29. Liu, R.; Zhang, J.; Chen, S.; Yang, T.; Arth, C. Accurate real-time visual SLAM combining building models and GPS for mobile robot. J. Real-Time Image Process. 2021, 18, 419–429. [Google Scholar] [CrossRef]
  30. Toker, A.; Zhou, Q.; Maximov, M.; Leal-Taix’e, L. Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6484–6493. [Google Scholar]
  31. Mithun, N.C.; Minhas, K.S.; Chiu, H.-P.; Oskiper, T.; Sizintsev, M.; Samarasekera, S.; Kumar, R. Cross-View Visual Geo-Localization for Outdoor Augmented Reality. In Proceedings of the 2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR), Shanghai, China, 25–29 March 2023; pp. 493–502. [Google Scholar]
  32. Ventura, J.; Höllerer, T. Wide-area scene mapping for mobile visual tracking. In Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Atlanta, GA, USA, 5–8 November 2012; pp. 3–12. [Google Scholar]
  33. Qin, T.; Cao, S.; Pan, J.; Shen, S. A General Optimization-based Framework for Global Pose Estimation with Multiple Sensors 2019. arXiv 2019, arXiv:1901.03642. [Google Scholar] [CrossRef]
  34. Qu, X.; Soheilian, B.; Habets, E.; Paparoditis, N. Evaluation of sift and surf for vision based localization. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, XLI-B3-685, 685–692. [Google Scholar] [CrossRef] [Green Version]
  35. Wan, G.; Yang, X.; Cai, R.; Li, H.; Zhou, Y.; Wang, H.; Song, S. Robust and Precise Vehicle Localization Based on Multi-Sensor Fusion in Diverse City Scenes. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 4670–4677. [Google Scholar]
  36. Hesch, J.A.; Kottas, D.G.; Bowman, S.L.; Roumeliotis, S.I. Consistency Analysis and Improvement of Vision-aided Inertial Navigation. IEEE Trans. Robot. 2014, 30, 158–176. [Google Scholar] [CrossRef] [Green Version]
  37. Corke, P.; Lobo, J.; Dias, J. An Introduction to Inertial and Visual Sensing. Int. J. Robot. Res. 2007, 26, 519–535. [Google Scholar] [CrossRef]
  38. Foxlin, E.; Naimark, L. VIS-Tracker: A wearable vision-inertial self-tracker. In Proceedings of the IEEE Virtual Reality, 2003, Los Angeles, CA, USA, 22–26 March 2003; pp. 199–206. [Google Scholar]
  39. Schall, G.; Wagner, D.; Reitmayr, G.; Taichmann, E.; Wieser, M.; Schmalstieg, D.; Hofmann-Wellenhof, B. Global pose estimation using multi-sensor fusion for outdoor Augmented Reality. In Proceedings of the 2009 8th IEEE International Symposium on Mixed and Augmented Reality, Orlando, FL, USA, 19–22 October 2009; pp. 153–162. [Google Scholar]
  40. Waegel, K.; Brooks, F.P. Filling the gaps: Hybrid vision and inertial tracking. In Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Adelaide, SA, Australia, 1–4 October 2013; pp. 1–4. [Google Scholar]
  41. Oskiper, T.; Samarasekera, S.; Kumar, R. Global Heading Estimation For Wide Area Augmented Reality Using Road Semantics For Geo-referencing. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bari, Italy, 4–8 October 2021; pp. 427–428. [Google Scholar]
  42. Hansen, L.H.; Fleck, P.; Stranner, M.; Schmalstieg, D.; Arth, C. Augmented Reality for Subsurface Utility Engineering, Revisited. IEEE Trans. Vis. Comput. Graph. 2021, 27, 4119–4128. [Google Scholar] [CrossRef]
  43. Leick, A.; Rapoport, L.; Tatarnikov, D. Geodesy. In GPS satellite surveying; Wiley: Hoboken, NJ, USA, 2015; pp. 129–206. ISBN 978-1-119-01861-2. [Google Scholar]
  44. Chen, D.; Zhang, L.; Li, J.; Liu, R. Urban building roof segmentation from airborne lidar point clouds. Int. J. Remote Sens. 2012, 33, 6497–6515. [Google Scholar] [CrossRef]
  45. Pujol, J. Hamilton, Rodrigues, Gauss, Quaternions, and Rotations: A Historical Reassessment. Commun. Math. Anal. 2012, 13, 1–14. [Google Scholar]
  46. Huang, W.; Wan, W.; Liu, H. Optimization-Based Online Initialization and Calibration of Monocular Visual-Inertial Odometry Considering Spatial-Temporal Constraints. Sensors 2021, 21, 2673. [Google Scholar] [CrossRef]
  47. Lu, R.S.; Li, Y.F. A global calibration method for large-scale multi-sensor visual measurement systems. Sens. Actuators Phys. 2004, 116, 384–393. [Google Scholar] [CrossRef]
  48. Han, T.; Zhou, G. Pseudo-spectrum-based multi-sensor multi-frame detection in mixed coordinates. Digit. Signal Process. 2023, 134, 103931. [Google Scholar] [CrossRef]
  49. Thomas, C.M.; Featherstone, W.E. Validation of Vincenty’s Formulas for the Geodesic Using a New Fourth-Order Extension of Kivioja’s Formula. J. Surv. Eng. 2005, 131, 20–26. [Google Scholar] [CrossRef]
  50. Nowak, E.; Nowak Da Costa, J. Theory, strict formula derivation and algorithm development for the computation of a geodesic polygon area. J. Geod. 2022, 96, 20. [Google Scholar] [CrossRef]
  51. Huang, K.; Wang, C.; Liu, R.; Chen, G. A Fast and Accurate Spatial Target Snapping Method for 3D Scene Modeling and Mapping in Mobile Augmented Reality. ISPRS Int. J. Geo-Inf. 2022, 11, 69. [Google Scholar] [CrossRef]
  52. SDK Downloads|ARCore. Available online: https://developers.google.com/ar/develop/downloads (accessed on 27 May 2023).
  53. Nowacki, P.; Woda, M. Capabilities of ARCore and ARKit Platforms for AR/VR Applications. In Proceedings of the Engineering in Dependability of Computer Systems and Networks, Brunów, Poland, 1–5 July 2019; Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 358–370. [Google Scholar]
  54. Li, H.; Wang, J.; He, S.; Lee, C.-H. Nonlinear Optimal Impact-Angle-Constrained Guidance with Large Initial Heading Error. J. Guid. Control Dyn. 2021, 44, 1663–1676. [Google Scholar] [CrossRef]
  55. Sukhareva, E.; Tomchinskaya, T.; Serov, I. SLAM-based Indoor Navigation in University Buildings. In Proceedings of the 31th International Conference on Computer Graphics and Vision, Nizhny Novgorod, Russia, 27–30 September 2021; Volume 2, pp. 611–617. [Google Scholar]
  56. Servières, M.; Renaudin, V.; Dupuis, A.; Antigny, N. Visual and Visual-Inertial SLAM: State of the Art, Classification, and Experimental Benchmarking. J. Sens. 2021, 2021, 2054828. [Google Scholar] [CrossRef]
  57. Ménoret, V.; Vermeulen, P.; Le Moigne, N.; Bonvalot, S.; Bouyer, P.; Landragin, A.; Desruelle, B. Gravity measurements below 10−9 g with a transportable absolute quantum gravimeter. Sci. Rep. 2018, 8, 12300. [Google Scholar] [CrossRef]
Figure 1. The general process of AR geo-registration.
Figure 2. The architecture of AR geo-registration proposed in this article.
Figure 3. The three coordinate systems involved in this study.
Figure 4. Estimation of the true-north vector based on RTK-GPS and visual-inertial fusion.
Figure 5. The flowchart of the geographic north 0-degree attitude initialization process.
Figure 6. Continuous yaw angle values based on the visual-inertial fusion method during three rounds of circumnavigating a test building.
Figure 7. Comparison of directional registration effects after constructing invariant translation matrices using different methods: (a) visual-inertial fusion method and (b) our method.
Figure 8. Comparison of elevation estimation results for different methods: (a) RTK-GPS+IMU and (b) the proposed method.
Figure 9. Comparison of AR geolocation results for both methods at the same location: (a) 10 m interval distance (RTK-GPS+IMU), (b) 30 m interval distance (RTK-GPS+IMU), (c) 50 m interval distance (RTK-GPS+IMU), (d) 10 m interval distance (our method), (e) 30 m interval distance (our method), and (f) 50 m interval distance (our method).
Figure 10. Comparison of the fused orientation and pose results at the same location during real-time motion for the two methods.
Table 1. The parameters of the geo-registration system.

Parameters | Value
RTK-GPS device | HUACE H20 receiver equipped with the RTK kit
RTK-GPS kit | RTK-GPS single-antenna kit [51]
Smartphone device and OS | Mi 12 (the Android operating system)
Dependencies of geo-registration API | C++ and OpenGL libraries, along with ARCore [52,53]
The magnetic sensor kit | AKM kit (Asahi Kasei Microdevices)
The walking speed during the experiment | Restricted to approximately 0.8–1.2 m per second to conform to typical walking motion patterns
Top-mounted fixed receiver on the gimbal bracket | Fixed on top of a gimbal bracket at a height of 2 m, facing the same direction as the smartphone
Table 2. The parameters of the park-level map dataset.

Parameters | Value
Map data layers | 6 vector layers, 3 scene layers, 1 navigation layer, 1 POI (Point of Interest) layer, and 1 text annotation layer
Data source of 3D building data | National BIM Library
Data source of the vector road data | OpenStreetMap
Total dataset size | 3.7 GB
Table 3. Average error for the three AR geo-registration methods.

AR Geo-Registration Methods | 10.0 m | 50.0 m | 100.0 m | 150.0 m
Visual-inertial fusion method | 4.7 cm | 23.7 cm | 1.31 m | 2.49 m
RTK-GPS+IMU | 27.1 cm | 33.0 cm | 44.9 cm | 52.1 cm
Our method | 5.1 cm | 8.9 cm | 11.3 cm | 15.8 cm
Table 4. Gravity vector estimation test results of the two methods.

Method | Average Angle (deg) | Standard Deviation (deg)
RTK-GPS+IMU | 4.722 | 2.314
Our method | 1.901 | 1.106
Table 5. Test results of surface alignment error.

Distance from the Origin (m) | RTK-GPS+IMU | Our Method
0.0 | 12.2 cm | 3.1 cm
50.0 | 14.1 cm | 2.7 cm
100.0 | 16.7 cm | 3.6 cm
−50.0 | 14.6 cm | 2.5 cm