3.1. Simulation Experiment
For simulation experiments, a moving-target detection and tracking platform in an urban environment was built. In the platform, a vehicle drives through the city, as shown in
Figure 6, and two UAVs with vertically downward cameras detect the target utilizing YOLOv5 [
27], as shown in
Figure 7. The flight altitude of the two UAVs is 100 m, and the baseline between the two UAVs is 70 m. In the process of target sensing, the UAVs first use the target detection algorithm to find the target and then use the proposed method to continuously geolocate the moving target. We implemented the proposed framework in Python and ran it on a machine with an Intel(R) i7-10700 CPU @ 2.90 GHz and an NVIDIA RTX 2070 Super GPU.
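As a concrete illustration of the detection step, the sketch below shows how a vehicle could be detected in each camera frame with YOLOv5 via the PyTorch Hub interface; the model variant (yolov5s), the confidence threshold, and the COCO "car" class filter are illustrative assumptions rather than the exact configuration used on the platform.

```python
# Minimal sketch of per-frame vehicle detection with YOLOv5 (PyTorch Hub).
# Model variant, confidence threshold, and class filter are assumptions.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # pretrained COCO weights
model.conf = 0.4                                         # assumed confidence threshold

def detect_target(frame_bgr):
    """Return the pixel center (u, v) of the most confident car detection, or None."""
    results = model(frame_bgr[:, :, ::-1].copy())  # hub models expect RGB images
    det = results.xyxy[0]                          # rows: [x1, y1, x2, y2, conf, cls]
    cars = det[det[:, 5] == 2]                     # COCO class 2 corresponds to "car"
    if len(cars) == 0:
        return None
    x1, y1, x2, y2, *_ = cars[cars[:, 4].argmax()].tolist()
    return 0.5 * (x1 + x2), 0.5 * (y1 + y2)        # bounding-box center in pixels
```

The pixel center returned here is the per-UAV measurement that the geolocation framework would consume.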
Before geolocating the moving target with multiple images, including past images, the corresponding points in the past images were matched using the corresponding-point-matching method described in
Section 2.3. The accuracy of the proposed corresponding-point-matching method depends heavily on the estimation of the fundamental matrix
F, and wrong corresponding-point matches introduce errors into the target geolocation. However, the estimation of the fundamental matrix is occasionally inaccurate because of ORB feature mismatches between the current image and the past images. Therefore, a method for filtering the wrong corresponding points is proposed in
Section 2.3.
In order to evaluate the performance of the corresponding-point-matching method and the effectiveness of the method for filtering the wrong corresponding points, we obtained the experimental results shown in
Figure 8. In this experiment, we did not consider the influence of UAV navigation state measurement errors on the geolocation results and only observed the errors caused by the corresponding points. It is worth noting that eight images were used for each geolocation in this experiment. In
Figure 8, the black line is the ground-truth path of the moving target, and the yellow line is the path estimated by the proposed moving-target geolocation framework without filtering the wrong corresponding points. It can be seen that corresponding point matching introduces errors into the moving-target geolocation and that the geolocation error is large when the car is near the corner of an image: the overlap between the current image and the past images is small there, so there are not enough ORB features to estimate an accurate fundamental matrix. However, the geolocation error was greatly reduced after filtering the wrong matching points with the proposed method, as shown by the red line.
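To make this step concrete, the sketch below matches ORB features between the current image and a past image with OpenCV, estimates the fundamental matrix with RANSAC, and measures the epipolar distance of a candidate corresponding point; the thresholds are assumptions, and the epipolar-distance test is only a stand-in for the filtering rule of Section 2.3.

```python
# Hedged sketch of ORB matching, fundamental-matrix estimation, and an
# epipolar-distance check; thresholds and interfaces are assumptions.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def estimate_fundamental(img_cur, img_past):
    """Estimate F from the current to a past image; also return the RANSAC inlier count."""
    kp1, des1 = orb.detectAndCompute(img_cur, None)
    kp2, des2 = orb.detectAndCompute(img_past, None)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    return F, 0 if mask is None else int(mask.sum())

def epipolar_distance(F, p_cur, p_past):
    """Distance of the past-image point to the epipolar line of the current-image point."""
    line = F @ np.array([p_cur[0], p_cur[1], 1.0])
    return abs(line @ np.array([p_past[0], p_past[1], 1.0])) / np.hypot(line[0], line[1])

# A candidate corresponding point would be kept only if F is supported by enough
# RANSAC inliers and its epipolar distance stays below a small pixel threshold.
```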
The statistical characteristics of continuous geolocation results are shown in
Table 1, in which we report the mean absolute errors (MAEs), the standard deviations (STDs), and the maximum errors (MAX) of the geolocation results in each direction, X, Y, and Z. The mean absolute error (MAE) is calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{x}_i - x_i\right|,$$

where $n$ represents the number of geolocations, and $\hat{x}_i$ and $x_i$ represent the estimated value and the true value, respectively. It can be seen in
Table 1 that the proposed method can effectively filter out the wrong corresponding points and reduce the geolocation errors.
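For completeness, the error statistics reported in Table 1 (and the distance error used in the later comparisons) can be computed from the estimated and true target positions as sketched below; the array layout and function name are ours.

```python
# Per-axis MAE, STD, and MAX of the geolocation errors, plus the 3-D distance error.
import numpy as np

def geolocation_stats(est, gt):
    """est, gt: (n, 3) arrays of estimated and true target positions (X, Y, Z)."""
    err = est - gt
    mae = np.mean(np.abs(err), axis=0)    # mean absolute error per axis
    std = np.std(err, axis=0)             # standard deviation per axis
    mx = np.max(np.abs(err), axis=0)      # maximum absolute error per axis
    dist = np.linalg.norm(err, axis=1)    # 3-D distance error of each estimate
    return mae, std, mx, dist.mean()
```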
After matching corresponding points, the past images and the current images are used to estimate the target’s altitude. We obtained the statistical characteristics of altitude estimation in the continuous geolocation process, as shown in
Table 2. We utilized 4 or 8 images to obtain the altitude estimation results and compared the mean absolute errors, standard deviations, and maximum errors of the altitude estimation when using different numbers of images. In addition, we set the standard deviations of the Gaussian measurement errors of the UAVs’ attitude angles and positions to fixed values, and the yaw-angle biases of the left and right UAVs were set as a function of the UAVs’ flight time according to Equation (34), with the corresponding bias parameter set to 20 in this experiment. It can be seen from the results shown in
Table 2 that the more images used for target altitude estimation, the more accurate the altitude estimation will be. However, using more images means higher computing costs; the average processing speed of moving-target geolocation was 22 FPS when utilizing eight images on the above-mentioned machine in the realistic experiment.
Similarly to the target altitude estimation experiments, we evaluated the performance of yaw-angle bias estimation. In this experiment, the true values of the yaw-angle biases need to be known in order to evaluate our method, but it is difficult to measure the yaw-angle biases of a sensor in actual flight. Therefore, we used the same experimental conditions as in the altitude estimation experiment to evaluate the performance of yaw-angle bias estimation in the simulation environment. As shown in
Table 3, the same conclusion can be obtained—that is, the proposed method can effectively estimate the yaw-angle biases of UAVs, and the more images used, the more reliable the estimation results will be.
In the proposed framework, the altitude of a moving target and the yaw-angle biases of the two UAVs are estimated iteratively until a preset stopping condition is reached. We preset two conditions for stopping the iteration: one is that the yaw-angle biases estimated in the current iteration are less than 0.1, and the other is that the maximum number of iterations is reached, which we set to 20 in this study.
Table 4 shows the mean absolute errors of the estimated values in the iteration process of the proposed framework utilizing eight images under the above-mentioned experimental settings. It can be seen that when the number of iterations reaches four, the estimated yaw-angle biases and target altitude reach stable values, and the results of subsequent iterations fluctuate around these values. This is because the iteration process is affected not only by the yaw-angle biases but also by the Gaussian measurement errors, which prevents the whole system from converging to a single minimum of the yaw-angle biases through further iterations.
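The stopping logic of this alternating scheme can be sketched as follows; estimate_altitude and regress_yaw_bias are hypothetical placeholders for the two stages of the framework, and the additive bias update is a simplification of the actual procedure.

```python
# Sketch of the iteration control only: alternate altitude estimation and
# yaw-bias regression until the newly estimated biases fall below the tolerance
# or the maximum number of iterations is reached. The two stage functions are
# supplied by the caller and are placeholders here.
def iterate_geolocation(measurements, estimate_altitude, regress_yaw_bias,
                        max_iter=20, bias_tol=0.1):
    yaw_bias = [0.0, 0.0]                       # left / right UAV biases
    altitude = None
    for _ in range(max_iter):
        altitude = estimate_altitude(measurements, yaw_bias)
        new_bias = regress_yaw_bias(measurements, altitude)
        yaw_bias = [yaw_bias[0] + new_bias[0], yaw_bias[1] + new_bias[1]]
        if max(abs(new_bias[0]), abs(new_bias[1])) < bias_tol:
            break
    return altitude, yaw_bias
```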
The main factors that affect the geolocation accuracy are the measurement errors of the UAV’s navigation state. Therefore, we evaluated the performance of the proposed framework under different measurement errors of the UAV’s attitude angle and position. As the comparison algorithm, the one-shot method assumes that the distance from the UAV to the moving target is known (e.g., provided by a laser rangefinder) and utilizes only one image to estimate the three-dimensional coordinates of the moving target. This approach assumes that the UAV navigation state is provided by an accurate AHRS, so it does not consider the influence of the UAV navigation state measurement errors on the geolocation result. We first assumed that only the attitude angle has Gaussian measurement error in the UAV’s navigation state and then compared the performance of the proposed framework and the one-shot method. Specifically, we set the standard deviation of the attitude-angle measurement error to a series of increasing values. In the comparisons, we evaluated the three-dimensional geolocation errors and the distance error, as shown in
Figure 9. It should be noted that the distance error is the distance between the estimated three-dimensional position and the true three-dimensional position.
The proposed framework utilizes 4 or 8 images for each geolocation. It can be seen that with an increase in attitude-angle measurement errors, the geolocation errors obtained by the two methods increase. In terms of the x and y coordinates, the geolocation errors of our method are always smaller than those of the one-shot method. This is because the proposed framework can expand the measurement data by corresponding point matching, and more measurement data can mitigate the negative influence of Gaussian measurement errors. In the same way, for the proposed framework, the geolocation results obtained by using eight images are more stable than those obtained by using four images. However, in terms of the z coordinate, the geolocation error of our method is larger than that of the one-shot method. There are two reasons for this phenomenon. On the one hand, the target’s altitude
is calculated according to the distance between the UAV and the moving target, and the true distance is assumed to be known for the one-shot method. On the other hand, the altitude estimation method described in
Section 2.2 regards the point closest to the two lines of sight as the target point. The geolocation error caused by attitude measurement errors is therefore concentrated mainly in the target altitude, because the horizontal distance between the UAV and the target is much smaller than the vertical distance. Finally, the distance error of our method with eight images is smaller than that of the one-shot method, because the distance error combines the errors in all three dimensions.
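To illustrate this geometric step, the sketch below computes the point closest to two lines of sight as the midpoint of their common perpendicular; the interface (camera centers and line-of-sight direction vectors expressed in the world frame) is our assumption, not the paper's exact formulation in Section 2.2.

```python
# Closest point between two lines of sight (midpoint of the common perpendicular).
import numpy as np

def closest_point_two_rays(c1, d1, c2, d2):
    """c1, c2: camera centers; d1, d2: line-of-sight direction vectors (world frame)."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w0 = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-9:                  # nearly parallel lines of sight
        s, t = 0.0, e / c
    else:
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom
    p1, p2 = c1 + s * d1, c2 + t * d2      # closest points on the two rays
    return 0.5 * (p1 + p2)                 # taken as the target point
```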
Then, we assumed that only the position has Gaussian measurement error in the UAV’s navigation state and compared the performance of the proposed framework and the one-shot method. Specifically, we set the standard deviation of the position measurement error to a series of increasing values. In the comparisons, we again evaluated the three-dimensional geolocation errors and the distance error, as shown in
Figure 10. The same conclusion can be obtained; that is, our method can greatly mitigate the impact of Gaussian measurement errors on the geolocation results owing to the use of historical measurement data. The difference is that our method with eight images is better than the one-shot method in terms of the z coordinate. There are two reasons for this phenomenon. On the one hand, the influence of the UAV’s position measurement error is greater than that of the attitude measurement error for the one-shot method. On the other hand, the proposed framework can mitigate the negative influence of Gaussian measurement errors by using weighted least-squares. In terms of the distance error, our method is superior to the one-shot method regardless of whether four or eight images are used.
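As a generic illustration of such a weighted least-squares step, the sketch below solves stacked linear observation equations A x = b with diagonal weights; the linear model and the weighting scheme are assumptions here, since the framework's actual observation equations and weights are defined earlier in the paper.

```python
# Generic weighted least-squares solve: argmin_x (Ax - b)^T W (Ax - b), W = diag(w).
import numpy as np

def weighted_least_squares(A, b, w):
    W = np.diag(w)                               # e.g., down-weight noisier measurements
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
```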
After that, we assumed that both the position and the attitude angle have Gaussian measurement errors in the UAV’s navigation state and then compared the performance of the proposed framework and the one-shot method. Specifically, we set the standard deviations of the attitude-angle and position measurement errors to a series of increasing values simultaneously. It can be seen in
Figure 11 that our method with eight images better mitigates the negative influence of the combined Gaussian measurement errors. In this experiment, the assumed measurement errors are consistent with most application scenarios. Even if the yaw-angle bias is not considered, the proposed framework outperforms the one-shot method in practical applications.
In addition to the Gaussian measurement errors, the yaw-angle biases introduced by low-quality sensors also affect the geolocation results. Therefore, we assumed that the navigation state has both Gaussian measurement errors and yaw-angle biases and compared the performances of the proposed method and the one-shot method at different yaw-angle biases. In detail, we fixed the standard deviations of the attitude-angle and position measurement errors, and the yaw-angle biases of the left and right UAVs were set according to Equation (34). We set the corresponding bias parameter to 10, 15, 20, 25, and 30, respectively, and then compared the performances of the two methods. It is shown in
Figure 12 that the geolocation accuracy of the one-shot method is greatly reduced when there is bias in the yaw-angle measurement, and its accuracy continues to degrade as the yaw-angle measurement bias increases. In contrast, the accuracy of our method does not change significantly as the yaw-angle measurement bias increases, because our method estimates the yaw-angle bias through the iterations between the estimation of the target’s altitude and the parameter regression, which avoids the negative impact of the yaw-angle measurement bias on geolocation accuracy. However, in terms of the z-coordinate error, the accuracy of both methods shows no obvious change as the yaw-angle bias increases. There are two reasons for this phenomenon. On the one hand, the one-shot method assumes that the distance between the UAV and the moving target is known, so the yaw-angle measurement error has little impact on its estimate of the target’s altitude. On the other hand, the proposed framework eliminates the negative influence of the yaw-angle biases by utilizing the iterations between the processes of altitude estimation and parameter regression.
In conclusion, we first verified the effectiveness of the proposed corresponding-point-matching method, target altitude estimation method, and yaw-angle bias estimation method in the simulation environment. In addition, we conducted extensive simulation experiments to verify the effectiveness and robustness of the proposed framework; these experiments showed that, compared with the commonly used one-shot method, the geolocation results of the proposed framework are more accurate and stable under Gaussian measurement errors and yaw-angle biases.
3.2. Evaluation in Real Environment
A real indoor experiment was also performed to further validate the proposed framework. The UAVs used in this experiment were laboratory prototypes designed by our group, as shown in
Figure 13a. A camera was installed vertically downward on each UAV, so the attitude of the camera could be obtained from the attitude of the UAV. The two UAVs tracked the moving target, as shown in
Figure 13b, and transmitted their pose information and target images to the ground station in real time. The two UAVs and the ground station ran the ROS system, and the precise position and attitude information of the UAVs was provided by VICON (a motion capture system).
The important experimental parameters are shown in
Table 5. The results of the realistic experiment are shown in
Table 6. A total of 147 geolocation estimates were performed while the UAVs were tracking the moving target. Each geolocation used eight images, and the distance between two adjacent images had to satisfy a preset distance threshold (in meters). The results show that our method successfully achieved the geolocation of the moving target, and the mean absolute errors of the three coordinates were 0.035, 0.034, and 0.159 m, respectively. When using eight images for geolocation, the processing speed was 22 FPS, which still meets the real-time requirements.
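One possible way to maintain the eight-image buffer under the adjacent-image distance condition is sketched below, assuming the condition is a minimum UAV displacement between consecutively stored images; MIN_BASELINE_M and the helper interface are placeholders, not the values or code used in the experiment.

```python
# Sketch of an image buffer that keeps eight frames separated by a minimum baseline.
from collections import deque
import numpy as np

MIN_BASELINE_M = 0.5               # placeholder threshold, not the experiment's value
buffer = deque(maxlen=8)           # eight images per geolocation, as in the experiment

def maybe_add_image(image, uav_position):
    """Store an image only if the UAV has moved far enough since the last stored one."""
    if not buffer or np.linalg.norm(uav_position - buffer[-1][1]) >= MIN_BASELINE_M:
        buffer.append((image, np.asarray(uav_position, dtype=float)))
    return len(buffer) == buffer.maxlen   # True when enough images are available
```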
The geolocation path of the realistic experiment is shown in
Figure 14. The black path represents the actual position of the moving target obtained from VICON, and the yellow path represents the position of the moving target estimated by our method. It can be seen that the path obtained by our method fluctuates owing to the Gaussian measurement errors of the UAVs’ navigation states. The above-mentioned simulation experiments demonstrated that our method can mitigate the effects of Gaussian measurement errors by utilizing the historical measurements.
The simulated and realistic experiments show that the proposed framework geolocates the moving target using only UAV vision and does not rely on an accurate AHRS. Compared with the commonly used one-shot method, the proposed framework can mitigate the effects of measurement errors in a UAV’s position and attitude by using multiple measurement data and estimating the yaw-angle biases.