Article

Feature Point Tracking Method for Visual SLAM Based on Multi-Condition Constraints in Light Changing Environment

Zibin Wu, Deping Li, Chuangding Li, Yanyu Chen and Shaobin Li
1 College of Information Science and Technology, Jinan University, Guangzhou 510632, China
2 School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China
3 Big Data Centers, Gree Electric Appliances, Inc. of Zhuhai, Zhuhai 519000, China
4 Data Science Research Institute, City University of Macau, Macau 999078, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(12), 7027; https://doi.org/10.3390/app13127027
Submission received: 10 April 2023 / Revised: 1 June 2023 / Accepted: 6 June 2023 / Published: 11 June 2023

Abstract

In scenes with lighting changes, visual SLAM localization may fail because feature point tracking fails. We therefore propose a feature point tracking method for visual SLAM based on multi-condition constraints. The proposed method constrains optical flow feature tracking from several aspects, including the overall motion position of the feature points, descriptor grayscale information, and spatial geometric constraints. First, to solve the problem of feature point mismatch in complex environments, we propose a feature point mismatch removal method that combines optical flow, descriptors, and RANSAC, eliminating incorrect feature point matches layer by layer through these constraints. The uniformity of the feature point distribution in the image also affects the accuracy of camera pose estimation, and different scenes affect the difficulty of feature point extraction. To balance the quality and uniformity of the extracted feature points, we propose an adaptive mask homogenization method that adjusts the mask radius according to feature point quality. Experiments conducted on the EuRoC dataset show that the proposed method, which integrates the improved feature point mismatch removal and mask homogenization methods into feature point tracking, is robust and accurate under interferences such as lighting changes, image blurring, and unclear textures. Compared to the RANSAC method, the location error on the EuRoC dataset is reduced by about 85%.

1. Introduction

Simultaneous Localization and Mapping (SLAM) is an essential task in robotics and computer vision that aims to simultaneously estimate a robot’s pose and construct a map of the environment [1]. As images and videos provide detailed information about the environment, a significant portion of SLAM research focuses on visual SLAM (VSLAM) [2]. Feature point tracking is a critical element of VSLAM since it has a significant impact on the accuracy of localization and mapping in later stages. The classic algorithms for feature point tracking include the Lucas–Kanade algorithm [3,4], the BRIEF algorithm [5], the SIFT algorithm [6], the SURF algorithm [7], and the ORB algorithm [8,9]. These algorithms can be divided into feature point descriptor methods [10,11,12,13] and optical flow methods [14,15,16,17]. Feature point descriptor methods, such as SIFT [6], SURF [7], ORB [8], and BRIEF [5], extract feature points from two image frames, compute their descriptors, and match the feature points based on the similarity of their descriptors. In contrast, optical flow methods, whether dense [18,19,20] or sparse [21,22,23], determine the direction and speed of each pixel’s movement on the image plane by calculating the change in pixel gray value between adjacent frames.
In the process of feature point tracking, mismatched point pairs caused by environmental light changes, motion blur, noise, and other factors must be eliminated to improve visual SLAM positioning accuracy. For feature point descriptors, brute-force matching is a commonly used method [24]: each feature point in one image is compared with every feature point in the other image by calculating a distance measure between the feature descriptors, and the feature point with the smallest distance is considered the best match. Although brute-force matching is very simple, its initial result contains many false matches. This is because brute-force matching only considers which pair of descriptors is most similar and does not consider that a feature point in image A may have no true counterpart in image B; thus, brute-force matching assigns each feature point in image A a match in image B regardless of whether a correct match actually exists. Therefore, some researchers use a global distance threshold to reduce mismatches: only when the distance between two feature point descriptors is less than the set threshold are the two feature points considered a matching pair. Nonetheless, the distinguishability of feature point descriptors varies notably across scenarios, which necessitates ongoing tuning to determine the threshold, and dealing with feature points of inconsistent distinguishability is also challenging [25]. Cross-validation is another common refinement of brute-force matching [26]: for feature point $p_i$ in image A, the most similar feature point in image B is $q_i$, and for $q_i$, the most similar feature point in image A is also $p_i$; that is, the two feature points are mutually most similar, so $(p_i, q_i)$ is considered a matching pair. Although cross-validation can reduce the mismatches of brute-force matching to a certain extent, its effect is limited. RANSAC [27] can effectively remove mismatched points; however, it also removes some correct matching points, reducing the number of matching pairs [28], and these erroneous judgements have a negative impact on the subsequent pose estimation.
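As an illustration of brute-force matching with cross-validation (not part of the method proposed in this paper), a minimal sketch using OpenCV’s BFMatcher is shown below; the Hamming norm assumes binary descriptors such as ORB or BRIEF.

```python
# Illustrative sketch: brute-force matching of binary descriptors with
# cross-validation; crossCheck=True keeps only mutually nearest descriptor pairs.
import cv2

def cross_checked_matches(desc_a, desc_b):
    """desc_a, desc_b: binary descriptors (e.g., ORB/BRIEF) from two images."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)
    # Sort by descriptor distance; a global distance threshold could be applied
    # here to further reduce mismatches, as discussed in the text.
    return sorted(matches, key=lambda m: m.distance)
```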
For the optical flow method, reverse optical flow is a common check [29,30]. The idea is simple: similar to cross-validation, points are tracked forward and backward, and the bidirectional optical flow error is used to remove mismatched pairs. Reverse optical flow can effectively reduce inaccurate optical flow tracking and produce correct matching results. However, its shortcomings are also evident. First, the optical flow method is only suitable for slow-motion scenes. Second, using reverse optical flow to eliminate mismatches amounts to computing optical flow twice in the tracking and matching process, which increases the time consumption. RANSAC is another common method, but it also removes some correct matching points, which is unacceptable when the image quality is low and few feature points are tracked. Moreover, in similar regions of a blurred image, RANSAC will fit some mismatched points as inliers.
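For reference, a minimal forward-backward (reverse) optical flow check with OpenCV could be sketched as follows; the 1-pixel round-trip threshold is an assumed value.

```python
# Sketch of a reverse optical-flow check: track points A->B and back B->A,
# and keep a pair only if the round-trip error is small.
import cv2
import numpy as np

def forward_backward_check(img_a, img_b, pts_a, fb_thresh=1.0):
    """pts_a: float32 array of shape (N, 1, 2) with points detected in img_a."""
    pts_b, st_fwd, _ = cv2.calcOpticalFlowPyrLK(img_a, img_b, pts_a, None)
    pts_back, st_bwd, _ = cv2.calcOpticalFlowPyrLK(img_b, img_a, pts_b, None)
    fb_err = np.linalg.norm(pts_a - pts_back, axis=2).ravel()   # round-trip error
    ok = (st_fwd.ravel() == 1) & (st_bwd.ravel() == 1) & (fb_err < fb_thresh)
    return pts_a[ok], pts_b[ok]
```

The second call to the tracker is exactly the doubled optical flow computation that increases the time consumption mentioned above.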
Overall, descriptors can describe feature points well and effectively eliminate mismatched points when images are blurred, but they cannot eliminate mismatches in regions with high similarity. The optical flow method reduces the computational complexity required for feature point matching and improves tracking accuracy, but it requires slow camera motion and is susceptible to changes in light. In an environment with changing lighting, where the image quality drops drastically, a single mismatch elimination method cannot effectively remove erroneous matches. Therefore, we propose a multi-constraint feature point mismatch removal method that exploits the complementarity among the constraints to eliminate incorrect matching pairs more stably. Firstly, feature point pairs are extracted using the optical flow method. Then, the BRIEF descriptors of the optical flow point pairs are calculated, and erroneous matches are eliminated with a global threshold. Subsequently, RANSAC is used to further remove mismatched pairs. Furthermore, optical flow errors and feature descriptor errors are used to prevent correct matching pairs from being eliminated by RANSAC. Finally, to retain high-quality matching points, we propose an adaptive image mask homogenization method that adaptively selects and retains the feature points extracted by optical flow tracking. The main contributions of this paper are summarized as follows:
  • In order to solve the problem of feature point mismatch, this paper proposes a feature point mismatch removal method that combines optical flow, descriptor, and RANSAC, suitable for complex environments.
  • We propose an improved mask homogenization method to optimize the quality of feature points and improve their distribution in the image.
  • We integrate the proposed feature point mismatch removal method and the improved mask homogenization method into feature tracking. Experimental results on the EuRoC [31] dataset demonstrate the robustness and accuracy of our method in various complex environments.

2. Materials and Methods

2.1. Overall Flow

The flow of the proposed multi-constraint feature point tracking is shown in Figure 1. Firstly, we use the optical flow method to match feature point pairs. Then, the BRIEF descriptors of the matched feature point pairs are calculated, and a global threshold is used to eliminate matching pairs with low similarity. RANSAC is then used to eliminate the mismatched pairs that BRIEF descriptor matching cannot remove. To address potential misjudgments, we evaluate the quality of matching points using the optical flow error and the descriptor matching error; high-quality matching points are not eliminated. Finally, an adaptive mask homogenization method is used to homogenize the feature points and extract a sufficient number of new feature points.

2.2. Optical Flow Tracking

Optical flow tracking is the first step in the multi-constraint feature point tracking. We use the Lucas–Kanade optical flow method as the basis of feature point tracking. For a pixel located at $(x, y)$ at time $t$ that moves to $(x + dx, y + dy)$ at time $t + dt$, the grayscale invariance assumption states that the intensity $I$ of the pixel is preserved:

$$I(x + dx, y + dy, t + dt) = I(x, y, t) \tag{1}$$

If the time difference between adjacent image frames is short, a Taylor series expansion of the left-hand side of Equation (1) gives:

$$I(x + dx, y + dy, t + dt) \approx I(x, y, t) + \frac{\partial I}{\partial x}dx + \frac{\partial I}{\partial y}dy + \frac{\partial I}{\partial t}dt \tag{2}$$

By combining Equations (1) and (2) and dividing by $dt$, we obtain:

$$\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} = -\frac{\partial I}{\partial t} \tag{3}$$

where $\frac{dx}{dt}$ and $\frac{dy}{dt}$ are the velocities along the $x$ and $y$ axes and are denoted $u$ and $v$. Denoting $\frac{\partial I}{\partial x}$ as $I_x$, $\frac{\partial I}{\partial y}$ as $I_y$, and the change of grayscale value with time as $I_t$, Equation (3) can be rewritten more compactly in matrix form as:

$$\begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -I_t \tag{4}$$
Assuming that the pixels within a window of a certain size around a feature point share the same motion, we can select several of them to form a system of linear equations, solve for $u$ and $v$ in that region, and thereby obtain the positions of the feature points in the adjacent frame.
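To make Equations (3) and (4) concrete, the following toy, single-window least-squares solve estimates $(u, v)$ for one point; the window half-size of 10 pixels is an assumption, and a practical tracker (such as the pyramidal Lucas–Kanade implementation) additionally uses image pyramids and iterative refinement.

```python
# Toy Lucas-Kanade solve for one feature point, following Equations (3)-(4):
# spatial and temporal gradients inside a small window give an over-determined
# linear system in (u, v), solved here by least squares.
import numpy as np

def lk_flow_at_point(prev, curr, x, y, half=10):
    """Estimate (u, v) at integer pixel (x, y) from two grayscale images."""
    win_p = prev[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    win_c = curr[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    Iy, Ix = np.gradient(win_p)                       # spatial gradients I_y, I_x
    It = win_c - win_p                                # temporal gradient I_t
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)    # rows [I_x, I_y]
    b = -It.ravel()                                   # right-hand side -I_t
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```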

2.3. BRIEF Descriptor

When the image quality is low in a light-changing environment, some mismatched point pairs remain after optical flow tracking. To eliminate them, the BRIEF descriptors of the remaining feature points are calculated, and matching pairs with low similarity are eliminated using a global threshold. The BRIEF descriptor is a binary local feature descriptor used for image matching and target recognition. It works by randomly sampling a local region of the image and comparing the grayscale values of the sampled points; the comparison results form a binary string that serves as the descriptor of the region. The descriptor is centered on a feature point and covers a neighborhood window of size $S \times S$. Within this window, $N$ pairs of random points are selected (usually $N = 256$), the grayscale values of each pair are compared, and the resulting bits are concatenated to form the BRIEF descriptor of the feature point.
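A self-contained sketch of this construction follows; the window size S = 31, the fixed random seed, and the omission of the Gaussian pre-smoothing used by the original BRIEF are simplifying assumptions.

```python
# Minimal BRIEF-style descriptor: N random point pairs inside an SxS window are
# compared and packed into a binary vector. The sampling pattern is generated
# once and reused so descriptors from different frames remain comparable.
import numpy as np

S, N = 31, 256
rng = np.random.default_rng(0)
PATTERN = rng.integers(-(S // 2), S // 2 + 1, size=(N, 4))  # (dx1, dy1, dx2, dy2)

def brief_descriptor(gray, pt):
    """Length-N boolean descriptor around point (x, y), or None near the border."""
    x, y = int(round(pt[0])), int(round(pt[1]))
    h, w = gray.shape
    half = S // 2
    if x < half or y < half or x + half >= w or y + half >= h:
        return None
    p1 = gray[y + PATTERN[:, 1], x + PATTERN[:, 0]]
    p2 = gray[y + PATTERN[:, 3], x + PATTERN[:, 2]]
    return p1 < p2                       # one bit per sampled pair

def hamming(d1, d2):
    """Number of differing bits between two descriptors."""
    return int(np.count_nonzero(d1 != d2))
```

Matched pairs whose Hamming distance exceeds the global threshold are then discarded.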

2.4. RANSAC

For the mismatched point pairs that the BRIEF descriptor cannot exclude, we use the RANSAC method.
Step 1: Randomly select four pairs of matching points and compute the parameters of the homography matrix $H$, as shown in Formula (5):

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{5}$$

where $(x, y)$ is the feature point position in the previous frame, $(x', y')$ is the feature point position in the next frame, and $H$ is the homography matrix.
Step 2: For each remaining point pair, judge whether it is consistent with the homography matrix $H$ (i.e., its error is less than a given threshold), and record the number of point pairs consistent with $H$.
Step 3: If this number is the largest so far, retain the homography matrix $H$; otherwise, discard it.
Step 4: Repeat Steps 1–3 until the error probability $P$ is smaller than 0.01; the homography matrix $H$ is then found.
Step 5: After the model is found, re-estimate the homography matrix $H$.
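A hedged sketch of this step using OpenCV’s RANSAC-based homography estimation, which internally performs the sample/score/refit loop described above, is shown below; the 3-pixel reprojection threshold is an assumed value.

```python
# RANSAC outlier rejection via homography fitting (requires >= 4 matched pairs).
import cv2
import numpy as np

def ransac_filter(pts_prev, pts_curr, reproj_thresh=3.0):
    """pts_prev, pts_curr: float32 arrays of shape (N, 1, 2) of matched points."""
    H, mask = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, reproj_thresh)
    inliers = mask.ravel().astype(bool)    # True for pairs consistent with H
    return H, inliers
```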

2.5. Preserve Feature Points

To address the problem that RANSAC also eliminates some correct matching points, we evaluate the quality of the eliminated matching points using the optical flow error and the descriptor matching error:

$$\frac{e_{flow}}{N_{flow}} + \frac{e_{brief}}{N_{brief}} < \tau \tag{6}$$

where $e_{flow}$ is the optical flow tracking error, $e_{brief}$ is the matching error of the BRIEF descriptor, $N_{flow}$ and $N_{brief}$ are the corresponding normalization factors, and $\tau$ is the quality threshold, set to 1 by default. When a matching point pair deleted by RANSAC satisfies Equation (6), the pair is retained. The entire process thus removes mismatches from the optical flow results in terms of motion position, grayscale gradient (descriptor), and geometric constraints (RANSAC), while preserving as many correct matching points as possible.
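A direct sketch of this preservation rule is given below; the normalization factors follow the experimental settings in Section 3 ($N_{flow} = 30$, $N_{brief} = 60$).

```python
# Equation (6): re-admit RANSAC-rejected pairs whose combined normalized
# optical-flow and descriptor errors are still small.
import numpy as np

def final_keep_mask(flow_err, brief_dist, ransac_inliers,
                    n_flow=30.0, n_brief=60.0, tau=1.0):
    """All inputs are per-pair NumPy arrays; returns a boolean mask of pairs to keep."""
    quality_ok = (flow_err / n_flow + brief_dist / n_brief) < tau
    return ransac_inliers | quality_ok     # inliers, plus high-quality outliers
```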

2.6. Feature Point Homogenization

After deleting the mismatched point pairs, the number of feature points tracked by optical flow decreases. To achieve accurate and stable feature point tracking, the reduced feature points must be replenished. To avoid excessive concentration or sparsity of the supplemented feature points, we use an image mask to distribute the feature points uniformly. The initially extracted feature points are shown in Figure 2a. First, the feature points tracked by optical flow in the previous frame are sorted in descending order of the number of times they have been tracked, and an image mask of the same size as the image is created with every pixel set to 255. The feature points are then traversed: if the mask value at a feature point’s position is 255, the feature point is retained, and the circular area centered at that position with radius r is set to 0 in the mask; if the mask value at a feature point’s position is 0, the feature point is deleted. After all feature points have been traversed, the preserved, uniformly distributed feature points are shown in Figure 2b, and the final image mask is shown in Figure 2c. Finally, Shi–Tomasi feature points are extracted in the non-zero region of the image mask, as shown by the blue points in Figure 2d.
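The following sketch implements this mask procedure with OpenCV; the Shi–Tomasi quality level of 0.01 is an assumed value, and the cap of 260 points matches the experimental setting in Section 3.1.

```python
# Mask-based homogenization: keep at most one tracked point per disc of radius r
# (long-tracked points first), then top up with Shi-Tomasi corners detected only
# where the mask is still 255.
import cv2
import numpy as np

def homogenize_and_refill(gray, pts, track_cnt, r, max_pts=260):
    mask = np.full(gray.shape, 255, dtype=np.uint8)
    kept = []
    for p, c in sorted(zip(pts, track_cnt), key=lambda pc: -pc[1]):
        x, y = int(round(p[0])), int(round(p[1]))
        if mask[y, x] == 255:
            kept.append((p, c))
            cv2.circle(mask, (x, y), int(round(r)), 0, -1)  # blank a disc around it
    new_pts = None
    n_new = max_pts - len(kept)
    if n_new > 0:
        new_pts = cv2.goodFeaturesToTrack(gray, maxCorners=n_new,
                                          qualityLevel=0.01, minDistance=r,
                                          mask=mask)
    return kept, new_pts
```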
The feature point distance parameter r is crucial for image mask homogenization. If r is set too small, the uniformity of the distribution will be poor and too many feature points will be extracted. Conversely, if r is set too large, the distribution becomes more uniform, but the quality of the feature points extracted in low-texture environments may be compromised. To adapt automatically to changes in the environment, we dynamically adjust r based on the quality of the optical flow feature points. We classify the tracked feature points into high-quality and low-quality points, where the quality of a feature point is evaluated by the variance of the $N$ pixels surrounding it, as follows:

$$\sigma^2 = \frac{1}{N}\sum_{k=0}^{N-1}\left(I_k - I_{mean}\right)^2 \tag{7}$$

where $I_{mean}$ is the average grayscale of the $N$ pixels and $I_k$ is the value of the $k$-th pixel. When $\sigma$ exceeds the threshold, the feature point is considered high quality; otherwise, it is low quality. We then dynamically adjust r according to the proportion $\alpha$ of low-quality feature points:

$$r = \begin{cases} \max(r \cdot k,\ R/2), & \alpha > \tau_1 \\ \min(r / k,\ 2R), & \alpha < \tau_2 \\ r, & \text{otherwise} \end{cases} \tag{8}$$

where k is the adjustment coefficient (default 0.8), R is the original radius (default 20), $\tau_1$ is the upper threshold on the proportion of low-quality feature points (default 0.4), and $\tau_2$ is the lower threshold (default 0.03). When the proportion of low-quality feature points is high, r is reduced to obtain more nearby high-quality feature points; when the proportion of high-quality feature points is very high, r is increased to make the distribution of feature points more uniform; otherwise, the current r is kept unchanged.
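A sketch of this adaptive adjustment is given below; the 7 × 7 patch used to estimate the local variance is an assumption, while σ = 20 and the defaults k = 0.8, R = 20, τ1 = 0.4, and τ2 = 0.03 follow the text.

```python
# Equations (7)-(8): classify points by local grayscale variation, then adjust
# the mask radius r according to the share of low-quality points.
import numpy as np

def local_std(gray, pt, half=3):
    """Standard deviation of the (2*half+1)^2 patch around the point."""
    x, y = int(round(pt[0])), int(round(pt[1]))
    patch = gray[max(y - half, 0):y + half + 1,
                 max(x - half, 0):x + half + 1].astype(np.float32)
    return float(patch.std())

def adapt_radius(gray, pts, r, sigma_thr=20.0, k=0.8, R=20.0, tau1=0.4, tau2=0.03):
    low = sum(local_std(gray, p) <= sigma_thr for p in pts)   # low-quality count
    alpha = low / max(len(pts), 1)
    if alpha > tau1:
        return max(r * k, R / 2)   # many low-quality points: shrink r
    if alpha < tau2:
        return min(r / k, 2 * R)   # almost all high quality: enlarge r
    return r
```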

3. Results

This section presents a series of comparison tests conducted on the popular visual-inertial datasets collected on board a micro aerial vehicle (EuRoC) to demonstrate the superiority and effectiveness of the proposed method. The datasets contain synchronized stereo images, IMU measurements, and accurate ground truth, and are divided into three scenarios: Machine Hall, Vicon Room 1, and Vicon Room 2, with sequences labeled easy, medium, or difficult. Machine Hall is a factory workshop with five sequences (MH01–MH05); the difficult-level MH04 and MH05 sequences contain dark environments. Vicon Room 1 has three sequences (V101–V103); the medium-level V102 and difficult-level V103 sequences contain illumination changes. Vicon Room 2 has three sequences (V201–V203); the medium-level V202 and difficult-level V203 sequences suffer from image blurring caused by rapid shaking and slightly dark environments. The experimental platform consists of (1) an AMD Ryzen 5 4600H CPU with a frequency of 3.0 GHz and 4 GB of memory, and (2) the G++ compiler under Ubuntu 18.04. We tested and validated the proposed algorithm based on the open-source VINS-Mono framework. We first perform an ablation study to show the effect of each module of the proposed method, and then conduct comparative experiments with other methods. In Formula (6), $N_{flow} = 30$ and $N_{brief} = 60$ are used; for the quality threshold in Formula (7), $\sigma = 20$ is used.

3.1. Ablations Study

We conducted ablation studies to demonstrate the effectiveness of each module in the proposed method, evaluating optical flow tracking, the BRIEF descriptor, RANSAC, feature point preservation, and feature point homogenization. Because the algorithm is based on sparse optical flow, detecting and tracking too many feature points consumes more time, so the number of tracked feature points is limited to 260. In this experiment, the root mean square error (RMSE) of the absolute pose error (APE) of positioning is used to evaluate the effect of mismatch removal, and the EVO evaluation tool is used to evaluate the positioning results. The quantitative comparison results are shown in Table 1.
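For reference, the metric reduces to the following computation once estimated and ground-truth positions have been time-associated and aligned (which the EVO tool performs internally); this is a sketch, not the EVO implementation.

```python
# RMSE of the translational absolute pose error over corresponding positions.
import numpy as np

def ape_rmse(gt_xyz, est_xyz):
    """gt_xyz, est_xyz: (N, 3) arrays of corresponding positions in meters."""
    err = np.linalg.norm(gt_xyz - est_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```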
Table 1 shows that when only the optical flow module is used, the average and standard deviation of the position error are the worst, with values of 0.535 and 0.976, respectively. Although the positioning error of the full combination of optical flow tracking, BRIEF descriptor, RANSAC, feature point preservation, and feature point homogenization is not the lowest in easy scenarios, its performance is the best in hard scenarios. As modules are added, the average and standard deviation of the position error decrease significantly. When all modules are combined, the average and standard deviation of the position error are 0.014 and 0.017, respectively.

3.2. Compared with Other Methods

We compare the performance of the proposed method with the RANSAC and reverse optical flow methods in terms of positioning error, running time, and the number of matched feature points.

3.2.1. Positioning Error

The positioning error results of the various feature point mismatch removal methods are shown in Table 2. The experimental results show that in easy scenarios such as MH01–MH03, V101–V102, and V201–V202, the RANSAC method can effectively eliminate mismatched points and achieve relatively high positioning accuracy. However, in scenes such as MH04 and MH05 with dark environments, V103 with varying illumination, and V203 with weak texture and poor image quality, RANSAC performs poorly, with an error of 0.125 m for MH04–MH05 and 0.161 m for V203. By building on the RANSAC method, our proposed approach further improves the accuracy under illumination changes and image blurring. The reverse optical flow method performs at an upper-middle level across the various scenes. Compared to the RANSAC method used by VINS-Mono, our proposed method improves the positioning accuracy in all scenarios, especially under the light changes of V103 and the blurred, weakly textured conditions of V203, indicating that the proposed method is more accurate and robust for feature point mismatch removal.

3.2.2. Running Times

In addition to accurate feature point matching and positioning accuracy, running speed is also important for visual SLAM. This experiment evaluated the average processing time per frame required by each method on the 11 EuRoC sequences. The experimental results are shown in Table 3. The processing time of the RANSAC method is relatively short, requiring only about 0.52 ms in the easy Machine Hall scenarios. However, as the difficulty of the scene increases, with factors such as lighting changes and image blurring, the time required by RANSAC also increases: in V203, the average processing time per frame is 4.3 ms, about 8 times that of a normal scene. This is because the quality of feature points in these scenarios is poor, and RANSAC requires more iterations to achieve good results. The processing time of reverse optical flow is relatively long, about 6.5 ms. The processing time of our proposed method lies in between, mainly composed of the time required for the RANSAC and reverse optical flow steps.

3.2.3. The Number of Matching Feature Points

To verify that the proposed algorithm retains more accurate matching points, this experiment measured the number of matched feature points for the various algorithms in an easy scenario (V201) and a hard scenario (V203), as shown in Table 4. In the easy scenario with little interference, every algorithm can match many feature points: when 260 feature points are tracked, 220 to 240 can be matched. However, in the hard scene with blurring caused by shaking and large weak-texture regions (V203), the number of matched feature points drops sharply for all algorithms; RANSAC matches only 134 feature points, while our proposed method still matches 164. Figure 3 compares the number of feature points matched in real time by RANSAC and by our method. As described in Section 2.4, RANSAC removes some correctly matched points, which has little impact in scenarios with many tracked feature points such as V201. However, it can have a significant negative impact when the image is blurred, the quality is low, and few feature points are tracked. As shown in Figure 3a, the number of feature points matched in some images is even less than 20, which reduces the constraints between images and thereby affects accurate pose estimation. Figure 3b shows that our proposed method matches more feature points than RANSAC, ensuring sufficient feature points to provide pose constraints even when the image quality is poor.

4. Conclusions

In this paper, we proposed a multi-constraint feature point tracking method to tackle the inability of a single mismatch removal method to adapt to various scenarios. The method eliminates erroneous feature matching pairs extracted by optical flow using multiple constraints, namely motion position, grayscale gradient (descriptor), and geometric constraints (RANSAC), while retaining as many correct feature points as possible. For the homogenization of feature points, we proposed an image mask homogenization method that dynamically adjusts the radius of the mask. The improved method effectively balances the quality and uniform distribution of feature points, especially when processing poor-quality images with blurring and weak texture. Finally, the proposed method and other algorithms were tested and compared on the EuRoC dataset, and the results show that our method is robust and effective in both easy scenes and difficult scenarios such as those containing light changes, image blurring, and weak textures.
In future work, we hope to integrate deep learning to more effectively extract high-quality feature points. Alternatively, we can directly utilize the features output by deep learning, rather than being limited to feature points.

Author Contributions

Conceptualization, D.L. and C.L.; methodology, Z.W.; software, Y.C.; validation, S.L., Y.C. and D.L.; formal analysis, C.L.; investigation, Z.W.; resources, Z.W.; data curation, C.L.; writing—original draft preparation, C.L.; writing—review and editing, D.L.; visualization, C.L.; supervision, Z.W.; project administration, Z.W.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Gree Electric Appliances, Inc. of Zhuhai and Zhuhai Industry-University-Research Cooperation Project under Grant 384 ZH22017001210107PWC.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.D.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332. [Google Scholar] [CrossRef] [Green Version]
  2. Fuentes-Pacheco, J.; Ascencio, J.R.; Rendon-Mancha, J.M. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 2012, 43, 55–81. [Google Scholar] [CrossRef]
  3. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981. [Google Scholar]
  4. Zhang, X.; Li, P.; Li, Y. Feature Point Extraction and Motion Tracking of Cardiac Color Ultrasound under Improved Lucas–Kanade Algorithm. J. Healthc. Eng. 2021, 2021, 4959727. [Google Scholar] [CrossRef] [PubMed]
  5. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary Robust Independent Elementary Features; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  6. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  7. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. Lect. Notes Comput. Sci. 2006, 3951, 404–417. [Google Scholar]
  8. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
  9. Shao, S. A Monocular SLAM System Based on the ORB Features. In Proceedings of the 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), Shenyang, China, 29–31 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1221–1231. [Google Scholar]
  10. Li, S.; Wang, Q.; Li, J. Improved ORB matching algorithm based on adaptive threshold. J. Physics Conf. Ser. 2021, 1871, 012151. [Google Scholar] [CrossRef]
  11. Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary Robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2548–2555. [Google Scholar] [CrossRef] [Green Version]
  12. Aguilera, C.A.; Sappa, A.D.; Toledo, R. LGHD: A feature descriptor for matching across non-linear intensity variations. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 178–181. [Google Scholar]
  13. Li, H.; Yang, H.; Chen, K. Feature point extraction and tracking based on a local adaptive threshold. IEEE Access 2020, 8, 44325–44334. [Google Scholar] [CrossRef]
  14. Senst, T.; Eiselein, V.; Sikora, T. Robust local optical flow for feature tracking. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1377–1387. [Google Scholar] [CrossRef]
  15. Nourani-Vatani, N.; Borges, P.V.; Roberts, J.M. A study of feature extraction algorithms for optical flow tracking. In Proceedings of the Australasian Conference on Robotics and Automation, Wellington, New Zealand, 3–5 December 2012. [Google Scholar]
  16. Farnebäck, G. Two-frame motion estimation based on polynomial expansion. In Proceedings of the Image Analysis: 13th Scandinavian Conference, SCIA 2003, Halmstad, Sweden, 29 June–2 July 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 363–370. [Google Scholar]
  17. Zhai, M.; Xiang, X.; Lv, N.; Kong, X. Optical flow and scene flow estimation: A survey. Pattern Recognit. 2021, 114, 107861. [Google Scholar] [CrossRef]
  18. Le Besnerais, G.; Champagnat, F. Dense optical flow by iterative local window registration. In Proceedings of the IEEE International Conference on Image Processing 2005, Genova, Italy, 14 September 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 1–137. [Google Scholar]
  19. Walker, J.; Gupta, A.; Hebert, M. Dense optical flow prediction from a static image. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 2443–2451. [Google Scholar]
  20. Gehrig, M.; Millhäusler, M.; Gehrig, D.; Scaramuzza, D. E-raft: Dense optical flow from event cameras. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 197–206. [Google Scholar]
  21. Timofte, R.; Van Gool, L. Sparse flow: Sparse matching for small to large displacement optical flow. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1100–1106. [Google Scholar]
  22. Senst, T.; Eiselein, V.; Sikora, T. II-LK–a real-time implementation for sparse optical flow. In Proceedings of the Image Analysis and Recognition: 7th International Conference, ICIAR 2010, Póvoa de Varzim, Portugal, 21–23 June 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 240–249. [Google Scholar]
  23. Dan, L.; Dai-Hong, J.; Rong, B.; Jin-Ping, S.; Wen-Jing, Z.; Chao, W. Moving object tracking method based on improved lucas-kanade sparse optical flow algorithm. In Proceedings of the 2017 International Smart Cities Conference (ISC2), Wuxi, China, 14–17 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [Google Scholar]
  24. Jakubović, A.; Velagić, J. Image feature matching and object detection using brute-force matchers. In Proceedings of the 2018 International Symposium ELMAR, Zadar, Croatia, 16–19 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 83–86. [Google Scholar]
  25. Bostanci, E.; Kanwal, N.; Bostanci, B.; Guzel, M.S. A fuzzy brute force matching method for binary image features. arXiv 2017, arXiv:1704.06018. [Google Scholar]
  26. Hongzhen, L.; Liang, W.; Hongwei, S.; Yunqing, L. An improved ORB feature matching algorithm. J. Beijing Univ. Aeronaut. Astronaut. 2021, 47, 2149–2154. [Google Scholar]
  27. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  28. Songtao, Z.; Chao, L.; Liqing, L. An improved method for eliminating false matches. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 133–137. [Google Scholar] [CrossRef]
  29. Qin, T.; Pan, J.; Cao, S.; Shen, S. A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors. arXiv 2019, arXiv:1901.03638. [Google Scholar]
  30. Qin, T.; Cao, S.; Pan, J.; Shen, S. A General Optimization-based Framework for Global Pose Estimation with Multiple Sensors. arXiv 2019, arXiv:1901.03642. [Google Scholar]
  31. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
  32. Yang, Y.; Wu, C.; Yang, Z. Research on UAV Visual SLAM Based on Fusing Improved RANSAC Optical Flow Method. Semiconductor Optoelectronics 2023, 44, 277–283. [Google Scholar]
  33. Zhu, X.; Yang, J.; Hu, J. Monocular Visual-inertial SLAM Algorithm Based on the Improved ORB Feature Matching. J. Qingdao Univ. (Natural Sci. Ed.) 2023, 36, 60–64+70. [Google Scholar]
Figure 1. The overall flow of the proposed method.
Figure 2. Image mask homogenization. (a) The original feature points (red dots) with the image mask; the black discs are the invalid area of the mask; (b) the result of homogenization; (c) the generated mask; (d) the feature points extracted again in the next frame.
Figure 3. Comparison of the number of feature points matched in real time by RANSAC and by the proposed method on V203.
Table 1. Location error (RMSE/m) of each step on the EuRoC dataset; ave. and std. are the average and the standard deviation of the location error. a: optical flow tracking only; b: a + BRIEF descriptor; c: b + RANSAC; d: c + preserve feature points; e: d + feature point homogenization.

| Dataset | MH01 | MH02 | MH03 | MH04 | MH05 | V101 | V102 | V103 | V201 | V202 | V203 | ave. | std. |
| Level | Easy | Easy | Mid | Hard | Hard | Easy | Mid | Hard | Easy | Mid | Hard | | |
| a | 0.074 | 0.068 | 0.065 | 0.128 | 0.125 | 3.333 | 0.063 | 0.145 | 0.048 | 0.088 | 0.137 | 0.535 | 0.976 |
| b | 0.089 | 0.064 | 0.092 | 0.114 | 0.130 | 0.052 | 0.051 | 0.094 | 0.055 | 0.073 | 0.095 | 0.021 | 0.026 |
| c | 0.075 | 0.072 | 0.066 | 0.110 | 0.105 | 0.052 | 0.042 | 0.116 | 0.057 | 0.076 | 0.126 | 0.023 | 0.028 |
| d | 0.078 | 0.065 | 0.065 | 0.108 | 0.109 | 0.057 | 0.048 | 0.101 | 0.055 | 0.068 | 0.103 | 0.019 | 0.023 |
| e | 0.080 | 0.053 | 0.071 | 0.099 | 0.102 | 0.063 | 0.054 | 0.077 | 0.063 | 0.053 | 0.087 | 0.014 | 0.017 |
Table 2. Location error (RMSE/m) of various methods on the EuRoC dataset; ave. and std. are the average and the standard deviation of the location error. a: RANSAC method; b: reverse optical flow method; c: method from reference [32]; d: method from reference [33]; e: our proposed method.

| Dataset | MH01 | MH02 | MH03 | MH04 | MH05 | V101 | V102 | V103 | V201 | V202 | V203 | ave. | std. |
| Level | Easy | Easy | Mid | Hard | Hard | Easy | Mid | Hard | Easy | Mid | Hard | | |
| a | 0.070 | 0.074 | 0.067 | 0.125 | 0.123 | 0.052 | 0.063 | 0.142 | 0.058 | 0.085 | 0.161 | 0.093 | 0.036 |
| b | 0.074 | 0.054 | 0.082 | 0.113 | 0.117 | 0.061 | 0.046 | 0.086 | 0.056 | 0.076 | 0.130 | 0.081 | 0.027 |
| c | 0.253 | - | - | 0.380 | 0.293 | - | 0.110 | 0.089 | 0.166 | - | - | 0.225 | 0.123 |
| d | 0.137 | 0.152 | 0.184 | 0.276 | - | 0.112 | 0.134 | 0.183 | - | - | - | 0.168 | 0.052 |
| e | 0.080 | 0.053 | 0.071 | 0.099 | 0.102 | 0.063 | 0.054 | 0.077 | 0.063 | 0.053 | 0.087 | 0.014 | 0.017 |
Table 3. The running time per frame of the different methods (ms).

| Dataset | MH01 | MH02 | MH03 | MH04 | MH05 | V101 | V102 | V103 | V201 | V202 | V203 |
| Level | Easy | Easy | Mid | Hard | Hard | Easy | Mid | Hard | Easy | Mid | Hard |
| RANSAC | 0.51 | 0.52 | 0.51 | 0.53 | 0.53 | 0.52 | 0.91 | 2.19 | 0.52 | 1.16 | 4.31 |
| Reverse optical flow | 6.53 | 6.61 | 6.82 | 6.48 | 6.41 | 6.50 | 6.37 | 6.32 | 6.39 | 6.34 | 6.88 |
| Our method | 4.21 | 4.15 | 4.42 | 4.22 | 4.14 | 4.06 | 4.09 | 4.96 | 4.14 | 4.53 | 6.36 |
Table 4. Average number of matched feature points of the different methods (out of 260 tracked).

| Dataset | RANSAC | Reverse Optical Flow | Our Method |
| V201 | 228 | 230 | 240 |
| V203 | 134 | 130 | 164 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
