1. Introduction
In recent years, unmanned aerial vehicles (UAVs) have become increasingly popular because they are fast, flexible, compact, and low-cost. They have been widely used in environmental monitoring [1], aerial imaging [2], disaster assessment [3], and other fields. Limited by the flying height of the UAV and the focal length of the digital camera, the imaging area of a single image is limited, so it is necessary to stitch a series of overlapping UAV images into a panorama with a wider field of view.
To obtain a natural panorama, most current methods build an accurate alignment model to reduce parallax error. According to how the alignment model is obtained, UAV image stitching can be divided into image-based stitching and pose-information-based stitching. Stitching based on pose information usually requires extra information, such as camera parameters, global navigation satellite systems (GNSSs), inertial measurement units (IMUs), and ground control points (GCPs); the accuracy of this information directly affects the quality of the stitching results [1,2,3,4,5]. Image-based stitching does not need this information [6,7,8,9]. Our proposed method obtains visually satisfactory stitching results automatically, without requiring extra information such as camera calibration parameters and camera poses.
The process of image-based stitching can be roughly summarized as image matching, extracting reliable corresponding points, constructing an alignment model from those points, and then blending the warped images (e.g., with multi-band or linear blending).
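As a minimal illustration of the final blending step, the sketch below (NumPy only; the function name and the uniform 0/1 validity masks are our own simplifications, not part of any stitching library) performs a simple linear blend of two warped single-channel images:

```python
import numpy as np

def linear_blend(img_a, img_b, mask_a, mask_b):
    """Linearly blend two warped single-channel images.

    Each output pixel is a weighted average of the two inputs, with
    weights given by the validity masks. In practice the weights often
    decay toward each image's border (feathering); uniform 0/1 masks
    are used here for brevity.
    """
    w = mask_a + mask_b
    w = np.where(w == 0, 1.0, w)  # avoid division by zero outside both images
    return (img_a * mask_a + img_b * mask_b) / w
```

Multi-band blending follows the same averaging idea but blends each frequency band separately to hide seams better.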
Image matching is the process of identifying the same or similar content and structure in two or more images and putting them into correspondence. It can be roughly divided into two categories, area-based and feature-based matching [10]. Feature-based methods are generally used in image stitching, and their basic process can be summarized as feature-point extraction (e.g., scale-invariant feature transform (SIFT) [11], sped-up robust features (SURF) [12], or oriented FAST and rotated BRIEF (ORB) [13]), descriptor construction, and feature-point matching. At this stage, the feature point set still contains a large number of mismatched points, i.e., outliers, so it is necessary to remove the mismatches to improve the matching accuracy. The most commonly used method is random sample consensus (RANSAC), proposed by Fischler and Bolles [14], together with its improved variants [15,16]; it can find inliers in a data set contaminated by outliers and estimate global model parameters. In recent years, some non-rigid image-matching models have been proposed, such as locality-preserving matching (LPM) [17] and its variant, local graph structure consensus (LGSC) [18], which remove mismatches based on a locality assumption. In addition, learning-based techniques for removing outliers have been widely studied. For example, Yi et al. [19] proposed learning to find good correspondences (LFGC), which finds good feature correspondences by training a network on a set of putative match sets together with their image intrinsics; Ma et al. [20] proposed a learning-based two-class classifier (LMR) that removes mismatches using a few training image pairs and handcrafted geometrical representations for training and testing. Image matching provides accurate point correspondences for the subsequent stitching step and is thus an important part of image stitching.
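To make the RANSAC idea concrete, the sketch below (NumPy only; the function names, the 4-point DLT solver, and all thresholds are illustrative choices, not the exact algorithm of [14]) estimates a homography from putative point correspondences while rejecting outliers:

```python
import numpy as np

def homography_dlt(src, dst):
    """Fit a 3x3 homography to >= 4 point pairs with the direct linear
    transform (coordinate normalization omitted for brevity)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # null-space vector = homography entries
    return H / H[2, 2]

def ransac_homography(src, dst, n_iters=500, thresh=3.0, seed=0):
    """RANSAC in the spirit of Fischler and Bolles: repeatedly fit a
    homography to a random 4-point sample, count correspondences whose
    reprojection error is below `thresh` pixels, and keep the model
    with the most inliers; the final model is refit on all inliers."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coords
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return homography_dlt(src[best], dst[best]), best
```

Production implementations add coordinate normalization, degenerate-sample checks, and an adaptive iteration count, but the sample-score-refit loop is the core of the method.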
The purpose of image stitching is to stitch multiple images with overlapping areas into a panorama [21]. Early image stitching used a global transformation (usually a global homography matrix) to minimize alignment errors, such as AutoStitch [22] proposed by Brown and Lowe; this method is robust but not flexible enough, as it is only suitable for planar or parallax-free scenes. Violating these assumptions leads to dislocation, ghosting, and other problems.
To improve the accuracy of alignment, Lin et al. [23] proposed a smoothly varying affine (SVA) warp, which replaces the global affine warp with a smoothly varying affine stitching field; Zaragoza et al. [24] proposed an as-projective-as-possible (APAP) warp, which estimates a local alignment model on top of the global alignment to improve alignment accuracy. Combining alignment with seam cutting can yield an optimal local alignment. Zhang and Liu [25] proposed a parallax-tolerant warp, which finds the optimal homography through the seam and further uses content-preserving warping to locally refine the alignment; Lin et al. [26] proposed a seam-guided local alignment (SEAGULL) warp, which uses the estimated seam to guide the optimization of the local alignment and iteratively improves the seam quality. However, these methods suffer from projective distortion.
To alleviate perspective distortion in non-overlapping areas, Chang et al. [27] proposed a shape-preserving half-projective (SPHP) warp, which combines a homography warp and a similarity warp to maintain good alignment in overlapping areas while keeping the original perspective in the non-overlapping region; Lin et al. [28] proposed an as-natural-as-possible (AANAP) warp to solve the unnatural rotation of SPHP, which realizes a smoother transition from overlapping to non-overlapping areas by linearly combining the perspective transformation with the similarity transformation that has the minimum rotation angle; Chen and Chuang [29] proposed a mesh-guided warp with a global similarity prior (GSP), which selects a better rotation and scale for each image to minimize distortion. Li et al. [30] proposed a novel quasi-homography warp, which effectively balances perspective distortion against projective distortion in the non-overlapping region to create a more natural-looking panorama; Li et al. [31] proposed a parallax-tolerant image-stitching method based on robust elastic warping, which constructs an analytical warping function to eliminate parallax errors and uses a global similarity transformation to mitigate distortion. They also applied a Bayesian model to remove incorrect local matches to ensure more robust alignment.
Human eyes are especially sensitive to lines. Emphasizing dual features can not only avoid the bending of line segments but also improve the naturalness of the warped images [32,33]. Zhang et al. [34] proposed a mesh-based framework to stitch wide-baseline images and designed a line-preserving term to prevent line segments from bending. Xiang et al. [35] presented a line-guided local warping method with a global similarity constraint, which uses line features to strengthen geometric constraints and adopts a global similarity constraint to mitigate projective distortions. Liao and Li [36] proposed a mesh optimization algorithm based on dual features (point features and line features) and a quasi-homography model to solve the alignment and distortion problems and emphasize the naturalness of the stitching results; Jia et al. [37] proposed a dual-feature warp, which obtains consistent point and line pairs by exploring coplanar subregions using projective invariants and incorporates global collinear structures as a constraint to preserve both local and global linear structures while alleviating distortions.
Most stitched panoramic images have irregular boundaries. For better visual effects, a warping-based rectangular boundary optimization was proposed by He et al. [38], but it cannot deal with a scene that is not completely captured; Zhang et al. [39] proposed a mesh-based warp with regular boundary constraints, which incorporates line preservation and regular boundary constraints into the image-stitching framework and conducts iterative optimization to obtain an optimal piecewise rectangular boundary; Nie et al. [40] proposed the first deep-learning solution to image rectangling, which encourages a rectangular boundary mesh while keeping the content shape-preserving and perceptually natural during iteration.
Traditional image-stitching methods (such as AutoStitch) adopt a global homography to align images, and the stitching results usually have obvious ghosts, dislocations, and perspective distortions due to the insufficient flexibility of the homography. Mesh-based alignment models (such as APAP, SPHP, and AANAP) were then introduced into the stitching framework to improve alignment accuracy and alleviate perspective distortion by assigning a different homography to each mesh cell, but they may cause structural distortions, such as bent line segments and irregular boundaries. Therefore, we propose a novel stitching strategy that further reduces ghosts and dislocations while preventing structural distortions, and that obtains more natural stitching results with a regular boundary.
In this paper, an effective and robust mesh-based image-stitching method is proposed to obtain more natural and accurate stitching results. The main idea of our proposed method can be summarized as improving alignment accuracy, preserving salient structure, and making the stitching results more natural. For alignment, parallax error is reduced both globally and locally: first, global bundle adjustment is adopted to obtain a more accurate homography matrix, and then a mesh-based local feature alignment model is constructed to further reduce parallax error by minimizing an energy function. For the preservation of salient structure, we merge local line segments into global line segments that run through the images and design energy functions guided by the global collinear structure to preserve linear structure and align matching line segments. Finally, we emphasize the naturalness of the stitching results through a boundary constraint and a shape-preserving transform. The experimental results are fully compared and analyzed, including two new quantitative evaluation metrics for linear structure.
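Since such mesh energy terms are quadratic, their joint minimization reduces to a (sparse) linear least-squares problem over the mesh vertices. The toy sketch below (a 1D "mesh" of five vertices; the weights lam_align and lam_smooth and both terms are made-up illustrations, not the paper's actual energy functions) shows the general pattern of stacking weighted terms and solving them together:

```python
import numpy as np

# Toy energy minimization: move a chain of five 1D "mesh vertices" so
# that vertex 2 reaches a feature target (alignment term) while
# neighbors keep their original unit spacing (shape-preserving term).
n = 5
lam_align, lam_smooth = 10.0, 1.0  # illustrative weights

rows, rhs = [], []
# alignment term: vertex 2 should land on target position 4.0
e = np.zeros(n); e[2] = 1.0
rows.append(lam_align * e); rhs.append(lam_align * 4.0)
# shape-preserving terms: consecutive vertices keep spacing 1.0
for i in range(n - 1):
    e = np.zeros(n); e[i], e[i + 1] = -1.0, 1.0
    rows.append(lam_smooth * e); rhs.append(lam_smooth * 1.0)

A, b = np.vstack(rows), np.asarray(rhs)
vertices, *_ = np.linalg.lstsq(A, b, rcond=None)  # -> [2, 3, 4, 5, 6]
```

In the 2D case each mesh vertex contributes two unknowns and the system is solved with a sparse solver, but the stack-and-solve structure is the same.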
The main contributions of this paper are summarized as:
A novel image-stitching method is designed using a comprehensive strategy that combines global bundle adjustment with a local mesh-based alignment model: global bundle adjustment reduces the global transfer error, and mesh-based local feature alignment energy functions reduce the local parallax error.
New energy functions guided by a global collinear structure are designed to prevent distortions of the global linear structure and to improve line-segment alignment, addressing the decline in stitching quality caused by salient structural distortions. Furthermore, a regular boundary constraint combined with a mesh-based shape-preserving transform is introduced to obtain more natural stitching results.
Two new quantitative evaluation metrics are developed to quantify the preservation and alignment of linear structure in image stitching. Comprehensive experimental results and comparisons show that our proposed method is superior to several existing image-stitching methods.
The rest of this paper is organized as follows. Section 2 introduces our proposed method; Section 3 evaluates the stitching results qualitatively and quantitatively in terms of alignment accuracy, time efficiency, and structure preservation; Section 4 summarizes the work and analyzes the current challenges of UAV image stitching.
3. Experiments and Results
This section presents experimental results and metrics to evaluate our proposed method. The images used in the experiments were taken by UAVs and contain different scenes. In the experiments, VLFeat [50] is used to extract and match SIFT [11] feature points, and LGSC [18] is then adopted to remove outliers. We use the source codes [22,24,27,36] provided by the authors to obtain the stitching results and compare them with our proposed method.
For the parameter settings, the grid size is set to 40 × 40, and the number of grids varies with the image size. The weight parameters for local mesh alignment and for energy minimization are fixed across all experiments. There is no serious conflict between the energy terms, so the optimization can be solved stably. All codes run in MATLAB 2019a (some algorithms are in C++) on a laptop with an Intel i7 2.8 GHz CPU and 8 GB RAM. In the following sections, we compare and analyze the stitching results of our proposed method against other methods in terms of alignment accuracy, structure preservation, quantitative evaluation of linear structure, and time efficiency.
Due to space constraints, some details of the stitching results cannot be well presented here, so we have uploaded them online. The stitching results can be viewed and downloaded at
https://postimg.cc/gallery/0P0L5Zc (accessed on 1 March 2023).
3.1. Comparison of Alignment Accuracy
Alignment accuracy can be evaluated qualitatively and quantitatively. The qualitative evaluation of alignment accuracy can be understood as the degree of blurring and dislocation in overlapping areas. Figure 11 and Figure 12 show two groups of comparison results for AutoStitch [22], APAP [24], SPHP [27], SPW [36], and our proposed method to verify the alignment performance of our proposed method. To better compare the alignment of overlapping areas, we avoid post-processing and use linear blending to blend the warped images.
For the quantitative evaluation of point alignment, we use the root mean square error (RMSE) and mean absolute error (MAE) as evaluation metrics, defined as:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} ||f(p_i) − p'_i||^2 ),  MAE = (1/N) Σ_{i=1}^{N} ||f(p_i) − p'_i||,

where N is the total number of matching points; f is the alignment model that projects points onto the reference image plane (if a point is already on the reference image, f is the identity); and (p_i, p'_i) is a pair of point correspondences.
The smaller the values, the better the performance; the units of RMSE and MAE are pixels. We compare our proposed method with other methods, including the global homography, APAP [24], SPHP [27], and SPW [36] (AutoStitch [22], as provided by the authors, is a software package, so we cannot compute its RMSE and MAE).
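A direct implementation of these two metrics (NumPy; the function name is our own) given the projected points and their reference-image matches:

```python
import numpy as np

def point_alignment_errors(proj_pts, ref_pts):
    """RMSE and MAE (in pixels) between points projected onto the
    reference image plane and their matched reference-image points.
    Both inputs are N x 2 arrays of pixel coordinates."""
    d = np.linalg.norm(np.asarray(proj_pts, float) - np.asarray(ref_pts, float),
                       axis=1)
    return float(np.sqrt(np.mean(d ** 2))), float(np.mean(d))
```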
Figure 11 shows a set of stitching results that include roads and building complexes. The content in the red boxes emphasizes line alignment, while the content in the blue boxes emphasizes point alignment. A global transformation such as AutoStitch cannot align the overlapping regions well because the global homography is not flexible enough and is only applicable to nearly planar scenes; the building complex and line segments in the overlapping areas show obvious dislocations and ghosts, as seen in Figure 11a. APAP and SPHP reduce the alignment error by constructing local homography warps, but they cannot handle the dislocation of line segments, and ghosts and dislocations remain; we mark them with red and blue boxes in Figure 11b,c. SPW can address alignment and distortion through a mesh-based warp combined with line features, but there is still misalignment in the line segments and buildings, as shown in Figure 11d. Our method aligns line segments better because the introduced global collinear structure imposes more accurate constraints on matching line segments; it also produces smaller ghosts in the building complex of the overlapping areas than the other methods, as shown in Figure 11e.
Figure 12 shows another set of stitching results, crossed by a highway, which allows a clearer evaluation of line alignment. AutoStitch, APAP, and SPHP cannot align line segments because they construct transformations (global or local) based only on point correspondences, as shown in Figure 12a–c. SPW introduces line features and constructs energy functions based on dual features to prevent distortion of line segments and align matching line segments, but there is still misalignment between line segments in the red box in Figure 12d. The stitching results of our proposed method show almost no misalignment between matching line segments, and the ghosts between the building clusters are also smaller than with the other methods, as shown in Figure 12e.
Table 1 shows the RMSE and MAE values on eight different datasets. The global homography has a higher alignment error than the other methods. Bundle adjustment can reduce transfer errors, but alignment issues remain due to the insufficient flexibility of the global homography. APAP effectively improves alignment accuracy by constructing a local alignment model. SPHP constructs a shape-preserving half-projective warp to mitigate perspective distortion in non-overlapping areas, but it does not align the images better. SPW and our method further reduce the alignment error by constructing energy functions based on mesh optimization, and our method has lower RMSE and MAE because we apply a higher weight to point alignment, which is consistent with the effects shown in the figures.
3.2. Comparison of Structure Preservation
Figure 13 and Figure 14 show two groups of stitching results from different methods, and we evaluate the quality of the stitching results in terms of structure preservation. Figure 13 shows the playground dataset. The global homography protects the linear structure well but suffers from alignment issues; there are very serious ghosts and dislocations, which we mark with a red box in Figure 13a. APAP alleviates the alignment problems, and the playground is accurately aligned, as can be seen in Figure 13b; however, it suffers from distortions in the non-overlapping areas, especially near the margins, and the line segments are slightly bent, which we mark with a green line. SPHP alleviates the distortion in the non-overlapping area by introducing a similarity transformation, but the linear structure is bent during warping and the shape of the playground is distorted, as shown in Figure 13c. SPW and our proposed method effectively solve the alignment and distortion problems with a mesh-based warp; our stitching result is more natural-looking because we impose a rectangular constraint on the boundary, as shown in Figure 13d,e.
Figure 14 shows the stitching results on another dataset that contains many linear structures. The result of AutoStitch has obvious ghosts in Figure 14a. APAP improves the alignment accuracy, but there are distortions in the marginal areas. SPHP cannot protect the linear structure; the line segments are seriously bent, which we mark with a blue box in Figure 14c. SPW preserves the linear structure well using a mesh-based warp based on dual features, as shown in Figure 14d. In addition to preserving linear structures, we also impose constraints on the boundary to obtain stitching results with a regular boundary; the content near the boundary is well preserved during the optimization process, as can be seen in Figure 14e.
3.3. Quantitative Comparison of Linear Structure
To quantify the line alignment and line preservation performance of our proposed method, we design a new evaluation method. It is based on the point-to-line distance and includes two parts: a line preservation indicator E_lp and a line alignment indicator E_la.

Given a set of line segments {l_1, ..., l_K} detected by LSD [41], where K is the number of line segments, we first remove the shorter line segments and then uniformly sample n points p_ij on each remaining segment l_i. The sample points may no longer lie on a straight line after warping (see Figure 15a). We use the distance from the warped sample points p'_ij to the corresponding straight line as the indicator of point deviation from the line. Thus, the error term E_lp can be defined as:

E_lp = (1/(K n)) Σ_{i=1}^{K} Σ_{j=1}^{n} dis(p'_ij, l'_i),

where l'_i is the i-th line segment projected onto the reference image plane, obtained by projecting the start point and end point of the segment onto that plane, and dis(p, l) is the distance from point p to the line l, defined as:

dis(p, l) = |a x_p + b y_p + c| / sqrt(a^2 + b^2),

where (x_p, y_p) are the coordinates of point p, and (a, b, c) are the parameters of the line l.
Given a set of line segment correspondences {(l_i, l'_i)}, i = 1, ..., M, where M is the number of matching line segments, we uniformly sample n points p_ij on l_i and n points q_ij on l'_i. For the sample points on l_i and l'_i, we calculate the distances from the sample points p_ij to the line l'_i and from q_ij to the line l_i, respectively, and then average them (see Figure 15b). Thus, the error term E_la can be defined as:

E_la = (1/(2 M n)) Σ_{i=1}^{M} Σ_{j=1}^{n} [ dis(p_ij, l'_i) + dis(q_ij, l_i) ].

Both indicators measure the distance from sampling points to a straight line, so their units are pixels.
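A sketch of the line preservation indicator (NumPy; the function names and the homogeneous cross-product construction of the line parameters are our own implementation choices) based on the point-to-line distance defined above:

```python
import numpy as np

def line_through(p0, p1):
    """Line parameters (a, b, c) with a*x + b*y + c = 0 through two
    endpoints, via the homogeneous cross product."""
    return np.cross([p0[0], p0[1], 1.0], [p1[0], p1[1], 1.0])

def point_line_dist(p, line):
    """Distance from point p = (x, y) to the line (a, b, c)."""
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)

def line_preservation_error(warped_samples, warped_endpoints):
    """Mean distance (in pixels) of warped sample points to the
    straight line through the corresponding warped segment endpoints.

    warped_samples: list of point lists, one list per line segment.
    warped_endpoints: list of (start, end) endpoint pairs per segment.
    """
    dists = [point_line_dist(p, line_through(p0, p1))
             for pts, (p0, p1) in zip(warped_samples, warped_endpoints)
             for p in pts]
    return float(np.mean(dists))
```

The alignment indicator follows the same pattern, averaging the distances from each segment's sample points to its matched segment's line in both directions.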
Table 2 shows a set of quantitative assessments of the two indicators compared with SPW, which only constrains local line segments. For a fair comparison, the line-segment data used for testing are the same as in SPW. Table 2 shows that our proposed method aligns line segments better than SPW because we impose stronger constraints on the matching line segments, which is consistent with the effects shown in the figures, while its ability to preserve line segments is comparable to SPW. The line preservation error is larger than SPW's on some datasets, probably because the line segments are slightly bent when constraining the boundary, but this tiny error is not noticeable in the stitching results.
3.4. Comparison of Time Efficiency
In this part, we quantitatively compare the time spent by our proposed method with APAP, SPHP, and SPW on different datasets. All methods are run in MATLAB 2019a in the same environment.
Table 3 shows the elapsed time of the different methods for stitching images. The elapsed time of our method includes bundle adjustment, local mesh alignment, line-segment detection, energy function construction, iterative solution, texture mapping, and linear blending; the time for feature detection and matching is not included. The elapsed time of the other methods is calculated in the same way.
As shown in Table 3, SPW and our method are comparable because both are based on mesh optimization and have no additional parameter calculations. Our method takes less time because we do not add line features to the bundle adjustment and our energy terms take less time than SPW's. APAP and SPHP take more time because both need to calculate many local homography warps, and APAP spends a lot of time performing bundle adjustment for each mesh.
3.5. Failure Cases and Discussion
The experimental results show that our method achieves accurate alignment and structure preservation and obtains a more natural panorama, but some limitations remain. Our proposed method may fail if the dominant plane cannot well represent the perspective transformation between images, which usually happens when the captured images contain more than one main plane.
Figure 16 shows a set of failure cases. Here, the input images contain two planes: the ground plane and the tall-building plane. We are unable to precisely align the tall buildings because the dominant plane cannot well represent the perspective transformation between the tall buildings in different input images. In addition, the line preservation term may not work properly in this case because the calculated global homography cannot provide the proper perspective transformation. In future work, we will focus on stitching images with large parallax and on increasing the stitching speed.