1. Introduction
Timely and effective access to forest growth information is of great significance for forest resource protection and for developing sound forest management plans [1]. Airborne LiDAR overcomes the shortcomings of traditional manual field surveys, which are time-consuming and laborious [2], and has therefore been widely used in forestry surveys in recent decades [3,4]. However, owing to complex stand structures, it is difficult for traditional individual tree segmentation (ITS) algorithms, which rely on manually designed segmentation rules, to cover all situations, and their universality and accuracy leave considerable room for improvement [5,6]. It is therefore particularly important to design an ITS method with high universality and high accuracy.
From the perspective of segmentation strategy, ITS methods can be divided into two categories, bottom-up and top-down [7]; the top-down strategy is adopted in this paper. Because the trunk points in ALS point clouds of forests with high canopy density are sparse [8,9], the bottom-up approach is severely affected by noise when selecting seed points, resulting in poor segmentation results.
In the bottom-up approach, tree trunks are first extracted in vertical space and then used as seed points to delineate individual trees. Existing methods include trunk detection fused with crown segmentation [7], two-dimensional trunk detection [10], and adaptive trunk detection [11]. In general, because airborne LiDAR pulses penetrate the canopy poorly, the trunk points scanned in high-canopy-density environments are sparse [8], and the seed points may deviate considerably, affecting the delineation of the crowns [7]; as a result, bottom-up methods segment high-canopy-density ALS forest point clouds poorly.
In the top-down approach, canopy features (e.g., gradients between crowns, treetops) are identified to delineate individual trees. The spatial horizontal distance rule algorithm (SHDR) [12] achieves high segmentation accuracy but has high time complexity and is very sensitive to the choice of the horizontal distance threshold, so its universality is poor. The marker-controlled watershed segmentation algorithm [13] performs poorly on mixed forests whose canopy characteristics are not obvious. The DSM segmentation algorithm [14] uses the slope characteristics of the crowns and can segment conifers and even mixed forests with weak canopy characteristics; however, it easily treats branches as individual trees, producing over-segmentation errors. The minimization energy function segmentation algorithm [15] can be applied to a variety of forest types, but a large number of parameters must be adjusted manually, so its universality is poor. The adaptive mean shift algorithm [16] estimates the average crown width of the whole stand from slope features and uses it as the bandwidth of the mean shift algorithm; it suits plantations with similar crown widths but performs poorly on mixed forests with large differences in crown width. In general, the universality and accuracy of traditional top-down algorithms still need to be improved when segmenting different kinds of forest stands [12,15].
Benefiting from the strong robustness, accuracy, and universality of deep learning, more and more scholars have begun to apply deep learning methods to ITS [17]. However, because trees in high-canopy-density forests are interwoven and complex, deep learning ITS methods still suffer from under-segmentation and over-segmentation [18,19,20]. In recent years, point cloud deep learning has been widely used in point cloud segmentation tasks owing to its strong universality and accuracy. Charles R. Qi et al. [21] proposed PointNet, which segments point clouds directly without additional transformation but does not consider the relationships among points, making it difficult to analyze intricate forest structures. PointNet++ [22] builds on PointNet; although it enhances the extraction of complex features by fusing features from different receptive fields, its ability to perceive complex stand structures still needs improvement. Charles R. Qi et al. [23] proposed VoteNet, which offsets each point toward its instance center to increase the discrimination between objects; however, its center offset loss does not account for the different contribution of each point, leading to unsatisfactory offsets. PointGroup [24] segments each object by clustering after the center offset but ignores the over-segmentation caused by the randomness of the deep network, so its accuracy still needs improvement. Shaoyu Chen et al. [25] proposed HAIS, which combines PointGroup with a set aggregation algorithm to reduce over-segmentation; however, the HAIS set aggregation merges point sets that are merely close to each other, which can absorb the point set of one tree into an adjacent tree, causing under-segmentation. Ashish Vaswani et al. [26] proposed the Transformer, which has a strong ability to perceive complex structures; on this basis, Hengshuang Zhao et al. [27] proposed the Point Transformer for segmenting indoor point clouds. How to segment forest point clouds with complex structures, however, remains an open problem. In general, although point cloud deep learning methods have matured, many problems remain when applying them to segment individual trees [19].
Inspired by VoteNet and HAIS, if the relatively scattered points are offset toward the center of each tree to enhance the discrimination among different trees, ITS can then be realized with a clustering algorithm. However, airborne LiDAR penetrates poorly and cannot scan complete individual trees (especially where canopy occlusion is severe) [8]. The individual tree center computed from such incomplete data deviates greatly from the actual center, which seriously interferes with the learning of the deep network and weakens its ability to enhance the discrimination among trees.
Considering that treetop points are distinct and stable, an extreme offset loss function for ITS is designed based on HAIS; it offsets points toward their respective treetops to enhance the discrimination among different trees, so the density near each treetop increases significantly after the offset. On this basis, the mean shift algorithm is adopted for preliminary ITS. Owing to the randomness of the deep network, some points are offset poorly, and directly clustering with the mean shift algorithm may suffer from serious under-segmentation and over-segmentation. Since the treetop density increases significantly after the extreme offset, the average nearest-neighbor distance is generated adaptively as the bandwidth of the mean shift algorithm, so that the dense points near a treetop are clustered into one class while the isolated, poorly offset points are clustered separately, eliminating under-segmentation errors as much as possible. Finally, to reduce over-segmentation errors, an ITS set aggregation based on a gradient change criterion is designed to merge neighboring sets with gentle gradient change.
The remainder of the paper is organized as follows.
Section 2 describes the key steps of the data preprocessing and methodology.
Section 3 describes the experimental data and evaluation metrics.
Section 4 presents a validity experiment, an ablation experiment, and a comparison experiment and analyzes the results. The discussion of the proposed method is given in
Section 5. Finally, concluding remarks are given in
Section 6.
In general, the key contributions of our work are as follows.
Extreme offset deep learning method. To address the limited handcrafted feature extraction ability of traditional ITS algorithms, an extreme offset deep learning method is proposed. Forest point cloud features are extracted automatically by the extreme offset deep network, and the points are offset to the vicinity of their corresponding extreme points to enhance the discrimination among neighboring individual trees.
Dynamic bandwidth. To address the inability of the mean shift algorithm to determine its bandwidth adaptively in the spatially transformed offset point cloud, a dynamic bandwidth calculation strategy based on the average nearest-neighbor distance is designed. This strategy determines the bandwidth automatically without any prior knowledge, which enhances the universality of the mean shift algorithm.
ITS set aggregation. To address the over-segmentation caused by the randomness of the deep network, and considering that the canopy gradient changes sharply between different trees but gently within the same tree, an ITS set aggregation based on gradient change is designed to improve segmentation accuracy in complex woodlands.
2. Methods
Based on the HAIS framework, the proposed method focuses on optimizing the steps of offset, clustering, and set aggregation for ITS and realizes extreme offset, adaptive clustering, and ITS set aggregation, respectively. The specific process is shown in
Figure 1, which includes preprocessing, extreme offset, mean shift, spatial mapping, set aggregation, and postprocessing. The process and function of specific links are as follows.
- (1) Preprocessing. Data preprocessing includes point cloud filtering, elevation normalization, sub-plot division (25 m × 25 m), point cloud denoising, down-sampling, and coordinate normalization.
- (2) Extreme offset. The extreme offset deep learning method performs a spatial transformation on the preprocessed point cloud, offsetting each point toward its corresponding treetop to enhance the discrimination among different trees.
- (3) Mean shift. In view of the high density at the treetops in the offset point cloud, a self-adaptive mean shift algorithm based on the average nearest-neighbor distance clusters the offset point cloud, divides it into sets, and completes the labeling.
- (4) Space mapping. The clustered and labeled offset point cloud is mapped back to the original point cloud space to complete the preliminary segmentation.
- (5) ITS set aggregation. Considering that the gradient changes sharply between different tree canopies but gently within the same canopy, adjacent canopies with gentle gradient change are aggregated to reduce over-segmentation errors.
- (6) Postprocessing. The segmentation is completed after up-sampling and coordinate de-normalization. The flowchart of the proposed method is shown in
Figure 1.
2.1. Data Preprocessing
After obtaining the ALS point cloud, the data are subjected to point cloud filtering, elevation normalization, sub-plot division, point cloud denoising, down-sampling, and coordinate normalization; the whole process is shown in
Figure 2. The specific steps are as follows. (1) Point cloud filtering. The cloth simulation algorithm [28] is adopted to divide the point cloud into two types: ground points and non-ground points. (2) Elevation normalization. CloudCompare software is used to normalize the elevation of the point cloud and remove the influence of terrain relief. (3) Sub-plot division. The sample plot is divided into 25 m × 25 m sub-plots. (4) Point cloud denoising. Residual outliers and branches at the sub-plot boundaries are removed by visual inspection. (5) Down-sampling. Grids of 0.5 m × 0.5 m are laid out on the xoy plane, the highest point in each grid is taken as a sampling point, and the sampling points are smoothed with a mean filter. (6) Coordinate normalization. After the sub-plot coordinates are normalized, the data preprocessing is complete.
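As an illustration of down-sampling step (5), the grid-based selection of the highest point per cell can be sketched in numpy as follows (the function name is our own, and the final mean smoothing step is omitted for brevity):

```python
import numpy as np

def grid_downsample_highest(points, cell=0.5):
    """Keep the highest point in each cell x cell grid on the xoy plane.

    points: (N, 3) array of x, y, z coordinates.
    Returns one sampled point per occupied grid cell.
    """
    points = np.asarray(points, dtype=float)
    # Integer grid index of every point on the xoy plane.
    ij = np.floor(points[:, :2] / cell).astype(np.int64)
    # Sort so that, within each cell, the highest point comes last.
    order = np.lexsort((points[:, 2], ij[:, 1], ij[:, 0]))
    ij_sorted = ij[order]
    # A cell's last entry in the sorted array is its highest point.
    last = np.ones(len(points), dtype=bool)
    last[:-1] = np.any(ij_sorted[1:] != ij_sorted[:-1], axis=1)
    return points[order][last]
```

The `lexsort` keys put elevation last within each cell, so selecting the last row per cell yields the canopy surface point.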
2.2. Point Transformer with Extreme Loss Function
Point Transformer is a transformer-based deep network. Its self-attention mechanism gives it a strong feature extraction ability for forest point clouds with complex structures. It consists of three core modules: a down-sampling module, an up-sampling module, and a transformer module. The Point Transformer network structure is shown in
Figure 3.
The down-sampling module reduces the number of data points and extracts deeper feature information at the same time. Firstly, the FPS algorithm is used to sample the input point cloud to obtain the sampled point cloud. Secondly, in order to extract deeper feature information, each sampled point is taken as the center point, and the kNN graph is used to find the k nearest neighbor points around the center point. Finally, each center point and its neighbors form a set, which is input into the deep network for deep feature extraction.
The up-sampling module assigns deep features to non-sampled points. Firstly, the features of the point cloud composed of center points are extracted by MLP, BN, and ReLU. Secondly, the inverse distance weight interpolation algorithm is used to interpolate the features of each center point to its neighboring point. Finally, the features of the corresponding down-sampling block are aggregated to the corresponding up-sampling block by skip connection.
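The inverse distance weight interpolation used by the up-sampling module can be sketched as follows (a minimal numpy version; the function name, the choice of k = 3, and the eps guard against division by zero are illustrative assumptions):

```python
import numpy as np

def idw_interpolate(centers, center_feats, queries, k=3, eps=1e-8):
    """Inverse-distance-weighted interpolation of per-point features.

    centers: (M, 3) sampled points, center_feats: (M, C) their features,
    queries: (Q, 3) non-sampled points to receive interpolated features.
    """
    # Pairwise distances from each query to each center: (Q, M).
    d = np.linalg.norm(queries[:, None, :] - centers[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]           # k nearest centers per query
    nd = np.take_along_axis(d, idx, axis=1)      # their distances
    w = 1.0 / (nd + eps)                         # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)            # normalize to sum to 1
    # Weighted sum of the selected centers' features: (Q, C).
    return np.einsum('qk,qkc->qc', w, center_feats[idx])
```

A query point lying midway between two centers receives the average of their features, while a query coinciding with a center essentially copies that center's features.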
The Transformer module strengthens each point's local semantic awareness of the points around it. It consists of two MLPs and a Transformer layer. The Transformer layer is shown in Equation (1):

$$y_i = \sum_{x_j \in X(i)} \rho\left(\gamma\left(\beta(\varphi(x_i), \psi(x_j)) + \delta\right)\right) \odot \left(\alpha(x_j) + \delta\right), \quad (1)$$

where $\beta$ is a relation function (e.g., subtraction); $\rho$ is a normalization function such as softmax; $\gamma$ is an MLP with two linear layers and a ReLU nonlinearity; $\varphi$, $\psi$, and $\alpha$ are pointwise feature transformations, such as linear projections or MLPs; $X(i)$ is the set consisting of the $k$ nearest neighbors of $x_i$; $x_j$ is the feature vector of a neighboring point; $y_i$ is the output feature; $\delta$ is the position encoding, given by $\delta = \theta(p_i - p_j)$, where $p_i$ and $p_j$ are the 3D coordinates of points $i$ and $j$, respectively, and the encoding function $\theta$ is an MLP with two linear layers and a ReLU nonlinear layer.
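As a rough illustration of the vector self-attention in Equation (1), the following numpy sketch attends over each point's k nearest neighbors. The random weight matrices stand in for learned parameters, so it shows only the data flow, not a trained layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two linear layers with a ReLU in between (the form used for gamma and theta)."""
    return np.maximum(x @ w1, 0.0) @ w2

def point_transformer_layer(pos, feats, k=4):
    """One vector self-attention layer in the spirit of Equation (1).

    pos: (N, 3) coordinates, feats: (N, C) input features.
    All weight matrices below are random placeholders for learned parameters.
    """
    n, c = feats.shape
    # Pointwise feature transformations phi, psi, alpha (linear projections).
    wq, wk, wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
    # Weights of the mapping MLP gamma and position-encoding MLP theta.
    g1, g2 = rng.standard_normal((c, c)) * 0.1, rng.standard_normal((c, c)) * 0.1
    t1, t2 = rng.standard_normal((3, c)) * 0.1, rng.standard_normal((c, c)) * 0.1

    out = np.zeros_like(feats)
    for i in range(n):
        # X(i): the k nearest neighbors of point i (including itself).
        nbr = np.argsort(np.linalg.norm(pos - pos[i], axis=1))[:k]
        delta = mlp(pos[i] - pos[nbr], t1, t2)               # position encoding
        rel = (feats[i] @ wq) - (feats[nbr] @ wk) + delta    # subtraction relation
        attn = mlp(rel, g1, g2)                              # mapping MLP gamma
        e = np.exp(attn - attn.max(axis=0))
        attn = e / e.sum(axis=0)                             # softmax over neighbors
        out[i] = (attn * (feats[nbr] @ wv + delta)).sum(axis=0)
    return out
```

Unlike scalar dot-product attention, the weights here are per-channel vectors, which is what gives the layer its fine-grained perception of local structure.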
The extreme offset loss function measures the error between the predicted offset and the true offset so as to guide the update of the offset network. By continuously adjusting the network parameters to minimize the loss, the network gradually improves the accuracy of the predicted offsets and the performance of the model. The proposed extreme offset loss function is shown in Equations (2) and (3):

$$L_{\text{offset}} = \frac{1}{N}\sum_{i=1}^{N} w(\Delta p_i)\,\lVert \widehat{\Delta p}_i - \Delta p_i \rVert, \quad (2)$$
$$w(X) = \min(\lVert X \rVert, 1), \quad (3)$$

where $L_{\text{offset}}$ is the loss value; $N$ is the total number of points; $\Delta p_i$ is the true offset of point $i$ to its extreme point; $\widehat{\Delta p}_i$ is the offset predicted by the network for point $i$; $X$ is a vector; $w(X)$ computes a dynamic weight according to the length of $X$; and $\lVert X \rVert$ denotes the length of the vector $X$. Points near the extreme of a tree depend less on the extreme offset, and their true offset $\lVert \Delta p_i \rVert$ is smaller, so they should contribute less to the loss and therefore receive less weight.
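A minimal numpy sketch of such a weighted offset loss, assuming an L2 error term and the capped weight w(X) = min(‖X‖, 1) (this weight form is an assumption for illustration):

```python
import numpy as np

def extreme_offset_loss(pred_offsets, true_offsets):
    """Weighted offset loss in the spirit of Equations (2)-(3).

    pred_offsets: (N, 3) offsets predicted by the network.
    true_offsets: (N, 3) true offsets of each point to its treetop.
    The weight min(||true offset||, 1) is an assumed form of the dynamic
    weight: points already near the treetop have a small true offset and
    therefore contribute less to the loss.
    """
    norms = np.linalg.norm(true_offsets, axis=1)
    weights = np.minimum(norms, 1.0)                       # dynamic weight w
    errors = np.linalg.norm(pred_offsets - true_offsets, axis=1)
    return float(np.mean(weights * errors))
```

In a training loop this scalar would be minimized by backpropagation; the numpy version only illustrates the weighting behavior.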
2.3. Mean Shift Algorithm with Dynamic Bandwidth
After the extreme offset, the scattered points gather toward their corresponding treetops; the density at the treetops increases significantly, while the density away from the treetops decreases significantly. Based on this density discrimination, the mean shift algorithm with dynamic bandwidth is adopted to cluster the offset point cloud, divide it into sets, and complete the labeling.
The principle of the mean shift algorithm [29] is to use the probability density to obtain a local optimum, and a schematic diagram of the mean shift process is shown in
Figure 4. Firstly, the mean shift vector of the current center point is calculated. Secondly, the point is moved to the end of the mean shift vector, which is then taken as the new starting point, and the movement continues until the length of the mean shift vector is less than the allowable error. Finally, the iteration terminates when all points have been marked. The mean shift vector is shown in Equation (4):

$$m(x) = \frac{\sum_{i=1}^{n} x_i\, g\left(\lVert (x - x_i)/h \rVert^2\right)}{\sum_{i=1}^{n} g\left(\lVert (x - x_i)/h \rVert^2\right)} - x, \quad (4)$$

where $x$ is the current center point; $x_i$ is a point within the bandwidth; $n$ is the number of points within the bandwidth; $g$ is the (negative) derivative of the RBF kernel profile; and $h$ is the bandwidth, which is determined by the following dynamic bandwidth strategy.
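The iteration described above can be sketched in numpy as follows (a Gaussian kernel is assumed for the RBF profile, so g is itself Gaussian up to a constant; the tolerance and iteration cap are illustrative):

```python
import numpy as np

def mean_shift_step(x, points, h):
    """One evaluation of the mean shift vector m(x) of Equation (4)."""
    d2 = np.sum((points - x) ** 2, axis=1) / h ** 2
    g = np.exp(-0.5 * d2)  # Gaussian weights from the kernel profile derivative
    # Weighted mean of the points minus the current center.
    return (g[:, None] * points).sum(axis=0) / g.sum() - x

def mean_shift(x, points, h, tol=1e-4, max_iter=100):
    """Move x along the mean shift vector until it is shorter than tol."""
    for _ in range(max_iter):
        m = mean_shift_step(x, points, h)
        x = x + m
        if np.linalg.norm(m) < tol:
            break
    return x
```

Starting anywhere near a dense treetop cluster, the iterates converge to the local density mode, which is what groups the offset points of one tree into one class.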
The dynamic bandwidth strategy is given in Equation (5). Firstly, a KDTree is constructed from the offset point cloud; the KDTree arranges the unordered points in a fixed order, which enables fast and efficient retrieval. Secondly, the average distance from each point to its $k$ nearest neighbors is calculated using the KDTree. Finally, these average distances are averaged, and the resulting average nearest-neighbor distance is used as the dynamic bandwidth $h$:

$$h = \frac{1}{n}\sum_{i=1}^{n} \bar{d}_k(p_i), \qquad \bar{d}_k(p_i) = \frac{1}{k}\sum_{q \in N_k(p_i)} \lVert p_i - q \rVert, \quad (5)$$

where $p_i$ is a point of the offset point cloud $P$; $n$ is the total number of points; $k$ is the number of nearest neighbors of a point, equal to 5% of the number of points in the point cloud; $\bar{d}_k(p_i)$ is the average distance from point $p_i$ to its $k$ nearest neighbors $N_k(p_i)$; and the mean of these values, the average nearest-neighbor distance, is used as the dynamic bandwidth $h$.
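A minimal sketch of the dynamic bandwidth computation, using a brute-force neighbor search in place of the KDTree to stay dependency-free (in practice a KDTree makes the k-nearest-neighbor queries fast):

```python
import numpy as np

def dynamic_bandwidth(points, frac=0.05):
    """Average nearest-neighbor distance used as the bandwidth h (Equation (5)).

    k is set to 5% of the point count (at least 1).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    k = max(1, int(frac * n))
    # Pairwise distances; each point's distance to itself is excluded.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    knn = np.sort(d, axis=1)[:, :k]    # each point's k nearest-neighbor distances
    return float(knn.mean())           # average over neighbors, then over points
```

After the extreme offset, points belonging to one tree sit tightly around its treetop, so this average is small enough to separate neighboring treetop clusters.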
After self-adaptive mean shift clustering of the offset point cloud, all offset points are labeled. Since the order of the points is unchanged by the extreme offset, the label vector obtained from clustering is assigned directly to the original point cloud, mapping the clustered offset point cloud back to the original space and completing the preliminary ITS.
2.4. ITS Set Aggregation Based on Gradient Change
Due to the offset randomness caused by the deep network, the mean shift may cluster some branches individually, resulting in over-segmentation errors. In order to reduce these over-segmentation errors, the ITS set aggregation algorithm is designed.
When two sets are aggregated, the smaller set should be merged into the larger one. During the extreme offset, most points are successfully offset to their treetops, but a small portion of points are offset invalidly. The larger a set is, therefore, the closer it is to a correct segmentation result: a relatively large set is regarded as part of a correctly segmented individual tree, and a relatively small set is regarded as a fragment generated by invalid offsets.
The ITS set aggregation algorithm based on the gradient change criterion is as follows. If the horizontal distance between the fragment's vertex point and its nearest neighbor point in the other set is greater than the preset bandwidth, the gradient between the two sets is considered to change sharply; they therefore belong to different trees and should not be aggregated, as shown in
Figure 5a.
If the horizontal distance between the fragment vertex point and the nearest neighbor point of the other set is less than the preset bandwidth, then the gradient change between the two sets is considered to be gentle, and they should belong to the same tree; therefore, the two sets are aggregated, as shown in
Figure 5b. The ITS set aggregation is shown in Equations (6)–(8):

$$a(S_i, S_j) = \begin{cases} 1, & d_{xy}(t_i, q_j) < d_{th} \\ 0, & \text{otherwise} \end{cases}, \quad i, j \in \{1, \dots, m\},\ i \neq j, \quad (6)$$
$$d_{xy}(t_i, q_j) = \sqrt{(x_{t_i} - x_{q_j})^2 + (y_{t_i} - y_{q_j})^2}, \quad (7)$$
$$S' = S_i \cup S_j, \quad \text{if } a(S_i, S_j) = 1 \text{ and } \lvert S_i \rvert < \lvert S_j \rvert, \quad (8)$$

where $S'$ represents the point set after set aggregation; $m$ denotes the current number of sets; $a(S_i, S_j)$ decides whether to merge the two sets $S_i$ and $S_j$; $d_{xy}$ computes the Euclidean distance between two points in the xoy plane; $t_i$ is the highest point in the set $S_i$; $q_j$ is the point of the set $S_j$ nearest to the set $S_i$; $\lvert S_i \rvert$ and $\lvert S_j \rvert$ denote the total numbers of points in the sets $S_i$ and $S_j$, respectively; $w$ is the raster width (0.5 m by default); and $d_{th}$ is the preset bandwidth.
After the above processing, there may be noisy point sets with a small number of points. These noisy point sets will be directly aggregated into the nearest sets.
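The aggregation test above can be sketched as follows (the function names and the greedy largest-first merge order are our own simplifications of the gradient change criterion):

```python
import numpy as np

def should_aggregate(fragment, other, d_th):
    """Gradient-change merge test, in the spirit of Equations (6)-(7).

    fragment: (N, 3) points of the smaller set, other: (M, 3) points of the
    larger set, d_th: preset bandwidth. The sets are merged when the
    horizontal (xoy) distance between the fragment's highest point and its
    nearest point in the other set is below d_th, i.e. the gradient between
    them changes gently.
    """
    top = fragment[np.argmax(fragment[:, 2])]             # fragment vertex
    dxy = np.linalg.norm(other[:, :2] - top[:2], axis=1)  # horizontal distances
    return bool(dxy.min() < d_th)

def aggregate(sets, d_th):
    """Greedily merge each smaller set into a neighboring larger set."""
    sets = sorted(sets, key=len, reverse=True)  # larger sets are kept as anchors
    merged = []
    for s in sets:
        for i, m in enumerate(merged):
            if len(m) >= len(s) and should_aggregate(s, m, d_th):
                merged[i] = np.vstack([m, s])   # fragment absorbed into anchor
                break
        else:
            merged.append(s)                    # no gentle neighbor: keep apart
    return merged
```

A branch fragment just beside a large crown is absorbed, while a distant small set stays a separate tree.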
5. Discussion
5.1. Comparison with Existing Methods
Multiple ITS methods are tested separately on the four plots, and the results are shown in
Table 7. Averaged over the four plots, the p, r, F, and time (min) reach 0.84, 0.85, 0.84, and 23.3 for the SHDR method [12]; 0.66, 0.59, 0.62, and 12.5 for the DK method [10]; 0.83, 0.86, 0.85, and 14.9 for the Improved DK method; 0.79, 0.72, 0.75, and 13.7 for the HAIS method [25]; and 0.91, 0.88, 0.90, and 13.5 for the proposed method.
The spatial horizontal distance rule (SHDR) is a top-down method. This method has high segmentation accuracy, but it has high time complexity and is sensitive to the selection of the horizontal distance threshold parameter, which leads to poor universality.
DBSCAN with K-means (DK) is a bottom-up method. The advantage of this method is that it can segment the trees with obvious trunk characteristics well. However, the trunk points scanned by airborne laser lidar in forests with high canopy density are sparse, which leads to the inaccurate center points found by the DBSCAN algorithm, and then affects the final segmentation results.
HAIS is a deep learning method for instance segmentation that achieves good results on indoor datasets. However, indoor objects are far apart from each other, whereas individual trees are close to each other; if HAIS is applied directly to ITS, it may cause serious over-segmentation and under-segmentation problems.
The Improved DBSCAN and K-means method (Improved DK) is an improved strategy that integrates extreme offset deep learning and the DK method. Experiments show that the introduction of extreme offset makes the segmentation accuracy of the DK method significantly improved in high-canopy-density forests.
As shown in
Figure 12a, the overall p, r, and F of the proposed method in the four plots are higher than those of the comparison methods. As shown in
Figure 12b, the F-score of the proposed method is only slightly higher than that of the Improved DK method in U1, which has low canopy density. However, in actual forest resource surveys, the difficulty usually lies in forests with high canopy density, and in the other plots, which have high canopy density, the F-score of the proposed method is much higher than those of the other comparison methods, which indicates its superiority. In addition, the average F-score of the Improved DK method (85%) is much higher than that of the DK method (62%). Although the proposed method consumes slightly more time (13.5 min total) than the DK method (12.5 min total) over the four plots, its segmentation accuracy is significantly higher.
This indicates that the proposed method is effective, and the extreme offset deep learning can significantly improve the segmentation accuracy of the bottom-up method and has good extendibility in the high-canopy-density woodland of the ALS point cloud.
5.2. Analysis of Extreme Offset, Dynamic Bandwidth Strategy, and ITS Set Aggregation
Extreme offset analysis. Airborne LiDAR has poor penetration and cannot scan complete individual trees (especially where canopy occlusion is severe) [8]; the individual tree center computed from such incomplete data deviates greatly from the actual center, which seriously interferes with the learning of the deep network and weakens its ability to enhance the discrimination among trees. Considering that treetop points are distinct and stable, an extreme offset loss function for ITS is designed based on HAIS, which offsets points toward their respective treetops to enhance the discrimination among different trees; the density at the treetops then increases significantly.
Dynamic bandwidth strategy analysis. Due to the deep network randomness, the offset effect of some points is poor. Directly using the mean shift algorithm for clustering may suffer from serious under-segmentation and over-segmentation problems. Considering the treetop density increases significantly after extreme offset, the average proximity distance is adaptively generated as the bandwidth of the mean shift algorithm so that the dense points near the treetop are clustered into one class. At the same time, the isolated points with poor offset are clustered separately to eliminate the under-segmentation error as much as possible.
ITS set aggregation analysis. Erroneously offset points inevitably cause over-segmentation in the mean shift results. The HAIS set aggregation approach merely aggregates sets that are close to one another, which may improperly merge neighboring sets belonging to different trees. To prevent this, the ITS set aggregation approach takes the gradient features of individual trees into account; this reduces over-segmentation errors and boosts segmentation accuracy.
5.3. Potential Improvements
In the future, we intend to improve the proposed method in the following respects. (1) Improving the deep network. To better adapt to the ITS task, the deep learning network and the existing ITS pipeline can be further refined to improve the accuracy and robustness of the model so that it handles trees with complex structures better. (2) Applying leafless data. Extracting trunk points from leafless data and feeding them into the deep network as features may further improve accuracy and robustness, because leafless data provide more information on trunk structure and better reflect the morphological characteristics of trees. (3) Data augmentation. A suitable data augmentation strategy increases the diversity of the dataset, reduces overfitting, and may enhance the learning, generalization, and robustness of the deep network. In conclusion, enhancing the deep network, applying leafless data, and augmenting the data can further improve the accuracy and resilience of ITS and better serve applications in tree-related fields.
6. Conclusions
In this paper, an ITS method based on extreme offset deep learning is designed for complex stand structures. The key steps are as follows: (1) preprocessing; (2) extreme offset; (3) self-adaptive clustering; (4) space mapping; (5) set aggregation; (6) postprocessing. In order to verify the universality and accuracy of the proposed method, coniferous forest plots in the Blue Ridge area of Washington, USA, and mixed forest plots near Bretten, Germany, are selected as the test plots. The point density is low, and the stand structure is relatively simple in the coniferous forest of the Blue Ridge in the USA. The point density is high, and the stand structure is complex in the mixed forest of Bretten in Germany. The test of the algorithm in these two areas can effectively verify the universality and accuracy of the algorithm. The experimental results show that after the introduction of step (2) extreme offset, it can effectively enhance the discrimination among different trees and then improve the ITS accuracy (the average p, r, and F reach 0.79, 0.72, and 0.75, respectively). After the introduction of step (2) and step (3) adaptive bandwidth strategy, it has better segmentation accuracy and can better adapt to complex scenes and changing environments (the average p, r, F reach 0.87, 0.85, 0.85, respectively). After the introduction of step (2) and step (5) ITS set aggregation, it can effectively reduce the over-segmentation error and improve the segmentation accuracy of the proposed method (the average p, r, F reach 0.91, 0.76, 0.83, respectively). After the introduction of steps (2), (3) and (5), the average p, r, and F of the proposed method in all plots reach 0.91, 0.88, and 0.90, respectively. In the future, we intend to improve the proposed method from the following aspects. 
(1) improve the deep network to better adapt to the ITS task; (2) extract tree trunk point coordinates from leafless data and feed them into the deep network as features for learning; and (3) apply appropriate data augmentation methods to enhance the learning ability of the network. In summary, the experimental results effectively verify the universality and accuracy of the proposed method, and the comparisons with other algorithms show its superiority and application potential in segmenting complex forest environments.