Multi-Feature-Based Identification of Subtropical Evergreen Tree Species Using Gaofen-2 Imagery and Algorithm Comparison

Yuan, Jiayu; Wu, Zhiwei; Li, Shun; Kang, Ping; Zhu, Shihao

doi:10.3390/f14020292

by Jiayu Yuan ^1,2,3, Zhiwei Wu ^1,2,3,*, Shun Li ^1,2,3, Ping Kang ^1,2,3 and Shihao Zhu ^1,2,3

¹ Key Laboratory of Poyang Lake Wetland and Watershed Research, Ministry of Education, Jiangxi Normal University, Nanchang 330022, China

² Key Laboratory of Natural Disaster Monitoring, Early Warning and Assessment of Jiangxi Province, Jiangxi Normal University, Nanchang 330022, China

³ School of Geography and Environment, Jiangxi Normal University, Nanchang 330022, China

* Author to whom correspondence should be addressed.

Forests 2023, 14(2), 292; https://doi.org/10.3390/f14020292

Submission received: 24 December 2022 / Revised: 27 January 2023 / Accepted: 30 January 2023 / Published: 2 February 2023

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract

The species and distribution of trees in a forest are critical to the understanding of forest ecosystem processes and the development of forest management strategies. Subtropical forest landscapes feature a complex canopy structure and high stand density. Studies on the effects of classification algorithms on the remote sensing-based identification of tree species are few. Gaofen-2 (GF-2) is China’s first satellite with sub-metre spatial resolution, and it combines high resolution with a short revisit cycle. Here, we considered three representative tree types (Masson pine, Chinese fir, and broadleaved evergreen trees) in the southern subtropical evergreen broadleaved forest region of China as research objects. We quantitatively compared the effects of five machine learning algorithms, namely the backpropagation (BP) neural network, k-nearest neighbour (KNN), polytomous logistic regression (PLR), random forest (RF), and support vector machine (SVM), and four feature types (vegetation index, band reflectance, textural features, and topographic factors) on tree species identification using GF-2 panchromatic and multispectral remote sensing images and field survey data. All five classification algorithms could effectively identify major tree species in subtropical forest areas (overall accuracy [OA] ≥ 87.40%, kappa coefficient ≥ 81.08%). The SVM model exhibited the best identification ability (OA = 90.27%, kappa coefficient = 85.37%), followed by RF (OA = 88.90%, kappa coefficient = 83.30%). The combination of band reflectance, vegetation index, and topographic factors performed best, followed by the combination of band reflectance, vegetation index, textural features, and topographic factors. In addition, we found that classifiers built on a single feature type were less effective than those built on combinations of multiple feature types. The addition of topographic factors significantly improved tree species identification. According to the results of the five classifiers, the separability of the three tree species was good. The producer’s accuracy and user’s accuracy exceeded 90% for Masson pine and 80% for evergreen broadleaved trees and Chinese fir. The commission errors and omission errors of the three tree species ranked as evergreen broadleaved tree > Chinese fir > Masson pine. The variable importance assessment showed that the normalized difference greenness index, altitude, and the modified soil-adjusted vegetation index were the key variables. This study shows that GF-2 imagery can be used to accurately identify the main tree species of subtropical evergreen forests in China, which can help forest managers to regularly monitor tree species composition and provide theoretical support for formulating policies, monitoring sustainable timber harvesting plans, and designing forest conservation and management measures.

Keywords:

tree species identification; remote sensing identification; machine learning; multi-feature combination; subtropical forest

1. Introduction

Forests play an irreplaceable role in human survival and sustainable development [1]. They affect the productivity of terrestrial ecosystems, soil formation, nutrient cycling [2], and the ecology of the surrounding watersheds [3]. Therefore, elucidating the species and distribution of trees in forests is vital for understanding forest ecosystem changes and developing forest management strategies. Tree species can be identified via field surveys and remote sensing. Traditional forest inventorying is time-consuming and cannot efficiently provide detailed spatial distribution information of forest trees over large areas. In contrast, remote sensing technology (e.g., satellite image data) offers a large monitoring scale, fast information acquisition, a short revisit period, and low operating costs, and it has become an important tool for forest species classification and forest resource survey and monitoring [4,5,6,7].

Subtropical forests cover a quarter of the total area and are mainly distributed in central and southern China [8]. In the north subtropical evergreen broad-leaved and deciduous mixed forest belt, natural forest is scarce; pine forest and artificial Chinese fir forest are common. The middle subtropical and south subtropical evergreen broad-leaved forests are the central distribution areas of Masson pine forest, Chinese fir forest, and evergreen broadleaved forest in China. The natural Masson pine forest accounts for about 50% of the forest area, and the Chinese fir forest accounts for 20%–30%. Mixed forests, dominated by evergreen broadleaved forests, account for 10%–20%. The subtropical areas of China are important timber production areas and are also the focus of ecological projects, such as soil and water conservation and biodiversity conservation in China [9,10]. Tree species classification based on remote sensing data is closely related to the development of digital image processing technology. Early research mainly adopted pixel-based supervised and unsupervised classification, and the maximum likelihood method and the k-nearest neighbour (KNN) method were widely used [11,12]. Förster et al. used QuickBird remote sensing images and object-oriented classification and recognition methods to effectively extract spruce, larch, and other tree types in forest areas in southern Germany [13]. With the progress of machine learning technology, various algorithms have provided a new paradigm for tree species (group) classification and recognition. Algorithms such as random forest (RF) and support vector machine (SVM) adapt well to high-dimensional features [14,15,16] and are gradually replacing traditional classification algorithms in research on forest tree species recognition. Based on Sentinel-2 and Landsat series images, Chen et al. used a random forest classifier to identify five forest types, including coniferous forest, broad-leaved forest, and mixed forest, with an overall accuracy of >85% [17].

Recent studies have increasingly adopted vegetation information extraction methods involving multiple features [18,19,20,21]. Topography is a multidimensional variable; with the development of remote sensing technology, terrain features such as regional altitude, slope, and aspect can be obtained quickly. Altitude, slope, and aspect are the main topographic factors affecting the distribution of tree species: changes in them alter light, heat, water, and soil nutrient conditions, which in turn affect the distribution of plant communities [22]. In remote sensing-based tree species identification, scholars have therefore tried adding topographic factors to improve identification accuracy. The integration of spectral features, textural features, and topographic factors obtained from remote sensing data can effectively improve tree species identification accuracy. By combining spectral features, textural features, topography, and other auxiliary data, Luo et al. [19] separated mangroves from other land cover types and achieved a remarkably improved extraction accuracy. Kampouri et al. [23] combined information on topographic factors (e.g., altitude and slope) and expert knowledge to improve the accuracy of tree species classification in Sentinel-2 images.

Machine learning models differ in data structure requirements and algorithms. Ballanti et al. [24] and Modzelewska et al. [25] used SVM algorithms based on hyperspectral data to identify tree species in forests near Marin County, California, and in the Bialowieza Forest, respectively. They achieved overall classification accuracies of 95.02% and 70.00%, respectively. Previous studies have mainly focused on temperate forests [26,27], which often have relatively simple tree species structures. Studies on the remote sensing-based identification of evergreen broadleaved forest species in subtropical regions are relatively few [28]. Subtropical areas are rich in forest resources and feature high stand densities, complex forest stands, and mixed forests of different species, which can complicate remote sensing-based identification. Although some scholars have obtained high-precision classification results, these are often based on airborne data, for which the coverage area is limited [29,30,31]. Research on using high-resolution satellite data to identify subtropical evergreen forest tree species is lacking. Furthermore, the effect of different machine learning algorithms on the identification of subtropical evergreen forests, and the question of which methods and feature combinations are most suitable for this task, remain unclear.

In this study, we focused on the typical subtropical evergreen forest area in Nankang District, Jiangxi Province, southern China. Masson pine, Chinese fir, and evergreen broad-leaved trees are widely distributed in the study area, which is similar to the surrounding cities and counties in this respect. The overall goal of this study is to explore different algorithms and feature combination Schemes and their capabilities in subtropical tree species identification, and to find the most suitable remote sensing tree species identification method for subtropical evergreen forest areas. The main steps included (1) constructing different feature factor combination Schemes and comparing the effects of using different types of factors to identify subtropical evergreen tree species; (2) constructing machine learning classification algorithms, such as k-nearest neighbour (KNN), support vector machine (SVM), BP neural network (BP), and random forest (RF), to explore the ability of different classification algorithms to identify subtropical evergreen tree species; and (3) evaluating the relative importance of variables and analysing the contribution rate of variables to the recognition model.

2. Materials and Methods

2.1. Study Area

The study area is located in Nankang, a district in Ganzhou, Jiangxi. Its geographical location is between 25°28′–26°14′24″ N and 114°29′9″–114°55′24″ E (Figure 1). The study area has a humid mid-subtropical monsoon climate with an average annual temperature of 19.3 °C, abundant rainfall, and an average annual precipitation of 1443.2 mm. The landscape is mostly low hills and plains, with elevations ranging from 71 to 916 m. Low hills are concentrated in the east, south, and north of the area, while the central area features a flat topography.

Nankang has a woodland area of 1091.66 km², accounting for 62.70% of the total land area. The main species are Masson pine (Pinus massoniana Lamb.), Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), Schima superba (Schima superba Gardn. et Champ.), and camphor tree (Cinnamomum camphora (L.) Presl). According to the recent forest resource inventory data of the local forestry bureau, Masson pine was the dominant species in this forest, accounting for approximately 99.78% of the forest area. In addition, Nankang is listed as an epidemic area of pine wood nematode disease; therefore, the identification of Masson pine is vital for local forest management.

2.2. Methods

2.2.1. Field Campaign

Field survey data were mainly used for selecting training and testing samples. In July 2022, we conducted a field survey of several tree species in Nankang, including Masson pine, Chinese fir, camphor tree, and Schima superba. The tree species, height, and diameter at breast height were recorded, and single trees were located using a sub-metre precision GPS module. Masson pine and Chinese fir are widely distributed in the study area. Some studies have shown that both species possess relatively similar spectral and textural features, which can easily lead to misclassification [32,33]. The mixed forests in the study area contain a wide variety of broadleaved evergreen trees mixed with other broadleaved species or evergreen coniferous species, and pure forests of a single tree species are few. A total of 489 samples were collected in the field survey, including 149 evergreen broadleaved samples, 186 Masson pine samples, and 154 Chinese fir samples (Table 1).

2.2.2. Remote Sensing Data and Preprocessing

GF-2 images were used as the remote sensing data in this study. The GF-2 images include one panchromatic band and four multispectral bands with spatial resolutions of 1 m and 4 m, respectively (Table 2), which are considerably finer than that of GF-5 imagery (spatial resolution of 30 m). The images were acquired on 9 March 2022 with zero cloud cover. The data were obtained from the China Satellite Resources Application Centre (https://data.cresda.cn/#/home, accessed on 1 July 2022). GF-2, launched in August 2014, is China’s first remote sensing satellite with sub-metre spatial resolution. The topographic data consist of a digital elevation model (DEM) with 8 m spatial resolution, which was obtained from China’s first natural disaster risk census and used to extract the topographic factors.

We preprocessed GF-2 PMS imagery using ENVI 5.3. First, we performed the radiometric calibration and atmospheric correction of the GF-2 image using FLAASH, to convert the original digital number values into radiance values and then reflectance. Second, we performed the geometric correction of PMS bands, and the root-mean-square error was strictly controlled within one pixel. Pan-sharpened 1 m resolution GF-2 images were created through the fusion of the 4 m multispectral GF-2 imagery with the 1 m panchromatic GF-2 imagery via nearest-neighbour diffusion. We clipped the DEM to the vector boundary of Nankang District.

2.2.3. Feature Extraction and Screening

Feature extraction

The spectral reflectance features of the four GF-2 bands (red, green, blue, and near-infrared) were extracted. Thirteen vegetation indices were calculated from the band reflectance to represent vegetation characteristics. From the fused GF-2 data with 1 m spatial resolution, 32 textural features (eight statistics for each of the four bands) were extracted using a grey-level co-occurrence matrix. In ArcGIS, we used the Spatial Analyst tools to extract altitude, slope, and aspect from the DEM (Table 3).
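As an illustration of how vegetation indices are derived from band reflectance, the sketch below computes three of the indices named later in the Results. It is not the authors' code: the DVI and MSAVI expressions are the commonly cited forms, the NDGI expression (green minus red over green plus red) is an assumption, and the definitions listed in Table 4 take precedence. The pixel reflectance values are hypothetical.

```python
import math

def dvi(nir, red):
    """Difference vegetation index: NIR minus red reflectance."""
    return nir - red

def msavi(nir, red):
    """Modified soil-adjusted vegetation index (commonly cited closed form)."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2

def ndgi(green, red):
    """Normalized difference greenness index (assumed form: (G - R) / (G + R))."""
    return (green - red) / (green + red)

# Hypothetical surface reflectance of one pixel (B1..B4 = blue, green, red, NIR).
pixel = {"blue": 0.03, "green": 0.06, "red": 0.04, "nir": 0.40}

indices = {
    "DVI": dvi(pixel["nir"], pixel["red"]),
    "MSAVI": msavi(pixel["nir"], pixel["red"]),
    "NDGI": ndgi(pixel["green"], pixel["red"]),
}
print(indices)
```

Applied per pixel to the corrected reflectance bands, functions of this kind would yield one raster layer per index.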

Table 3. Description of features.

Factors	Abbreviation	Description	Feature Number
Spectral feature	B1	Reflectivity of blue band	4
	B2	Reflectivity of green band
	B3	Reflectivity of red band
	B4	Reflectivity of near-infrared band
Vegetation indices	Shown in Table 4		13
Textural feature	Mean 1–4	Textural features are derived from red, green, blue, and near-infrared bands by using gray-level co-occurrence matrix (GLCM).	32
	Variance 1–4
	Homogeneity 1–4
	Contrast 1–4
	Dissimilarity 1–4
	Entropy 1–4
	Angular Second Moment 1–4
	Correlation 1–4
Topographic factor	Altitude	Topographic features are calculated from the DEM.	3
	Aspect
	Slope
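The textural features in Table 3 are statistics of a grey-level co-occurrence matrix. A minimal sketch of the idea is given below; the one-pixel horizontal offset, the symmetric counting, the four grey levels, and the 4 × 4 window are all assumptions made for illustration, since the window size and quantisation used in the study are not stated here.

```python
import math
from collections import Counter

def glcm(image, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix as a dict {(i, j): probability}."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count the pair in both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def texture_stats(p):
    """A few GLCM statistics computed from the probabilities p[(i, j)]."""
    return {
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "entropy": -sum(v * math.log(v) for v in p.values()),
        "angular_second_moment": sum(v * v for v in p.values()),
    }

# Hypothetical 4 x 4 window quantised to grey levels 0..3.
window = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(window)
stats = texture_stats(p)
print(stats)
```

Sliding such a window over each of the four bands and storing one statistic per position is what produces a texture layer per band and statistic.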

Feature screening

In total, 52 feature factors (band reflectance, vegetation indices, textural features, and topographic factors) were extracted. A high correlation may exist between these factors. High correlation between explanatory variables may distort model estimation or reduce estimation accuracy, and it makes evaluating the relative importance of model features difficult. The variance inflation factor (VIF), the reciprocal of tolerance, is the ratio of the variance of a coefficient estimate with multicollinearity to the variance without multicollinearity. The larger the VIF, the stronger the linear correlation between a given explanatory variable and the other explanatory variables [46]. We used the rule of thumb that explanatory variables with VIF > 10 [47,48] are problematic and should be removed. Eighteen of the thirty-two textural features, four of the thirteen vegetation index features, and three of the four band reflectance features were retained (Table 5).
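The VIF screening can be sketched as follows: each feature is regressed on all the others, and VIF = 1 / (1 − R²) of that regression. This is an illustrative pure-Python implementation under stated assumptions, not the authors' procedure; the feature names and values are hypothetical, and whether features were removed one at a time or all at once is not specified in the text.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def r_squared(y, predictors):
    """R^2 of an ordinary least squares fit of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(features):
    """Map each feature name to 1 / (1 - R^2) from regressing it on the other features."""
    result = {}
    for name, values in features.items():
        others = [v for k, v in features.items() if k != name]
        result[name] = 1 / (1 - r_squared(values, others))
    return result

# Hypothetical feature columns: "b" is almost exactly 2 * "a", "c" is unrelated.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.1, 4.1, 5.8, 7.9, 10.0, 12.1],
    "c": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
}
scores = vif(features)
kept = [name for name, value in scores.items() if value <= 10]
print(scores, kept)
```

In this toy data the two nearly collinear columns exceed the threshold of 10 and only the unrelated column is kept.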

2.2.4. Classification Scheme

Eleven classification Schemes were constructed according to spectral features, textural features, vegetation index features, and topographic factors (Table 6): single feature classification Schemes (Schemes 1, 2 and 3), multi-feature combination Schemes excluding topographic factors (Schemes 4, 5, 6, 10), and multi-feature combination Schemes including topographic factors (7, 8, 9, 11). By comparing Schemes 1, 2 and 3, we can find the single factor with the best recognition effect. By comparing the three single-factor Schemes with other multi-features combination Schemes, the recognition ability of single-factor and multi-feature combinations can be compared. By comparing Schemes 4 and 7, 5 and 8, 6 and 9, 10 and 11, the influence of topographic factors on the recognition effect of the model can be explored.

2.2.5. Classification

Seventy per cent of the sample point data were randomly selected as training samples, and the remaining 30% of the sample point data were test data based on the “caret” package in R-project. Band reflectance, vegetation index features, textural features, and topographic factors were used as predictor variables to construct five classification models: RF, KNN, SVM, PLR, and BP. To reduce the likelihood of incorrect species identification, five random replicates of the sample point data were sampled.
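The repeated 70/30 partition can be illustrated with the sketch below. The study used the “caret” package in R; this Python version is only a plain random split under assumptions (the seeds 0 to 4 and the rounding of the training size are hypothetical choices), using the class counts reported for the field samples.

```python
import random

def split_70_30(n_samples, seed):
    """Return (train_indices, test_indices) for one random 70/30 partition."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(0.7 * n_samples)
    return indices[:n_train], indices[n_train:]

# Class labels in the proportions reported for the field survey (489 samples).
labels = ["broadleaved"] * 149 + ["masson_pine"] * 186 + ["chinese_fir"] * 154

# Five random replicates, one per hypothetical seed.
replicates = [split_70_30(len(labels), seed) for seed in range(5)]
for train, test in replicates:
    print(len(train), len(test))
```

Averaging the accuracy over the replicates reduces the dependence of the result on any single random partition.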

RF is a combined classifier based on the statistical learning theory. The algorithm extracts multiple samples from the original sample via the bootstrap resampling method, performs decision tree modelling on each bootstrap sample, and then combines the predictions of the decision trees. The final prediction results are obtained via voting [49]. The RF algorithm outperforms single classifiers, features high prediction accuracy and good tolerance to outliers and noise, and is not prone to overfitting [50]. We used the ‘randomForest’ package in R-project to build an RF classification model and to tune the number of trees (ntrees) and the number of variables tried at each node split (mtry). The values of ntrees and mtry were selected by examining how the out-of-bag (OOB) error varied with the number of trees.

KNN is a typical nonparametric algorithm. The core idea is to perform univariate or multivariate estimation according to the spatial similarity between the estimated point and the known points in the multidimensional feature space [51]. We used the ‘kknn’ package in R-project to construct the KNN classifier. The Euclidean distance was selected as the KNN distance metric, and the number of neighbours was optimised.
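The core KNN idea can be sketched in a few lines. This is a simplified, unweighted majority vote with Euclidean distance, not a reproduction of the ‘kknn’ package; the two-dimensional feature vectors and the value k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already rescaled two-feature samples.
train_x = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.90),
           (0.85, 0.80), (0.50, 0.10), (0.55, 0.15)]
train_y = ["masson_pine", "masson_pine", "broadleaved",
           "broadleaved", "chinese_fir", "chinese_fir"]

print(knn_predict(train_x, train_y, (0.12, 0.22)))
print(knn_predict(train_x, train_y, (0.82, 0.85)))
```

Because Euclidean distance is scale-sensitive, features with large numeric ranges (such as altitude in metres) would normally be rescaled before use.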

SVM is a classifier developed according to statistical learning theory and based on the margin maximisation principle. When SVM is used for feature ranking, the coefficients of each feature are used as weights, and the feature variables with the smallest score are sequentially deleted. The algorithm iterates until all features are removed and finally selects the feature variables corresponding to the best combination, according to the ranking [52]. SVM shows numerous unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems and overcomes the ‘curse of dimensionality’ problem [53]. We constructed the SVM using the ‘e1071’ package in R-project with the radial basis kernel function and optimised the penalty coefficient (cost) and the kernel function coefficient (gamma) [54].
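To make the role of gamma concrete, the radial basis kernel can be written as K(x, y) = exp(−gamma · ‖x − y‖²). The sketch below evaluates it for hypothetical feature vectors and gamma values; it only illustrates the kernel and does not train an SVM.

```python
import math

def rbf_kernel(x, y, gamma):
    """Radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical feature vectors of two samples.
x = (0.0, 0.0)
y = (1.0, 1.0)

for gamma in (0.1, 0.5, 2.0):
    print(gamma, rbf_kernel(x, y, gamma))
```

A larger gamma makes the similarity decay faster with distance, giving a more local and flexible decision boundary, while the cost parameter controls how strongly misclassified training samples are penalised.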

The BP neural network is a multilayer feedforward neural network that enables nonlinear mapping from input to output [55]. When the network output does not match the desired output, the weights and thresholds of each neuron are adjusted repeatedly along the negative gradient direction of the error until the error meets the requirements. With a large number of samples, batch processing can ensure a decrease in the total error, so that the algorithm converges faster than with separate processing [56]. We constructed the BP classifier using the ‘nnet’ package in R-project and optimised the number of hidden layer nodes (size).

Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is used to classify remote sensing images and can be used to predict the probability that an image element in an image belongs to a certain land cover type [57]. Some of the advantages of PLR include probabilistic outputs, fewer restrictive assumptions, the ability to use continuous and categorical explanatory variables, suitability for hypothesis testing, simple model interpretation, and low modelling errors. We used the ‘lattice’ package in R-project to construct the PLR classifier.
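The probabilistic output of a multinomial logistic model can be sketched as a softmax over per-class linear scores. The coefficients and the two-feature input below are hypothetical and are not fitted values from this study; the reference class is given zero coefficients, a common but here assumed parameterisation.

```python
import math

def class_probabilities(features, coefficients):
    """Softmax of linear scores; coefficients[cls] = (intercept, weights)."""
    scores = {
        cls: intercept + sum(w * x for w, x in zip(weights, features))
        for cls, (intercept, weights) in coefficients.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}

# Hypothetical coefficients for two predictor variables.
coefficients = {
    "broadleaved": (0.0, [0.0, 0.0]),   # reference class
    "masson_pine": (0.5, [2.0, -1.0]),
    "chinese_fir": (-0.2, [1.0, 0.5]),
}
probs = class_probabilities([0.8, 0.3], coefficients)
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

The predicted class is the one with the highest probability, and the probabilities themselves can be mapped to show classification confidence.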

2.2.6. Accuracy Assessment

The verification samples were selected from the reference dataset using the stratified random sampling method. The accuracy assessment was individually performed for each classification result. The classification accuracy was expressed in terms of the overall accuracy (OA), kappa coefficient (Kappa), the producer’s and user’s accuracy (PA, UA), and commission errors and omission errors. The effects of different classification algorithms and classification Schemes on tree species identification accuracy were analysed in terms of the OA of each classification Scheme, the optimal accuracy of each classification algorithm, and the accuracy for each forest type.
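These measures all derive from a confusion matrix. The sketch below computes OA, the kappa coefficient, PA, and UA using the standard definitions; the matrix counts are hypothetical and are not results from this study.

```python
def accuracy_metrics(matrix, classes):
    """matrix[i][j] = number of samples of reference class i predicted as class j."""
    k = len(classes)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(matrix[i]) for i in range(k)]                       # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    oa = correct / n
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (oa - expected) / (1 - expected)
    pa = {c: matrix[i][i] / row_totals[i] for i, c in enumerate(classes)}
    ua = {c: matrix[i][i] / col_totals[i] for i, c in enumerate(classes)}
    return oa, kappa, pa, ua

classes = ["broadleaved", "masson_pine", "chinese_fir"]
# Hypothetical test-set confusion matrix (rows: reference, columns: predicted).
matrix = [
    [38, 2, 5],
    [1, 54, 1],
    [5, 2, 39],
]
oa, kappa, pa, ua = accuracy_metrics(matrix, classes)
print(round(oa, 4), round(kappa, 4), pa, ua)
```

The omission error of a class is 1 − PA and the commission error is 1 − UA, so the same matrix yields all six measures.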

2.2.7. Assessment of the Relative Importance of Variables

The importance of variables was assessed using the mean decrease Gini (MDG). MDG is the sum of all decreases in Gini impurity due to a given variable (when this variable is used to form a split in the Random Forest), normalized by the number of trees [58]. It compares the importance of variables by calculating the influence of each variable on the heterogeneity of observations on each node of the classification tree. The larger the MDG, the greater the importance of the variable [49].
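The quantity being accumulated is the drop in Gini impurity produced by a split. The sketch below computes it for a single node with hypothetical labels; how a particular package weights these decreases by node size and aggregates them across trees is not reproduced here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical node containing two species in equal numbers.
parent = ["pine"] * 4 + ["fir"] * 4

# A split that separates the species perfectly.
good = gini_decrease(parent, ["pine"] * 4, ["fir"] * 4)
# A split that leaves both children as mixed as the parent.
poor = gini_decrease(parent, ["pine", "pine", "fir", "fir"],
                     ["pine", "pine", "fir", "fir"])
print(good, poor)
```

A variable that repeatedly produces splits like the first one accumulates a large total decrease and therefore ranks as important.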

3. Results

3.1. OA Evaluation of Each Classification Scheme

As shown in Figure 2, SVM exhibited the highest classification accuracy, with an OA of 90.27% under Scheme 8 (band reflectance + vegetation index + topographic factor). The SVM classifier achieved the highest OA under every Scheme except Scheme 11 (band reflectance + vegetation index + textural features + topographic factor), under which its OA was 0.31% lower than that of RF. Therefore, SVM can be considered the optimal classification algorithm for tree species identification.

A comparison of Schemes 1–3 revealed that for tree species identification using a single feature type, Scheme 3 (textural feature) achieved the highest accuracy with PLR, SVM, RF, and KNN, whereas with the BP neural network its OA was 6.07% lower than that of Scheme 2 (vegetation index). Scheme 2 (vegetation index) achieved the next-highest accuracy, while Scheme 1 (band reflectance) exhibited the worst performance. When a combination of multiple feature types was used for tree species identification, Scheme 8 (band reflectance + vegetation index + topographic factor) yielded the highest classification accuracy with BP, PLR, SVM, and KNN, exceeding that of the other ten Schemes. For RF, the OA of Scheme 8 was only 1.54% lower than that of Scheme 11 (band reflectance + vegetation index + textural features + topographic factor). In addition, the SVM classifier constructed using Scheme 8 yielded the highest classification accuracy and Kappa coefficient. Thus, Scheme 8 can be regarded as the optimal feature combination for tree species identification.

A comparison of Schemes 4 and 7, 5 and 8, 6 and 9, and 10 and 11 showed that adding the topographic factors changed the OA by 12.66%–24.36%, 17.15%–25.34%, 9.52%–17.16%, and −2.88%–15.28%, respectively. The only decreases occurred for BP and PLR, whose overall classification accuracies under Scheme 11 were 0.41% and 2.88% lower than those under Scheme 10. In all other cases, the addition of topographic factors improved the overall classification accuracy.

3.2. Assessment of the Optimal Accuracy of Each Classification Algorithm

Five classification algorithms, namely BP, PLR, SVM, RF, and KNN, were considered in this study. Their parameters in R-project were set as shown in Table 7.

To more intuitively compare the classification abilities of the algorithms, their mean highest accuracy results were compared (Figure 3). The OA and kappa coefficient decreased as follows: SVM > RF > BP > PLR > KNN.

Generally, the five adopted classification methods exhibited good performance in the identification of subtropical evergreen forests, and the OA and the kappa coefficient exceeded 87% and 81%, respectively. SVM yielded the highest classification accuracy, with the highest OA reaching 90.27%, corresponding to the highest kappa coefficient of 85.37%. RF yielded the second-highest OA of 88.90%, corresponding to a kappa coefficient of 83.30%; relative to SVM, its OA and kappa coefficient were lower by 1.52% and 2.42% (1.37 and 2.07 percentage points), respectively. KNN yielded the lowest OA and kappa coefficient of 87.40% and 81.08%, respectively, which were lower than those of SVM by 3.18% and 5.03% in relative terms (2.87 and 4.29 percentage points), respectively.
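The percentage gaps quoted in this paragraph are consistent with relative differences with respect to the SVM values, rather than differences in percentage points. The short check below reproduces them from the reported OA and kappa values; the function name is a hypothetical helper.

```python
def relative_drop(best, other):
    """Difference between two scores expressed as a percentage of the better score."""
    return (best - other) / best * 100

# Reported best OA and kappa coefficient (%) for three of the classifiers.
svm = {"OA": 90.27, "kappa": 85.37}
rf = {"OA": 88.90, "kappa": 83.30}
knn = {"OA": 87.40, "kappa": 81.08}

for name, scores in (("RF", rf), ("KNN", knn)):
    for metric in ("OA", "kappa"):
        gap = relative_drop(svm[metric], scores[metric])
        print(name, metric, round(gap, 2))
```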

3.3. Assessment of Tree Species Classification Accuracy

For the three tree species, the user’s accuracy and producer’s accuracy for Masson pine exceeded 90%, while those for the broadleaved evergreen and Chinese fir trees exceeded 80%. The highest producer’s accuracy and user’s accuracy for evergreen broadleaved tree identification were 88.17% (SVM) and 84.66% (SVM), respectively. The highest producer’s accuracy and user’s accuracy for Masson pine identification were 100% (SVM) and 93.22% (RF), respectively. The highest producer’s accuracy and user’s accuracy for Chinese fir identification were 88.09% (BP) and 94.67% (BP), respectively (Table 8).

To assess the identification of the three tree species from another angle, we also evaluated commission errors and omission errors. The omission error of a category refers to the samples that truly belong to that category but are not assigned to it. The commission error of a category refers to the samples that do not belong to that category but are assigned to it [59]. A high commission error combined with a low omission error indicates that other categories are misclassified into the target category, so that the target category is overestimated. Conversely, a low commission error combined with a high omission error indicates that samples of the target category are missed, so that the target category is underestimated. The commission errors and omission errors for the three species analysed at the image metric scale generally decreased as follows: broadleaved evergreen > Chinese fir > Masson pine (Figure 4). Broadleaved evergreen trees showed comparable omission errors (0.1183–0.1957) and commission errors (0.1534–0.1673). Masson pine showed small omission errors (0–0.0921) and commission errors (0.0678–0.0920), all below 0.1. For Chinese fir, omission errors (0.1189–0.1745) were dominant compared with the commission errors (0.0914–0.1218).
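Omission and commission errors are the complements of the producer’s and user’s accuracy, respectively. The check below applies this to the highest accuracies reported for broadleaved evergreen trees and Masson pine and recovers the lower bounds of the error ranges given above; the function names are hypothetical helpers.

```python
def omission_error(producers_accuracy):
    """Share of reference samples of a class that were missed (1 - PA)."""
    return 1 - producers_accuracy

def commission_error(users_accuracy):
    """Share of samples assigned to a class that belong elsewhere (1 - UA)."""
    return 1 - users_accuracy

# Highest reported accuracies, expressed as fractions.
broadleaved = {"PA": 0.8817, "UA": 0.8466}
masson_pine = {"PA": 1.0, "UA": 0.9322}

for name, acc in (("broadleaved", broadleaved), ("masson_pine", masson_pine)):
    print(name,
          round(omission_error(acc["PA"]), 4),
          round(commission_error(acc["UA"]), 4))
```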

3.4. Relative Importance of Variables

When KNN, SVM, BP, and the PLR model were used, the highest overall classification accuracy was obtained under Scheme 8 (band reflectance + vegetation index + topographic factor). When the RF model was used, the highest overall classification accuracy was obtained under Scheme 11 (band reflectance + vegetation index + textural feature + topographic factor). Therefore, we used the RF model to generate the relative importance of the feature variables under Schemes 11 and 8 and obtained the importance parameter MDG (Figure 5a,b), which was used to explore the relative importance of each feature factor.

The results without textural features (Scheme 8) showed that the four variables with high relative importance assessed by the RF model were the normalized difference green index (NDGI), altitude, the modified soil-adjusted vegetation index (MSAVI), and difference vegetation index (DVI), with NDGI having the highest relative importance (67.96), followed by altitude (65.28), and then MSAVI (23.94). After the addition of textural features (Scheme 11), the four key variables were altitude, NDGI, Contrast4, and MSAVI, with importance values of 55.84, 34.88, 31.63, and 17.47, respectively. According to the combined results of the variable importance evaluation for both classification Schemes, NDGI, altitude, and MSAVI are the most important variables.

4. Discussion

The study showed that when a single feature factor was used for tree species identification, the textural features (Scheme 3) performed best, followed by the vegetation index (Scheme 2), and the band reflectance exhibited the worst performance (Scheme 1). This suggests that accurately identifying tree species only by band reflectance is difficult, possibly because evergreen species exhibit relatively similar spectral reflectance, and relying solely on band reflectance often results in homospectral or heterospectral phenomena [60]. Wang et al. [61] reached similar conclusions after classifying tree species with insignificant differences in spectral curves, such as mixed broadleaved, mixed conifer and mixed conifer forests. This suggests that other information, such as textural or topographic information, is needed to distinguish the species with similar spectral curves.

Furthermore, for all five machine learning algorithms employed, the accuracy of tree species identification based on a single feature type was lower than that based on multiple features. This suggests that the use of single-type features for the remote sensing identification of tree species is significantly limited, particularly for subtropical broadleaf evergreen forest areas with high biodiversity and high vegetation cover. In contrast, the combination of multi-feature variables can fully leverage the feature information of ground objects to increase the discrimination between different tree species [62]. Wang et al. [63] integrated the tree species’ vegetation indices, phenological information, textural features, and topographic features in the construction of a multi-feature random forest tree species classification model and found that the fusion of multiple features could effectively improve the recognition accuracy of tree species.

A comparison of the classification accuracy with and without topographic factors revealed that including topographic features contributed substantially to the OA, mainly because tree distribution is closely related to topography [64]. The main landform types in the study area are hills and mountains, and the terrain is undulating. The three tree types occupy different altitudes, slopes, and aspects. Chinese fir is found on higher slopes than Masson pine and on lower slopes than evergreen broadleaved trees. Masson pine and Chinese fir are sun-loving species, whereas broadleaved evergreens generally prefer shade, which leads to differences in slope aspect. The importance of topographic factors in tree species identification is also reflected in other studies. Wang et al. [63] found that topographic features play a crucial role in random forest-based tree species classification using fused features, with altitude being the most important feature. Li et al. [65] found that including topographic factors improved the identification accuracy of coniferous forests when tree species were classified using GF-2 PMS data. Several studies have shown that the contribution of topographic factors to recognition accuracy depends on the topographic variation of the study area. Li et al. [66] studied the Huangfu shan National Forest Park and found that topographic features had little effect on tree species identification accuracy, and even reduced the OA, owing to the few peaks and the low average altitude of the study area. The highly accessible flat areas were subject to intensive tending and regeneration, which affected the slope and aspect distribution of tree species to some extent. Hościło et al. [67] used Sentinel-2 data to classify eight tree species and found that the classification accuracy increased from 75.60% to 81.70% after adding topographic features; elevation had the greatest impact on tree species classification, followed by slope. Zhang et al. [68] found that the spatial resolution and accuracy of the DEM also affected classification accuracy. The DEM used in the present study had a spatial resolution of 8 m, which is generally finer than that of the DEMs used in other studies and may explain the important role of topographic factors.
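The topographic factors discussed above (altitude, slope, and aspect) are usually derived from a DEM. The text does not state which algorithm was used, so the following is only a minimal sketch, assuming Horn's 3 × 3 finite-difference method and the 8 m cell size mentioned above. The function name `slope_aspect` and the sample elevation windows are hypothetical.

```python
import math

def slope_aspect(window, cell_size=8.0):
    """Return (slope_deg, aspect_deg) for the centre cell of a 3x3 elevation window.

    window: three rows of three elevations (m), rows ordered north to south,
            each row ordered west to east.
    cell_size: DEM cell size in metres (8 m is assumed here).
    Aspect is the downslope compass bearing (0 = north, 90 = east, 180 = south);
    it is None on flat terrain, where aspect is undefined.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    # Horn's weighted differences: elevation change per metre eastwards and southwards.
    dz_east = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_south = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    slope_deg = math.degrees(math.atan(math.hypot(dz_east, dz_south)))
    if dz_east == 0 and dz_south == 0:
        return slope_deg, None
    # The downslope direction is the negative gradient: (east, north) = (-dz_east, dz_south).
    aspect_deg = math.degrees(math.atan2(-dz_east, dz_south)) % 360.0
    return slope_deg, aspect_deg

# Hypothetical windows: terrain rising towards the north, and towards the west.
rising_north = [[10, 10, 10], [5, 5, 5], [0, 0, 0]]
rising_west = [[10, 5, 0], [10, 5, 0], [10, 5, 0]]
flat = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
```

A slope that rises towards the north faces south, so its aspect should come out near 180°; sun-loving and shade-tolerant species would then be expected to separate along this variable.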

In the present study, all five algorithms achieved good recognition accuracy, with OA exceeding 87%. SVM performed best, followed by RF. Other studies have also found that SVM and RF achieve high accuracy in forest and land cover classification [69]. SVM yielded the best OA and kappa coefficient, which indicates that the SVM classifier generalises well and requires only a small amount of training data. The SVM classifier has distinct advantages for small-sample, high-dimensional pattern recognition problems [70,71], which makes it particularly suitable for the classification of remote sensing images. It is generally accepted that the SVM classifier can make effective use of limited training samples [72], because training adjusts a separating hyperplane until an optimal decision hyperplane, the one with the largest margin between classes, is found and then used to classify the samples. RF outperformed the other algorithms when the largest number of variables was used (Scheme 11), which can be attributed to the ability of the RF algorithm to process large numbers of high-dimensional variables in parallel better than the other machine learning algorithms [73]. Li et al. [65] conducted a comparison experiment using GF-2 PMS data combined with spectral features, textural features, vegetation indices, and topographic factors. Their results showed that the OA and kappa coefficient of the RF classifier were higher than those of the SVM classifier.

The omission and commission errors of the evergreen broadleaved trees were higher than those of Chinese fir and Masson pine, which is consistent with the findings of Zhang et al. [74], who studied the remote sensing-based identification of subtropical evergreen trees and found that the omission and commission errors of both Chinese fir and Masson pine were lower than those of evergreen broadleaved forests. In the present study, the classification errors of broadleaved trees were relatively high, possibly because the study area is a typical subtropical forest with a wide variety of broadleaved species and few pure broadleaved stands, which tends to produce mixed pixels.
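The omission and commission errors referred to above can be recovered from the producer's and user's accuracies in Table 8. The sketch below assumes the standard complements (omission error = 100 − PA, commission error = 100 − UA); the PA and UA values are copied from Table 8, while the names `TABLE_8`, `error_rates`, and `mean_error_rates` are illustrative only.

```python
# PA and UA (%) per tree type, copied from Table 8, in the order given in TREE_TYPES.
TREE_TYPES = ("Broadleaved evergreen trees", "Masson pine", "Chinese fir")
TABLE_8 = {
    "RF":  {"PA": (85.59, 96.07, 85.53), "UA": (84.53, 93.22, 89.90)},
    "SVM": {"PA": (88.17, 100.00, 82.55), "UA": (84.66, 92.67, 94.67)},
    "KNN": {"PA": (80.43, 99.15, 82.98), "UA": (83.95, 90.80, 87.86)},
    "BP":  {"PA": (84.26, 95.46, 88.09), "UA": (83.70, 92.42, 90.86)},
    "PLR": {"PA": (82.33, 96.60, 85.11), "UA": (83.27, 93.21, 87.82)},
}

def error_rates(model):
    """Return {tree type: (omission %, commission %)} for one model."""
    pa, ua = TABLE_8[model]["PA"], TABLE_8[model]["UA"]
    return {t: (round(100 - p, 2), round(100 - u, 2))
            for t, p, u in zip(TREE_TYPES, pa, ua)}

def mean_error_rates():
    """Return {tree type: (mean omission %, mean commission %)} over the five models."""
    result = {}
    for idx, tree in enumerate(TREE_TYPES):
        omission = [100 - TABLE_8[m]["PA"][idx] for m in TABLE_8]
        commission = [100 - TABLE_8[m]["UA"][idx] for m in TABLE_8]
        result[tree] = (sum(omission) / len(omission),
                        sum(commission) / len(commission))
    return result

if __name__ == "__main__":
    for tree, (om, co) in mean_error_rates().items():
        print(f"{tree}: omission {om:.2f}%, commission {co:.2f}%")
```

Averaged over the five models, both error types are largest for broadleaved evergreen trees and smallest for Masson pine, although an individual model can deviate from this ordering.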

The results of the variable importance assessment showed that topographic factors, especially altitude, were important model variables, and that including them improved tree species identification accuracy. Rautiainen et al. [75] indicated that topographic factors and the light preference of tree species were closely related to the spectral differences of conifers and showed that the addition of elevation and aspect factors improved the separability of fir and Masson pine. In addition, in the present study, vegetation indices played an important role in tree species identification; NDGI had the highest relative importance and could be used as a classification variable to distinguish evergreen trees. This is attributed to NDGI reflecting the chlorophyll content, biomass, and leaf water content of trees, in which the three evergreen tree types differ.
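The abbreviation list includes MDG (mean decrease Gini), which suggests that variable importance was ranked by the Gini impurity decrease accumulated in the random forest. As a rough illustration only, the sketch below computes the impurity decrease of a single split; in a random forest this quantity is summed over all splits that use a variable and averaged over the trees. The labels, altitude values, and threshold are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(labels, values, threshold):
    """Impurity decrease obtained by splitting the samples at values <= threshold."""
    left = [lab for lab, v in zip(labels, values) if v <= threshold]
    right = [lab for lab, v in zip(labels, values) if v > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children

# Hypothetical samples: tree type and altitude (m).
labels = ["pine", "pine", "fir", "fir"]
informative_altitude = [100, 120, 300, 320]    # separates the two types at 200 m
uninformative_altitude = [100, 300, 120, 320]  # mixes the two types at 200 m
```

A variable that separates the classes cleanly yields a large decrease, whereas one that leaves both child nodes mixed yields a decrease close to zero.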

5. Conclusions

Knowledge of the distribution of tree species is the basis for forest resource management, and remote sensing is an important means of tree species identification. The combination of high-resolution satellite imagery and machine learning makes tree species identification possible at the regional scale. Forests are widely distributed in China, and subtropical evergreen forests cover a large area and are the main forest type in southern China. However, their high stand density makes tree species difficult to identify. Most studies have focused on tree species identification in temperate regions, and relatively few on subtropical regions. In subtropical regions, UAV data have been used more often than satellite data. The effects of different feature combinations and machine learning methods on the identification of subtropical evergreen tree species therefore need to be explored.

The purpose of this study was to compare the effects of different feature combination Schemes and machine learning algorithms on the identification of subtropical evergreen tree species, and to construct the classification model with the best recognition performance. Evergreen tree species in a subtropical area were identified via remote sensing, and the results support the recognition of subtropical evergreen tree species using GF-2 imagery. The main conclusions are as follows:

(1): The combination of Scheme 8 and SVM produced the best recognition of subtropical evergreen forest tree species, with an overall accuracy of 90.27% and a kappa coefficient of 85.37%.
(2): BP, SVM, RF, KNN, and PLR classifiers were constructed through the multi-feature combination method. The best OA and kappa coefficient obtained by each classification algorithm exceeded 87% and 81%, respectively (SVM > RF > BP > PLR > KNN); thus, the classification results met the application requirements for identifying and extracting tree species in subtropical natural evergreen forests with a complex canopy structure and high stand density.
(3): Achieving accurate tree species recognition using a single class of feature factors was difficult. The combination of multiple features yielded a higher classification accuracy, and the addition of topographic factors effectively improved the tree species recognition accuracy.
(4): Band reflectance, vegetation indices, textural features, and topographic factors extracted from GF-2 data were combined into different Schemes. Scheme 8 (band reflectance + vegetation index + topographic factor) yielded the best result, followed by Scheme 11 (band reflectance + vegetation index + textural feature + topographic factor).
(5): The recognition of the three evergreen tree types was evaluated based on commission and omission errors. Across the five models, Masson pine was recognised best, followed by Chinese fir.
(6): Different variables had different importance values in tree species identification. NDGI, altitude, and MSAVI were the most relevant variables.

Author Contributions

Methodology, data curation, writing-original draft, J.Y.; Conceptualization, methodology, project administration, funding acquisition, Z.W.; writing—review and editing, S.L.; investigation, P.K. and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 31960253.

Data Availability Statement

Because the sample point dataset is part of an author's graduation thesis, it is not publicly available at present but is available from the corresponding author on reasonable request. The other data can be obtained through the sources described in the text.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

BP	Backpropagation neural network
KNN	K-nearest neighbour
PLR	Polytomous logistic regression
RF	Random forest
SVM	Support vector machine
OA	Overall accuracy
Kappa	Kappa coefficient
GF-2	Gaofen-2
B1	Reflectivity of blue band
B2	Reflectivity of green band
B3	Reflectivity of red band
B4	Reflectivity of near-infrared band
BAI	Burn area index
DVI	Difference vegetation index
EVI	Enhanced vegetation index
CI_green	Green chlorophyll index
GEMI	Global environment monitoring index
SVI	Shaded vegetation index
GNDVI	Green normalized difference vegetation index
MSAVI	Modified soil adjusted vegetation index
NDVI	Normalized difference vegetation index
NDGI	Normalized difference greenness index
OSAVI	Optimized soil-adjusted vegetation index
RVI	Ratio vegetation index
SAVI	Soil-adjusted vegetation index
PA	Producer’s accuracy
UA	User’s accuracy
VIF	Variance inflation factor
MDG	Mean decrease Gini

References

1. Mori, A.S.; Lertzman, K.P.; Gustafsson, L. Biodiversity and ecosystem services in forest ecosystems: A research agenda for applied forest ecology. J. Appl. Ecol. 2017, 54, 12–27.
2. Ali, A.; Wang, L.Q. Big-sized trees and forest functioning: Current knowledge and future perspectives. Ecol. Indic. 2021, 2021, 107760.
3. Valjarević, A.; Djekić, T.; Stevanović, V.; Ivanović, R.; Jandziković, B. GIS numerical and remote sensing analyses of forest changes in the Toplica region for the period of 1953–2013. Appl. Geogr. 2018, 92, 131–139.
4. Stewart, J.B.; Finch, J.W. Application of remote sensing to forest hydrology. J. Hydrol. 1993, 150, 701–716.
5. Cilek, A.; Berberoglu, S.; Donmez, C.; Sahingoz, M. The use of regression tree method for Sentinel-2 satellite data to mapping percent tree cover in different forest types. Environ. Sci. Pollut. Res. 2022, 29, 23665–23676.
6. Becker, A.; Russo, S.; Puliti, S.; Lang, N.; Schindler, K.; Wegner, J.D. Country-wide retrieval of forest structure from optical and SAR satellite imagery with deep ensembles. ISPRS-J. Photogramm. Remote Sens. 2023, 195, 269–286.
7. Astorga, A.; Moreno, P.C.; Reid, B. Watersheds and Trees Fall Together: An Analysis of Intact Forested Watersheds in Southern Patagonia (41-56 degrees S). Forests 2018, 9, 385.
8. Zhang, Y.; Li, X.; Kong, Z.; Du, N.; Wu, M. Subtropical forest vegetation development and climate change in Baishanzu area of Zhejiang Province, China, since the Holocene. Palaeogeogr. Palaeoclimatol. Palaeoecol. 2022, 608, 111293.
9. Song, Z.; Seitz, S.; Zhu, P.; Goebes, P.; Shi, X.; Xu, S.; Wang, M.; Schmidt, K.; Scholten, T. Spatial distribution of LAI and its relationship with throughfall kinetic energy of common tree species in a Chinese subtropical forest plantation. For. Ecol. Manag. 2018, 425, 189–195.
10. Yao, X.; Yu, K.; Deng, Y.; Zeng, Q.; Lai, Z.; Liu, J. Spatial distribution of soil organic carbon stocks in Masson pine (Pinus massoniana) forests in subtropical China. Catena 2019, 178, 189–198.
11. Carleer, A.; Wolff, E. Exploitation of very high resolution satellite data for tree species identification. Photogramm. Eng. Remote Sens. 2004, 70, 135–140.
12. Feret, J.; Asner, G.P. Tree Species Discrimination in Tropical Forests Using Airborne Imaging Spectroscopy. IEEE Trans. Geosci. Remote Sens. 2013, 51, 73–84.
13. Förster, M.; Kleinschmit, B. Object-based classification of QuickBird data using ancillary information for the detection of forest types and NATURA 2000 habitats. In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications; Blaschke, T., Lang, S., Hay, G.J., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 275–290.
14. Buhvald, A.P.; Racic, M.; Immitzer, M.; Ostir, K.; Veljanovski, T. Grassland Use Intensity Classification Using Intra-Annual Sentinel-1 and-2 Time Series and Environmental Variables. Remote Sens. 2022, 14, 3387.
15. Liang, S.; Gong, Z.; Zhao, W.; Guan, H.; Liang, Y.; Lu, L.; Zhao, X. Information Extraction of Baiyangdian Wetland based on Multi-season Sentinel-2 Images. Remote Sens. Technol. Appl. 2021, 36, 777–790.
16. Ghimire, B.R.; Nagai, M.; Tripathi, N.K.; Witayangkurn, A.; Mishara, B.; Sasaki, N. Mapping of Shorea robusta Forest Using Time Series MODIS Data. Forests 2017, 8, 384.
17. Cheng, K.; Wang, J.; Yan, X. Mapping Forest Types in China with 10 m Resolution Based on Spectral-Spatial-Temporal Features. Remote Sens. 2021, 13, 973.
18. Fei, H.; Fan, Z.; Wang, C.; Zhang, N.; Wang, T.; Chen, R.; Bai, T. Cotton Classification Method at the County Scale Based on Multi-Features and Random Forest Feature Selection Algorithm and Classifier. Remote Sens. 2022, 14, 829.
19. Luo, Y.; Ouyang, Y.; Zhang, R.; Feng, H. Multi-Feature Joint Sparse Model for the Classification of Mangrove Remote Sensing Images. ISPRS Int. J. Geo-Inf. 2017, 6, 177.
20. Zhu, J.; Pan, Z.; Wang, H.; Huang, P.; Sun, J.; Qin, F.; Liu, Z. An Improved Multi-temporal and Multi-feature Tea Plantation Identification Method Using Sentinel-2 Imagery. Sensors 2019, 19, 2087.
21. Wang, X.; Qin, L.; Luo, L.; Zhang, X.; Tang, G. Multi-feature selection in remote sensing forest species classification with SVM. Comput. Eng. Appl. 2013, 49, 259–262.
22. Dyderski, M.K.; Pawlik, L. Spatial distribution of tree species in mountain national parks depends on geomorphology and climate. For. Ecol. Manag. 2020, 474, 118366.
23. Kampouri, M.; Kolokoussis, P.; Argialas, D.; Karathanassi, V. Mapping of forest tree distribution and estimation of forest biodiversity using Sentinel-2 imagery in the University Research Forest Taxiarchis in Chalkidiki, Greece. Geocarto Int. 2019, 34, 1273–1285.
24. Ballanti, L.; Blesius, L.; Hines, E.; Kruse, B. Tree Species Classification Using Hyperspectral Imagery: A Comparison of Two Classifiers. Remote Sens. 2016, 8, 445.
25. Modzelewska, A.; Fassnacht, F.E.; Sterenczak, K. Tree species identification within an extensive forest area with diverse management regimes using airborne hyperspectral data. Int. J. Appl. Earth Obs. Geoinf. 2020, 84, 101960.
26. Richter, R.; Reu, B.; Wirth, C.; Doktor, D.; Vohland, M. The use of airborne hyperspectral data for tree species classification in a species-rich Central European forest area. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 464–474.
27. Fassnacht, F.E.; Latifi, H.; Stereńczak, K.; Modzelewska, A.; Lefsky, M.; Waser, L.T.; Straub, C.; Ghosh, A. Review of studies on tree species classification from remotely sensed data. Remote Sens. Environ. 2016, 186, 64–87.
28. Gyamfi-Ampadu, E.; Gebreslasie, M. Two Decades Progress on the Application of Remote Sensing for Monitoring Tropical and Sub-Tropical Natural Forests: A Review. Forests 2021, 12, 739.
29. Jia, W.; Pang, Y.; Meng, S.; Ju, H.; Li, Z. Tree Species Classification Using Airborne Hyperspectral Data in Subtropical Mountainous Forest. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 2284–2287.
30. Wu, Y.; Zhang, X. Object-Based Tree Species Classification Using Airborne Hyperspectral Images and LiDAR Data. Forests 2020, 11, 32.
31. Qin, H.; Zhou, W.; Yao, Y.; Wang, W. Individual tree segmentation and tree species classification in subtropical broadleaf forests using UAV-based LiDAR, hyperspectral, and ultrahigh-resolution RGB data. Remote Sens. Environ. 2022, 280, 113143.
32. Cai, L.F.; Wu, D.S.; Fang, L.M.; Zhen, X.Y. Tree Species Identification Using XGBoost Based on GF-2 Images. For. Resour. Manag. 2019, 5, 44–51.
33. Tian, T.; Fan, W.Y.; Lu, W.; Xiao, X. An object-based information extraction technology for dominant tree species group types. Chin. J. Appl. Ecol. 2015, 26, 1665–1672.
34. Chuvieco, E.; Martin, M.P.; Palacios, A. Assessment of different spectral indices in the red-near-infrared spectral domain for burned land discrimination. Int. J. Remote Sens. 2002, 23, 5103–5110.
35. Jordan, C.F. Derivation of Leaf-Area Index from Quality of Light on the Forest Floor. Ecology 1969, 50, 663–666.
36. Justice, C.O.; Vermote, E.; Townshend, J.; Defries, R.; Roy, D.P.; Hall, D.K.; Salomonson, V.V.; Privette, J.L.; Riggs, G.; Strahler, A. The Moderate Resolution Imaging Spectroradiometer (MODIS): Land remote sensing for global change research. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1228–1249.
37. Gitelson, A.A.; Vina, A.; Ciganda, V.; Rundquist, D.C.; Arkebauer, T.J. Remote estimation of canopy chlorophyll content in crops. Geophys. Res. Lett. 2005, 32, 8.
38. Pinty, B.; Verstraete, M.M. GEMI: A non-linear index to monitor global vegetation from satellites. Vegetatio 1992, 101, 15–20.
39. Xu, Z.H.; Liu, J.; Yu, K.Y.; Liu, T.; Gong, C.H.; Tang, M.Y.; Xie, W.J.; Li, Z.L. Construction of Vegetation Shadow Index (SVI) and Application Effects in Four Remote Sensing Images. Spectrosc. Spectr. Anal. 2013, 33, 3359–3365.
40. Gitelson, A.A.; Merzlyak, M.N. Signature Analysis of Leaf Reflectance Spectra: Algorithm Development for Remote Sensing of Chlorophyll. J. Plant Physiol. 1996, 148, 494–500.
41. Qi, J.G.; Chehbouni, A.R.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A Modified Soil Adjusted Vegetation Index. Remote Sens. Environ. 1994, 48, 119–126.
42. Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150.
43. Meyer, G.E.; Xe, J.; Neto, O.C. Verification of color vegetation indices for automated crop imaging applications. Comput. Electron. Agric. 2008, 68, 282–293.
44. Baret, G.R.A.M. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107.
45. Huete, R.A. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
46. O'Brien, R.M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 2007, 41, 673–690.
47. Eckert, S. Improved Forest Biomass and Carbon Estimations Using Texture Measures from WorldView-2 Satellite Data. Remote Sens. 2012, 4, 810–829.
48. Sarker, L.R.; Nichol, J.E. Improved forest biomass estimates using ALOS AVNIR-2 texture indices. Remote Sens. Environ. 2011, 115, 968–977.
49. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
50. Wurm, M.; Taubenboeck, H.; Weigand, M.; Schmitt, A. Slum mapping in polarimetric SAR data using spatial features. Remote Sens. Environ. 2017, 194, 190–204.
51. Pei, H.; Sun, T.J.; Wang, X.Y. Object-oriented land use/cover classification based on texture features of Landsat 8 OLI image. Trans. Chin. Soc. Agric. Eng. 2018, 34, 248–255.
52. Shen, L.; Chen, H.; Yu, Z.; Kang, W.; Zhang, B.; Li, H.; Yang, B.; Liu, D. Evolving support vector machines using fruit fly optimization for medical data classification. Knowl.-Based Syst. 2016, 96, 61–75.
53. Ding, S.F.; Qi, B.J.; Tang, H.Y. An Overview on Theory and Algorithm of Support Vector Machines. J. Univ. Electron. Sci. Technol. China 2011, 40, 2–10.
54. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
55. Zhang, L.J.; Huang, X.D.; Wang, Y.; Wang, C.Y.; Sun, Y.Z. Discussion on dual-tree complex wavelet transform and generalized regression neural network based concentration-resolved fluorescence spectroscopy for oil identification. Anal. Methods 2019, 11, 4566–4574.
56. Che, S.H.; Tan, X.H.; Xiang, C.W.; Sun, J.J.; Hu, X.Y.; Zhang, X.Q.; Duan, A.G.; Zhang, J.G. Stand basal area modelling for Chinese fir plantations using an artificial neural network model. J. For. Res. 2019, 30, 1641–1649.
57. Hogland, J.; Billor, N.; Anderson, N. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing. Eur. J. Remote Sens. 2013, 46, 623–640.
58. Calle, M.L.; Urrea, V. Letter to the editor: Stability of Random Forest importance measures. Brief. Bioinform. 2011, 12, 86–89.
59. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
60. Gao, J.P.; Yu, H.N.; Wang, Y.T.; Gao, X.L.; Zhang, X.L. Application of HJ-2A satellites in filed of forest tree species identification. Spacecr. Eng. 2022, 31, 187–194.
61. Wang, H.J.; Tan, B.X.; Wang, X.H.; Fang, X.F.; Li, S.M. Multiple classifiers combination method for precise classification if forest type. Remote Sens. Inf. 2019, 34, 104–112.
62. Hua, L.; Zhang, X.; Chen, X.; Yin, K.; Tang, L. A Feature-Based Approach of Decision Tree Classification to Map Time Series Urban Land Use and Land Cover with Landsat 5 TM and Landsat 8 OLI in a Coastal City, China. ISPRS Int. J. Geo-Inf. 2017, 6, 331.
63. Wang, M.; Li, M.; Wang, F.; Ji, X. Exploring the Optimal Feature Combination of Tree Species Classification by Fusing Multi-Feature and Multi-Temporal Sentinel-2 Data in Changbai Mountain. Forests 2022, 13, 1058.
64. Oke, O.A.; Thompson, K.A. Distribution models for mountain plant species: The value of elevation. Ecol. Model. 2015, 301, 72–77.
65. Li, L.; Jing, W.; Wang, H. Extracting the Forest Type From Remote Sensing Images by Random Forest. IEEE Sens. J. 2021, 21, 17447–17454.
66. Li, X.; Li, H.; Chen, D.; Liu, Y.; Liu, S.; Liu, C.; Hu, G. Multiple Classifiers Combination Method for Tree Species Identification Based on GF-5 and GF-6. Sci. Silvae Sin. 2020, 56, 93–104.
67. Hościło, A.; Lewandowska, A. Mapping Forest Type and Tree Species on a Regional Scale Using Multi-Temporal Sentinel-2 Data. Remote Sens. 2019, 11, 929.
68. Zhang, Y.; Dian, Y.; Zhou, J.; Peng, S.; Hu, Y. Characterizing Spatial Patterns of Pine Wood Nematode Outbreaks in Subtropical Zone in China. Remote Sens. 2021, 13, 4682.
69. Ghosh, A.; Fassnacht, F.E.; Joshi, P.K.; Koch, B. A framework for mapping tree species combining hyperspectral and LiDAR data: Role of selected classifiers and sensor across three spatial scales. Int. J. Appl. Earth Obs. Geoinf. 2014, 26, 49–63.
70. Heinzel, J.; Koch, B. Investigating multiple data sources for tree species classification in temperate forest and use for single tree delineation. Int. J. Appl. Earth Obs. Geoinf. 2012, 18, 101–110.
71. Dalponte, M.; Bruzzone, L.; Gianelle, D. Fusion of hyperspectral and LIDAR remote sensing data for classification of complex forest areas. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1416–1427.
72. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS-J. Photogramm. Remote Sens. 2011, 66, 247–259.
73. Scornet, E. Random Forests and Kernel Methods. IEEE Trans. Inf. Theory 2016, 62, 1485–1500.
74. Zhang, Y.; Fang, L.; Qiao, Z.; Chen, L.; Zhang, W.; Zheng, X.; Jiang, T. Remote sensing-based identification of forest types and the scale effect in subtropical evergreen forests. Chin. J. Ecol. 2020, 39, 1636–1650.
75. Rautiainen, M.; Lukes, P.; Homolova, L.; Hovi, A.; Pisek, J.; Mottus, M. Spectral Properties of Coniferous Forests: A Review of In Situ and Laboratory Measurements. Remote Sens. 2018, 10, 207.

Figure 1. Nangkang study area and sample points distribution.

Figure 2. Overall accuracy of classification Schemes.

Figure 3. Accuracy comparison of algorithms.

Figure 4. Comparison of identification errors of different tree types.

Figure 5. Relative importance of variables: (a) Scheme 11; (b) Scheme 8.

Table 1. Sample information.

Tree Type	Mean Tree Height (m)	Tree Height Range (m)	Mean Diameter at Breast Height (cm)	Diameter at Breast Height Range (cm)
Broadleaved evergreen trees	10.20	6–25	14.45	6–39
Masson pine	9.26	4–25	10.69	6–35
Chinese fir	10.27	5–21	12	7–25

Table 2. Spectral information of GF-2 PMS.

Spatial Resolution (m)	Spectral Band (μm)
1	Panchromatic: 0.45–0.90
4	Blue: 0.45–0.52
	Green: 0.52–0.59
	Red: 0.63–0.69
	Near-infrared: 0.77–0.89

Table 4. Vegetation indices (B, G, R, and NIR represent the reflectivity of blue, green, red, and near-infrared bands, respectively).

Vegetation Indices	Full Name	Explanation	Formulation	Reference
BAI	Burn area index	Commonly used to delineate burned areas.	1/[(0.1 − R)² + (0.06 − NIR)²]	[34]
DVI	Difference vegetation index	Used to detect vegetation growth status and vegetation coverage and to eliminate some radiation errors.	NIR − R	[35]
EVI	Enhanced vegetation index	Reduces the impact of atmospheric and soil noise simultaneously and stably reflects the vegetation in the measured area.	2.5 × (NIR − R)/(NIR + 6 × R − 7.5 × B + 1)	[36]
CI_green	Green chlorophyll index	Used to estimate the chlorophyll content of plants, which reflects the physiological state of vegetation.	NIR/G − 1	[37]
GEMI	Global environment monitoring index	A nonlinear vegetation index proposed for global environmental monitoring; it minimizes atmospheric effects without changing vegetation information.	η(1 − 0.25η) − (R − 0.125)/(1 − R), where η = [2(NIR² − R²) + 1.5NIR + 0.5R]/(NIR + R + 0.5)	[38]
SVI	Shaded vegetation index	The product of NDVI and the near-infrared band.	[(NIR − R)/(NIR + R)] × NIR	[39]
GNDVI	Green normalized difference vegetation index	A modification of NDVI that combines the near-infrared band with the green band instead of the red band.	(NIR − G)/(NIR + G)	[40]
MSAVI	Modified soil-adjusted vegetation index	An improvement on SAVI that accounts for soil variation better than SAVI and requires no prior knowledge of the study area, i.e., no soil parameters need to be obtained.	(0.5 + NIR) − 0.5 × [(2 × NIR + 1)² − 8 × (NIR − R)]^(1/2)	[41]
NDVI	Normalized difference vegetation index	The most commonly used vegetation index; used to detect vegetation growth status and vegetation coverage and to eliminate some radiation errors.	(NIR − R)/(NIR + R)	[42]
NDGI	Normalized difference greenness index	Reflects vegetation greenness.	(G − R)/(G + R)	[43]
OSAVI	Optimized soil-adjusted vegetation index	A modified SAVI that also uses reflectance in the NIR and red bands; it differs from SAVI in using a standard value of 0.16 for the canopy background adjustment factor.	(NIR − R)/(NIR + R + 0.16)	[44]
RVI	Ratio vegetation index	A sensitive indicator of green vegetation.	NIR/R	[35]
SAVI	Soil-adjusted vegetation index	Derived from NDVI and a large number of observations to reduce the influence of soil background.	[(NIR − R)/(NIR + R + 0.5)] × 1.5	[45]
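The formulations in Table 4 translate directly into code. The sketch below implements only the four indices retained after factor screening (DVI, MSAVI, NDGI, and SVI), exactly as written in the table; the function names and the sample reflectance values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def dvi(r, nir):
    """Difference vegetation index: NIR - R."""
    return nir - r

def ndgi(g, r):
    """Normalized difference greenness index: (G - R)/(G + R)."""
    return (g - r) / (g + r)

def svi(r, nir):
    """Shaded vegetation index: NDVI multiplied by the near-infrared reflectance."""
    return (nir - r) / (nir + r) * nir

def msavi(r, nir):
    """Modified soil-adjusted vegetation index, as formulated in Table 4."""
    return (0.5 + nir) - 0.5 * math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - r))

# Hypothetical surface reflectances for one vegetated pixel (unitless, 0 to 1).
green, red, near_infrared = 0.08, 0.05, 0.40
indices = {
    "DVI": dvi(red, near_infrared),
    "NDGI": ndgi(green, red),
    "SVI": svi(red, near_infrared),
    "MSAVI": msavi(red, near_infrared),
}
```

Applied per pixel to the GF-2 bands (B2 = green, B3 = red, B4 = near-infrared in the abbreviation list), each function would yield one feature layer for the classifier.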

Table 5. Factor screening results.

Type of Feature	Factor Screening Results
Spectral feature	B1, B3, B4
Vegetation indices	DVI, MSAVI, NDGI, SVI
Textural feature	contrast1, contrast3, contrast4, correlation1, correlation2, correlation3, correlation4, dissimilarity1, dissimilarity3, homogeneity2, mean2, second moment1, second moment3, second moment4, second moment2, variance1, variance2, variance3

Table 6. Classification Scheme.

Classification Scheme	Type of Feature
Scheme 1	band reflectance
Scheme 2	vegetation index
Scheme 3	textural features
Scheme 4	band reflectance + textural features
Scheme 5	band reflectance + vegetation index
Scheme 6	vegetation index + textural features
Scheme 7	band reflectance + textural features + topographic factors
Scheme 8	band reflectance + vegetation index + topographic factors
Scheme 9	textural features + vegetation index + topographic factors
Scheme 10	vegetation index + textural features + band reflectance
Scheme 11	vegetation index + textural features + band reflectance + topographic factors

Table 7. Model parameters.

Algorithms	Sample	Model Parameters	Scheme
RF	1	ntrees:100; mtry:11	11
	2	ntrees:100; mtry:11
	3	ntrees:100; mtry:14
	4	ntrees:100; mtry:14
	5	ntrees:100; mtry:11
SVM	1	Kernel Type: RBF; gamma:0.1; cost:5	8
	2	Kernel Type: RBF; gamma:0.1; cost:5
	3	Kernel Type: RBF; gamma:0.1; cost:5
	4	Kernel Type: RBF; gamma:0.1; cost:5
	5	Kernel Type: RBF; gamma:0.1; cost:5
KNN	1	n_neighbors:5	8
	2	n_neighbors:8
	3	n_neighbors:8
	4	n_neighbors:2
	5	n_neighbors:2
BP	1	Kernel Type: RBF; gamma:0.1; cost:5	8
	2	Kernel Type: RBF; gamma:0.1; cost:5
	3	Kernel Type: RBF; gamma:0.1; cost:5
	4	Kernel Type: RBF; gamma:0.1; cost:5
	5	Kernel Type: RBF; gamma:0.1; cost:5
PLR	/	/	8
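Table 7 lists an RBF kernel with gamma 0.1 and cost 5 for the SVM models. As a brief illustration, the sketch below evaluates the RBF kernel in its common form K(x, y) = exp(−γ‖x − y‖²), assuming that the reported gamma follows this convention; the cost parameter, which penalises misclassified training samples, is not shown. The function name and the sample feature vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel value between two feature vectors of equal length."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical (already scaled) feature vectors for three samples.
sample_a = [0.0, 0.0]
sample_b = [1.0, 2.0]
sample_c = [3.0, 4.0]

similarity_ab = rbf_kernel(sample_a, sample_b)  # squared distance 5
similarity_ac = rbf_kernel(sample_a, sample_c)  # squared distance 25
```

The kernel equals 1 for identical samples and decays towards 0 as they move apart; a larger gamma makes the decay faster, so the decision boundary follows the training samples more closely.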

Table 8. Accuracy of each model under its optimal classification Scheme.

Algorithm	Scheme	Accuracy Parameter	Broadleaved Evergreen Trees	Masson Pine	Chinese Fir
RF	11	PA (%)	85.59	96.07	85.53
		UA (%)	84.53	93.22	89.90
		OA (%)	88.90
		Kappa (%)	83.30
SVM	8	PA (%)	88.17	100.00	82.55
		UA (%)	84.66	92.67	94.67
		OA (%)	90.27
		Kappa (%)	85.37
KNN	8	PA (%)	80.43	99.15	82.98
		UA (%)	83.95	90.80	87.86
		OA (%)	87.40
		Kappa (%)	81.08
BP	8	PA (%)	84.26	95.46	88.09
		UA (%)	83.70	92.42	90.86
		OA (%)	88.77
		Kappa (%)	83.12
PLR	8	PA (%)	82.33	96.60	85.11
		UA (%)	83.27	93.21	87.82
		OA (%)	87.95
		Kappa (%)	81.88
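The four accuracy parameters in Table 8 are all derived from a confusion matrix. The confusion matrices themselves are not reported here, so the sketch below uses a hypothetical 3 × 3 matrix purely to show how OA, the kappa coefficient, PA, and UA are computed; it assumes that rows hold the reference (field) classes and columns the predicted classes.

```python
def accuracy_metrics(matrix):
    """Compute OA, kappa, PA, and UA (as fractions) from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]                              # reference totals
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]   # predicted totals
    diagonal = [matrix[k][k] for k in range(n)]                          # correct samples
    oa = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    pa = [d / r for d, r in zip(diagonal, row_sums)]  # producer's accuracy per class
    ua = [d / c for d, c in zip(diagonal, col_sums)]  # user's accuracy per class
    return {"OA": oa, "kappa": kappa, "PA": pa, "UA": ua}

# Hypothetical matrix; class order: broadleaved evergreen, Masson pine, Chinese fir.
example_matrix = [
    [45, 2, 3],
    [1, 48, 1],
    [4, 0, 46],
]
metrics = accuracy_metrics(example_matrix)
```

Multiplying the returned fractions by 100 gives percentages comparable to those in Table 8, where the kappa coefficient is also reported as a percentage.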

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yuan, J.; Wu, Z.; Li, S.; Kang, P.; Zhu, S. Multi-Feature-Based Identification of Subtropical Evergreen Tree Species Using Gaofen-2 Imagery and Algorithm Comparison. Forests 2023, 14, 292. https://doi.org/10.3390/f14020292

