Article

An Optimization Method of Deep Transfer Learning for Vegetation Segmentation under Rainy and Dry Season Differences in a Dry Thermal Valley

1 College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350012, China
2 Fujian Key Laboratory of Agricultural Information Sensoring Technology, Fuzhou 350012, China
3 State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi’an University of Technology, Xi’an 710048, China
4 China Renewable Energy Engineering Institute, Beijing 100032, China
5 Central South Survey and Design Institute Group Co., Ltd., Changsha 410014, China
6 China Electric Construction Group Beijing Survey and Design Institute Co., Beijing 100024, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Plants 2023, 12(19), 3383; https://doi.org/10.3390/plants12193383
Submission received: 3 September 2023 / Revised: 19 September 2023 / Accepted: 21 September 2023 / Published: 25 September 2023
(This article belongs to the Special Issue Deep Learning in Plant Sciences)

Abstract

Deep learning networks often require re-training for different datasets, consuming substantial manual labeling and training time. Transfer learning uses little new data and training time to enable a pre-trained network to segment related scenarios (e.g., vegetation images from the rainy and dry seasons); however, existing transfer learning methods lack systematicity and controllability. In this study, an MTPI (Maximum Transfer Potential Index) method was therefore proposed to find the optimal transfer learning conditions in terms of data quantity and feature quantity (the MTPI conditions). Four deep networks (Seg-Net (Semantic Segmentation Network), FCN (Fully Convolutional Network), Mobile-Net v2, and Res-Net 50 (Residual Network)) pre-trained on the rainy season dataset showed that Res-Net 50 had the best accuracy, 93.58%, and a WIoU (weighted Intersection over Union) of 88.14%, making it the most suitable for transfer learning in vegetation segmentation. By obtaining the TPI (Transfer Potential Index) of each layer of the pre-trained Res-Net 50, the MTPI method estimated that a transfer dataset size (TDS) of 1000 and a transfer point (TP) of 37 gave the best training speed with the smallest dataset and a low error risk. Transfer learning under the MTPI conditions achieved 91.56% accuracy and 84.86% WIoU with a 90% reduction in the new dataset and a 90% reduction in iterations, which is informative for deep segmentation networks transferring between complex vegetation scenes.

1. Introduction

Vegetation segmentation is a basic operation that identifies and extracts vegetation regions from satellite or aerial images using computer vision techniques [1]. Obtaining plant information in the actual environment is of great significance in the fields of agricultural information monitoring [2], environmental data investigation [3], soil and water conservation [4], and ecological information statistics [5]. Currently, an efficient, accurate, and flexible vegetation segmentation method is urgently needed to cope with ever-increasing image data and increasingly detailed vegetation assessment and monitoring requirements [6]. By extracting image features and automatically improving their accuracy, deep learning methods can greatly simplify the overall workflow and improve the accuracy of wild vegetation information surveys [7,8]. However, deep learning for vegetation segmentation still faces the problems of dataset labeling and training time cost [9,10]. Thus, many experts and scholars are working on improvements such as enhanced network structures [11,12], more efficient data sources [13], and further optimization of existing network features [14].
Network structure enhancements have wide applications in agricultural tasks; e.g., H. Sheng [11], S.Y. Zhao [12], and N. Lin [15] improved deep network feature extraction and accuracy by adding attention mechanisms. Under limited computational capacity, the attention structure in deep networks (mimicking the attention of a biological brain) allocates more resources to the most important features while reducing or even filtering out less relevant computations to improve efficiency and accuracy [16]. Similar to attention structures, LSTM (Long Short-Term Memory) structures, which mimic biological memory [17,18,19], can enhance information delivery and improve the network's learning ability [20], because the cumulative updating strategy and the forgetting/memory gating mechanism enhance gradient transfer [21,22]. Additionally, the Pyramid Pooling Module structure [23,24] and the self-encoder structure [25,26] can also reduce the vanishing gradient problem and improve the training effectiveness of deep learning [27]. Although structural improvements can increase network capacity, they may also increase the feature quantity [28], which requires more data and training cost.
In addition to improvements in network structure, the quality and quantity of data can significantly impact deep learning. V. Balagurusamy [13], B. Sorscher [29], K. Alzhrani [30], and R. P. Kosaraju [31] reported that pruning the dataset could improve model effectiveness and increase training speed. S. Yang [32] also pointed out that dataset pruning is mainly achieved by removing redundant or irrelevant data, which reduces the required training time but may change the data distribution and lead to the loss of valid data. Additionally, enlarging the dataset with GANs (Generative Adversarial Networks) is a common approach, such as training GAN models on existing image data to expand the dataset and improve network accuracy, as proposed by V. Giuffrida [33], F. Zhu [34], Q. Zeng [35], and B. Liu [36]. Data augmentation with traditional image processing methods is also a meaningful approach to enhancing network robustness and accuracy, such as the methods proposed by G. Geetharamani [37], S. Zhuang [38], and A. Fuentes [39]. Although these dataset extension methods can simplify the manual labeling process, they also increase the computational cost.
Transfer learning, introduced by R. Yu [40], M. Zhen [41], and A. O. Agbaje [42], is also a deep network optimization process that has significant advantages in both training time and dataset size. Transfer learning focuses on how to make a pre-trained network applicable to a new, related task using only minimal new data and training cost [43]. For example, Y.J. Wang [44], M. Hussain [45], N. Best [46], and J. Chen [14] froze some feature layers of their networks and achieved good accuracy with little new data (e.g., 80 images [47]). However, most existing transfer learning methods are implemented by TaE (trial and error), such as those of N. Best [46] and Y.J. Wang [44], which still leaves room for improvement in systematicity and controllability.
Based on the above discussion and analysis, transfer learning is suitable for vegetation segmentation tasks across complex environments in the wild. Therefore, in this study, a vegetation segmentation deep network pre-trained in the rainy season was transferred to the dry season environment using little new data and training time in a DTV (Dry Thermal Valley, where seasonal variation is significant). In the pre-training phase, four deep networks, Seg-Net (Semantic Segmentation Network), FCN (Fully Convolutional Network), Mobile-Net v2, and Res-Net 50 (Residual Network), were constructed and pre-trained with rainy season vegetation data in order to select the best network (Res-Net 50) for subsequent transfer optimization. Thereafter, an MTPI (Maximum Transfer Potential Index) method was proposed to find the optimal transfer conditions, evaluating and optimizing both the data volume and the feature volume to transfer the pre-trained network accurately to dry season vegetation data. The main purposes of this study are to reduce the TaE cost, improve the controllability of transfer learning risk, and provide a reference for vegetation information extraction in the wild, especially for reducing the datasets and training time needed for vegetation segmentation multitasking in complex environments.

2. Results and Discussion

2.1. The Four Pre-Trained Deep Vegetation Segmentation Networks

Figure 1 gives the vegetation segmentation images of four vegetation covers obtained using the four deep learning networks in the wilderness experiment. The segmented regions of trees, shrubs, grasses, and NVA were labeled with blue, red, green, and magenta markers at 20% transparency, respectively. Column a of each row shows the original RGB image, column b the manually labeled result, and columns c, d, e, and f the segmentation results obtained using the Seg-Net, FCN, Mobile-Net v2, and Res-Net 50 networks, with column b as the reference. In general, these four deep learning networks achieved vegetation segmentation; for example, all four achieved good results on the NVA compared with column b, which also shows that NVA segmentation was the simpler part of the task (features such as color and texture were clearly visible to the human eye). However, Mobile-Net v2 tended to misclassify shrubs as trees, e.g., Figure 1(1e) (1e denotes the resulting image in row 1 and column e), and in Figure 1(2e) the FCN misclassified more shrubs as grasses, which might be related to the convolution depth of the networks, as deeper networks performed better. Both Mobile-Net v2 and Res-Net 50 performed well on grass and NVA, e.g., Figure 1(1e,1f), but in shrub regions Res-Net 50 was still significantly better than Mobile-Net v2, e.g., the difference between Figure 1(3e,3f), which may be related to the structure of Res-Net 50. Among these results, Res-Net 50 produced the best segmentation for all vegetation classes, which may be attributed to the residual structure introduced in Res-Net 50; this was consistent with the results of A. A. Alf [48] and H. Yu [49].
Figure 2 shows the confusion matrix results (overall statistics over a dataset) of the four pre-trained deep learning networks on the test dataset (DA1, introduced in Section 4.1). The vertical axis gives the output category of the pixels, whereas the horizontal axis gives the proportion (percentage) of actual categories within that output category; e.g., row a on the vertical axis corresponds to the NVA output category, which is subdivided into the four actual categories of tree, shrub, grass, and NVA. The diagonal elements of the confusion matrix from the top left to the bottom right reflect the accuracy of the model for each category, with higher values indicating better performance. The results showed that all four deep learning networks achieved a degree of performance on the dataset, with Res-Net 50 achieving the highest per-class accuracies (94.56%, 89.06%, 91.83%, and 96.90%, Figure 2d), whereas Seg-Net showed the lowest (75.17%, 49.81%, 67.39%, and 87.00%, Figure 2a). Figure 2 also showed that Seg-Net performed weakly on shrub and grass (only about 50% and 67%, Figure 2a); FCN was slightly better than Seg-Net on grass and shrub (about a 10% improvement, Figure 2b); Mobile-Net v2 was more accurate than Seg-Net and FCN for all four vegetation categories (about a 10% improvement over FCN for all categories, Figure 2c); and Res-Net 50 outperformed the three previous network structures in all categories (Figure 2d). These results might be related to the residual connection structure in Res-Net 50, as residual structures can be learned efficiently at deeper levels without being prone to overfitting, which was consistent with previous findings.
Table 1 gives the results of the four pre-trained deep learning networks on DA1 from five perspectives (WPA, WRE, WF1, WKs, and WIoU). Among them, WPA is the weighted accuracy of pixel segmentation; WRE measures the ability of the model to find positive pixels; WF1 evaluates the comprehensive performance of the model; WKs is the weighted Kappa coefficient, reflecting the overall classification accuracy; and WIoU is the overlap between the predicted and real regions. Based on these five evaluation perspectives, Res-Net 50 showed better overall performance than Seg-Net or FCN (e.g., the results in Figure 2a–d and the WIoU values in Table 1). This may be attributed to the residual structure fused into the Res-Net 50 network, which allows deeper network layers and better vegetation segmentation capabilities. Combining the example analysis (Figure 1) with the statistical results (Figure 2 and Table 1), the pre-trained Res-Net 50 outperformed the other three networks in both overall and per-class segmentation. This also indicated that Res-Net 50 performed well on simple tasks (NVA segmentation) and complex tasks (grass, shrubs, and trees), both in detail (local results in Figure 2) and overall (Table 1). Therefore, subsequent work in this paper was discussed using the Res-Net 50 network as the example.

2.2. Pre-Trained Network Performance with Rainy and Dry Season Data

Table 2 shows the segmentation results of the pre-trained Res-Net 50 on DA2,3,4 (built in Section 3.2). Comparing the metrics across DA2,3,4 (e.g., WPA), the segmentation on DA2 (0.9352) was clearly better than on DA3,4 (0.4382 and 0.1705). The results illustrated that the Res-Net 50 pre-trained with rainy season shady slope data from a DTV still segmented rainy season sunny slope data well but no longer fit dry season shady or sunny slope data. This might be because the vegetation image characteristics (e.g., color and texture) were influenced by rainy and dry season variability (primarily a moisture factor) in DTV environments, which was consistent with the image sample differences shown in Section 3.2. In addition, the results on DA3 were slightly better than on DA4, where WPA, WRE, and WF1 were all higher, which might indicate that sunny slopes were more water-scarce than shady slopes in the dry season.
The three confusion matrices in Figure 3 were obtained by applying the Res-Net 50 network pre-trained in Section 2.1 to DA2,3,4. Comparing the overall results in Figure 3a–c, the pre-trained Res-Net 50 still performed well on DA2 but weakly on DA3,4 (e.g., the diagonal values from top left to bottom right), indicating that a network pre-trained with rainy season shady slope data still segmented rainy season sunny slope data effectively but performed weakly on dry season data, especially for the three vegetation types of trees, shrubs, and grasses. This also showed that the color, texture, and other characteristics of these three vegetation types were evidently influenced by the rainy and dry seasons. In addition, the NVA segmentation in Figure 3a–c (the three values circled in red) still maintained high accuracy. This might be because the image features of the NVA were less affected by the rainy and dry season factors, and the features distinguishing NVA from vegetated areas remained evident, so the pre-trained deep network remained acceptable on all datasets.
Figure 4 shows the feature output (TPI mapping) results of randomly selected samples from DA1,2,3,4 at different layer depths of the pre-trained network. The color images show the original RGB samples, and the index maps show the feature outputs at different TPs for the four classes of tree, shrub, grass, and NVA from left to right. Among them, blue, green, red, and magenta regions indicate strong feature outputs that coincide with the actual tree, shrub, grass, and NVA targets, respectively. Black regions indicate that there were neither significant feature responses nor corresponding vegetation regions at that TP of the pre-trained Res-Net 50, whereas white regions indicate distinctive features that do not overlap with the corresponding vegetation areas. Therefore, larger red, green, blue, and magenta regions indicate that the features at that TP are more evident and more favorable to segmentation, and larger white regions indicate lower overlap between the feature output and the actual vegetation, which is unfavorable to segmentation. The pre-trained network had good feature extraction and output on DA1,2, as shown by the evident features and high overlap at each feature layer in Figure 4a. On D3,4, different TPs of the pre-trained network still extracted feature regions of a certain extent, but some regions did not overlap well with the actual regions and were not well refined at higher TPs, as shown in the grass output in Figure 4d. Therefore, the low-TP feature part of the pre-trained network was left unmodified for the D3,4 segmentation tasks, and transfer training was performed only on the high-TP part. Such a processing approach reduces the number of features to be updated and the depth of the layers to be trained, which effectively reduces the amount of training data required and improves the efficiency of the optimization training.
Figure 5 shows the TPIc (TPI curve, conforming to Equation (8)) results for each TP of the pre-trained network on D1,2,3,4, where the x-axis is the network TP and the y-axis is the TPI, i.e., the transfer potential of the network layer. The general trend was that the TPIc rose rapidly and then leveled off or fluctuated as the TP in the pre-trained network increased. This behavior might be determined by the characteristics of the TPI: when the TP was small, the feature size factor (the number of features that needed to be trained) was unfavorable for transfer, so even if the network had a high potential accuracy at that point, the overall evaluation index remained small. As the TP increased, the transferred feature size decreased rapidly, which made the curve rise. When the TP became large, the feature size was no longer the main influence and the accuracy term of the transfer began to dominate, making the TPIc flatten or fluctuate; this regularity also reflected the basic characteristics of the TPI. The maximum values of TPIc1,2 were close to 100%, which meant that the starting point of transfer learning was close to perfect, consistent with the earlier conclusion. In addition, the curves showed TPIc1 and TPIc2 > TPIc3 > TPIc4, indicating that the pre-trained network had good segmentation effects on D1,2 but poorer results on D3,4 (with D3 better than D4); this corresponded to the starting point of transfer learning and was also consistent with the earlier test results of the pre-trained network. The TPIc0 (the average of TPIc1,2,3,4) described the potential of the pre-trained network for D1,2,3,4 in the transfer task, and the near-optimal network depth for D1,2,3,4 was obtained at the maximum of TPIc0 (TP0 = 37 in this paper).

2.3. Transfer Learning Results under the MTPI Condition

Figure 6 shows the loss and accuracy trends for transferring the pre-trained network to DA3, where the TP was varied around TP = TP0 (TP0 being the result of Section 2.2), with a TDS of 1000 and a maximum of 1000 iterations. As shown in Figure 6a, the networks with transfer learning significantly outperformed the TP = 0 case in learning efficiency (iterations needed for convergence). This might be because transfer learning is an optimization process based on a pre-trained network: most of the network parameters already worked well on similar tasks (e.g., the feature outputs in Section 2.2) and did not have to be trained again. Comparing the loss curves for different TPs in Figure 6a, the larger the TP, the faster the training, probably because a larger TP means fewer layers and fewer features to refine, which makes stabilization easier. However, Figure 6b showed that the higher the TP, the lower the final accuracy, probably because the smaller number of optimized features was not sufficient for the segmentation task. These results also showed that the TPI parameter proposed in this paper is informative for balancing training time cost against final accuracy in transfer training.
Figure 7 shows the effect of transfer training the pre-trained network with different dataset sizes and the same TP (= TP0). As the figure shows, the larger the dataset used, the more iterations were needed, a well-established result in deep learning, and this regularity also held for the transfer learning process. In addition, larger TDS values gave better results, and when the TDS was close to TDS0 (the MTPI condition, 1000 in this paper), the transfer results were close to the performance of the pre-trained network. However, the actual learning process of deep networks and the splitting of datasets across tasks are highly stochastic, so a certain data margin needs to be left to guarantee the actual effect of transfer learning.
Figure 8 shows two random samples of the transfer learning results of the pre-trained network under MTPI conditions, where row 1 is a random sample of D3 after transfer and row 2 is a random sample of D4. Column a is the original RGB image, column b the manually labeled image, and column c the result segmented by the transferred network. The segmented regions of trees, shrubs, grass, and NVA were labeled with blue, red, green, and magenta markers at 20% transparency, respectively. The segmentation results in Figure 8 show that the networks after transfer learning could segment vegetation in the rainy and dry seasons of the DTV to a useful extent. They also show that transfer learning could be implemented effectively under MTPI conditions, quickly and efficiently adapting deep networks to similar tasks. Although the segmentation results were not perfect, such as the shrub segmentation results in row 1, this result was achieved with about 10% of the iterations and 10% of the data size, which is acceptable when cost and accuracy are considered together. It also shows that transfer learning under MTPI conditions can significantly reduce the manually labeled data and training cost required for the network to handle a new task.
Figure 9 shows the confusion matrix results of the two deep networks after transfer learning for DA3 and DA4, respectively. The results showed that the transferred deep networks achieved acceptable accuracy on both datasets, and the results on DA3 were better than those on DA4, for example, in shrub accuracy. The probable reason is that the pre-trained network performed inherently better on DA3 than on DA4, and this difference persisted in the transfer training results under the same TPI conditions. In Figure 9b, more shrubs were misclassified as grass (about 12%), possibly because the differences in image characteristics between trees and shrubs were smaller than between the other groups (four groups in total: trees, shrubs, grasses, and NVA); this could also be related to the distribution of water between vegetation on the sunny slopes under DTV conditions. The five metrics (WPA, WRE, WF1, WKs, and WIoU) were 93.19%, 93.42%, 93.18%, 86.90%, and 87.69% on DA3, and 89.92%, 88.92%, 89.90%, 85.70%, and 82.02% on DA4. In addition, the MTPI condition is a comprehensive (approximately best) solution evaluated in terms of efficiency and accuracy, and actual applications may require further optimization based on the MTPI condition according to the complexity of the task, the complexity of the data, and the actual training results.
To verify the advantages of the MTPI condition proposed in this paper, three types of deep transfer learning methods (covering 10 studies) were used for comparison, with their transfer learning processes converted to TPI conditions for discussion. For example, M. Zhen [41], A.O. Agbaje [42], Y. J. Wang [44], A. Abbas [50], and J. Chen [14] froze all layers except the output classification layer for transfer; this type of method (method one) is equivalent to the TP = −1 condition in TPI terms. In contrast, method two, applied by A. Abdalla [51], K. He [52], and M. Hussain [45], largely unfroze the feature layers of the network so that transfer learning fine-tuned the feature parameters of each layer, which is equivalent to the TP = 0 condition. In addition, there were transfer learning methods that operated on individual network blocks; for example, N. Best [46] and W. Wu [47] used method three, which is equivalent to comparing results obtained by TaE over a series of TP values when converted to the TPI space. Although method one may have good potential with respect to the required dataset size, we hold the view that it is only suitable for optimization between simple or very similar tasks, as in the curve of TP = 0 in Figure 6. Method two can achieve a better final accuracy, but it must refine most feature parameters in the whole network; according to our discussion of the TPI condition, the size of the features to be optimized and the required dataset are more related to the complexity of the task, and the time and data costs are higher when the task deviation is large. From the realized results, method three appeared to approach the effect of the MTPI method, or even exceed the TDS of the MTPI condition during the TaE process. However, method three is more randomized and unplanned, requiring more manual experience or more efficient experimental designs (e.g., waypoints of TaE), and the TaE process might consume significant time and manual labeling costs. Due to differences in data and network complexity, it was difficult to compare the actual values of methods one, two, and three with the results of this paper directly, but comparing the results for different TPs in Figure 7 showed that the MTPI conditions proposed in this paper have evident advantages in comprehensiveness and systematic planning.

3. Materials

3.1. Data Collection Environment and UAV Equipment

The study was conducted at a typical DTV climate site in Qiao-jia County, Zhaotong City, Yunnan Province, China (102°53′10.83″ E, 27°10′8.96″ N), as shown in Figure 10a. The annual evapotranspiration in this region (about 2400 mm) is about three times the annual rainfall (800 mm) [53], and rainfall during the rainy season (from about June to October each year) accounts for approximately 90% of the annual total [54], with significant vegetation differences between the rainy and dry seasons. The image data were collected over two randomly selected rectangular areas (approximately 500 × 750 m2) on the shady and sunny slopes of the same hillside. In the rainy season, both areas receive sufficient moisture and their evapotranspiration is similar, so the vegetation of the two slopes is very close. During the dry season, however, evapotranspiration is higher on the sunny slope than on the shady slope, which makes the vegetation image features of the two slopes different.
To acquire images of vegetation distribution on the shady and sunny slopes of the DTV in the rainy and dry seasons, two consumer-grade UAVs (unmanned aerial vehicles), a DJI Phantom 4 v2 and a DJI Matrice 300 (DJI Innovation Technology Co., Ltd., Shenzhen, China), were used to collect high-resolution low-altitude image data, as shown in Figure 10b,c. The main parameters of the two UAVCSs (UAV-camera systems) are listed in Table 3, including positioning accuracy, UAV parameters, and camera parameters (RGB camera; RGB means Red/Green/Blue). The Phantom 4 v2 UAVCS was controlled by an Android smartphone (HUAWEI P20 PRO, Huawei, Shenzhen, China) running the DJI Pilot app (DJI, Shenzhen, China), with manual remote control reserved for emergencies, whereas the Matrice 300 UAV used a DJI screen remote controller (industry edition) with an integrated 5.5-inch 1080p high-brightness display.

3.2. Data Collection and Dataset

The four data components (data_n, n = 1, 2, 3, 4) were collected in two batches during the dry and rainy seasons: n = 1 denotes shady slope images in the rainy season, n = 2 sunny slope images in the rainy season, n = 3 shady slope images in the dry season, and n = 4 sunny slope images in the dry season. The data_1,2 were collected in mid-August 2021 (the middle of the rainy season, when rainfall is nearly heaviest), and the data_3,4 were collected in early June 2021 (the end of the dry season, when vegetation has been affected by the dry season for the longest period). During data acquisition, the UAVCSs were flown at an altitude of 200 m (vertical height relative to the center of the area), giving an image resolution of about 5 px/cm. The heading overlap rate was set to 70% and the lateral overlap rate to 75% (a combined image overlap rate of 52.5% (70% × 75%)), where the heading overlap rate is the image overlap parallel to the UAV flight direction and the lateral overlap rate is the image overlap perpendicular to it. Each data_n (n = 1, 2, 3, 4) contained 500 images, denoted data_n = {IM_I}, I = 1, 2, …, 500, so 2000 images (500 × 4) were collected at the study site in total. Each image IM_I ∈ data_n can be denoted as IM_I = H × W × 3, where H is the image height (pixels) of the RGB image (3 channels) and W is the image width (pixels), with H = 3000, W = 4000 for the Phantom 4 v2 UAVCS or H = 4800, W = 6400 for the Matrice 300 UAVCS. To keep hardware memory allocation and operational efficiency reasonable in deep learning training and testing, the images IM_I were cropped into smaller localized image samples, denoted D_n = {im_i}, n = 1, …, 4; i = 1, 2, …, M, with im_i = h × w × 3. The sample size was unified as h = w = 400 in D_n, referring to the input size (h = w = 224; the built dataset and network input layers should be larger than or equal to 224) of the deep learning network established by K. He [52]. Thus, each dataset D_n (n = 1, …, 4) contained 2500 image samples (M = 2500), and each sample im_i was cropped at a random position of a randomly selected IM_I (I a random integer from 1 to 500), as in the examples shown in Figure 11. Additionally, each image sample im_i in D_n was manually labeled with its corresponding segmentation label, L_n = {label_i}, n = 1, …, 4; i = 1, 2, …, M. The manual labels included four classes: NVA (non-vegetated area, magenta, RGB = (255,0,255)), grass (red, (255,0,0)), shrubs (green, (0,255,0)), and trees (blue, (0,0,255)).
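As an illustration of this cropping step, the following Python/NumPy sketch cuts fixed-size patches at random positions from the UAV images; the paper's processing was performed in MATLAB, and the function names and the use of NumPy here are illustrative assumptions.

```python
import numpy as np

def random_crop(image, crop=400, rng=np.random.default_rng()):
    # Cut one crop x crop x 3 patch at a random position of an H x W x 3 UAV image.
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop, :]

def build_dataset(uav_images, samples=2500, crop=400, seed=0):
    # Build D_n: 2500 samples of 400 x 400 px, each cut from a randomly chosen UAV image.
    rng = np.random.default_rng(seed)
    picks = rng.integers(0, len(uav_images), size=samples)
    return [random_crop(uav_images[i], crop, rng) for i in picks]
```

In practice, the same crop coordinates would also be applied to the corresponding label image so that samples and labels stay aligned.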
To improve the model's accuracy and generalization capability, each established dataset D_n was augmented with seven methods: symmetry, panning, random HSV (Hue/Saturation/Value) enhancement/reduction, scaling up/down, rotation, noise addition, and cropping, as shown in Figure 12. Among them, symmetry included random flips in the x-direction (longitudinal) and y-direction (transverse), with the symmetry axis passing through the sample center. The random panning method shifted the image in the x and y directions by −5% to 5% (5% × h/w = 20 pixels). The random HSV enhancement/reduction converted RGB to HSV color space and randomly scaled the three channels separately within the range 0.9 to 1.1. The random rotation range was ±5°, with the image center as the rotation center. The random noise was Gaussian image noise with a mean of 0 and a variance of 0.1. Cropping was applied only to the image portion beyond the original range, so the size after data augmentation was unchanged (h = w = 400). After combining these seven methods, each dataset D_n was augmented to 15,000 samples (DA_n = {ia_i}, n = 1, …, 4; i = 1, 2, …, MA; MA = 15,000), and the total size of the four augmented datasets was 60,000 (15,000 × 4).
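A minimal sketch of three of these augmentations (symmetry, panning, and Gaussian noise) is given below in Python/NumPy; the paper's augmentation was implemented in MATLAB, the ±5% shift and 0.1 noise variance follow the values stated above, and the zero-padding choice for panning is an assumption. Geometric transforms would be applied identically to the label images.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    # Random symmetry about the vertical and/or horizontal axis through the center.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    if rng.random() < 0.5:
        img = img[::-1, :, :]
    return img

def random_shift(img, frac=0.05):
    # Pan by up to +/-5% of the sample size (5% of 400 px = 20 px); vacated pixels are zero-padded.
    h, w = img.shape[:2]
    dy = rng.integers(-int(frac * h), int(frac * h) + 1)
    dx = rng.integers(-int(frac * w), int(frac * w) + 1)
    out = np.zeros_like(img)
    ys, xs = slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0))
    yd, xd = slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0))
    out[ys, xs] = img[yd, xd]
    return out

def add_gaussian_noise(img, var=0.1):
    # Gaussian noise with mean 0 and variance 0.1 on intensities scaled to [0, 1].
    noisy = img.astype(np.float32) / 255.0 + rng.normal(0.0, np.sqrt(var), img.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)
```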

3.3. The Four Deep Segmentation Networks and Evaluation Metrics

In this study, the environment for building, training, optimizing, and evaluating the deep learning networks was MATLAB 2023a running on a Windows 11 PC with a 12 GB NVIDIA GeForce RTX 3060 graphics card (12,000 MHz), a 12th-generation Intel(R) Core(TM) i5-12600KF CPU (3.6 GHz), and 64 GB of DDR4 RAM (3600 MHz). The four popular deep learning network structures used in this work, Seg-Net [52,55], FCN [56], Mobile-Net v2 [57], and Res-Net 50 [58], are shown in Figure 13. Among them, Seg-Net is a convolutional neural network framework using an encoder-decoder structure with high resolution and detail retention when processing large-scale images or increasing network depth. FCN is a semantic segmentation model with a fully convolutional structure, which is fast and occupies less memory but may reduce resolution. Mobile-Net v2 maintains accuracy with a low-complexity network, fast training, and good operability, but may have low accuracy for small objects. Res-Net 50 is a segmentation network using residual blocks, which has high accuracy and few parameters, but gradient explosion may occur during training. The four networks were built and trained, and the most suitable deep learning network was selected for segmentation and further optimized to obtain efficient results.
Five metrics were used to evaluate the performance of the four deep learning networks: WPA (weighted pixel accuracy), WRE (weighted pixel recall), WF1 (weighted F1 score), WIoU (weighted intersection over union), and WKs (W-kappa score, weighted Cohen kappa coefficient score). Among them, WPA characterizes the correctness of pixel segmentation, as shown in Equation (1).
WPA = \sum_{i=1}^{C} p_i \times precision_i = \sum_{i=1}^{C} p_i \times \frac{TPP_i}{TPP_i + FPP_i} \quad (1)
where i is the vegetation type, i = 1, …, C; C = 4; p_i is the proportion of pixels of the i-th vegetation type in the overall dataset images, given by Equation (2); TPP_i is the number of true positive pixels; TNP_i is the number of true negative pixels; FPP_i is the number of false positive pixels; and FNP_i is the number of false negative pixels.
p_i = N_i \Big/ \sum_{c=1}^{C} N_c \quad (2)
where N_i is the total number of pixels labeled as vegetation type i over all label images of a dataset (for example, DA1). The WRE characterizes the ability of the model to find all positive pixels, as in Equation (3).
WRE = \sum_{i=1}^{C} p_i \times recall_i = \sum_{i=1}^{C} p_i \times \frac{TPP_i}{TPP_i + FNP_i} \quad (3)
The WF1 is the harmonic mean of precision and recall (0 ≤ WF1 ≤ 1), with higher values indicating better model performance; WF1 reaches its maximum value of 1 when both the precision and the recall of the model are high. Therefore, WF1 combines the precision and recall of the model and can be used to evaluate its overall performance, expressed as Equation (4).
WF1 = \sum_{i=1}^{C} p_i \times \frac{2 \times precision_i \times recall_i}{precision_i + recall_i} \quad (4)
The WIoU is the weighted IoU metric, which measures the overlap between the predicted and ground-truth masks, as in Equation (5).
WIoU = \sum_{i=1}^{C} p_i \times \frac{TPP_i}{TPP_i + FPP_i + FNP_i} \quad (5)
Finally, WKs is the weighted kappa coefficient representing the overall classification accuracy, which is used to assess the agreement of the model across classes, expressed as Equation (6).
WKs = \frac{P_o - P_e}{1 - P_e} \quad (6)
where P_e is the expected probability of agreement by chance and P_o is the observed agreement, i.e., the proportion of correctly classified pixels. WKs can range from −1 to 1, but usually falls between 0 and 1, which can be divided into five groups representing different levels of consistency: 0.0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect.
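As a worked illustration of Equations (1)–(6), the short Python/NumPy sketch below computes the five weighted metrics from a pixel-level confusion matrix. The paper's evaluation was implemented in MATLAB; the function name and the use of the standard Cohen's kappa for WKs are assumptions made for this sketch.

```python
import numpy as np

def weighted_metrics(cm):
    # cm[i, j]: number of pixels of true class i predicted as class j (C x C matrix).
    cm = cm.astype(float)
    total = cm.sum()
    tpp = np.diag(cm)                  # true positive pixels per class
    fpp = cm.sum(axis=0) - tpp         # false positive pixels per class
    fnp = cm.sum(axis=1) - tpp         # false negative pixels per class
    p = cm.sum(axis=1) / total         # class pixel proportions p_i, Eq. (2)

    precision = tpp / (tpp + fpp)
    recall = tpp / (tpp + fnp)
    wpa = np.sum(p * precision)                                       # Eq. (1)
    wre = np.sum(p * recall)                                          # Eq. (3)
    wf1 = np.sum(p * 2 * precision * recall / (precision + recall))   # Eq. (4)
    wiou = np.sum(p * tpp / (tpp + fpp + fnp))                        # Eq. (5)

    po = tpp.sum() / total                                            # observed agreement
    pe = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / total ** 2         # chance agreement
    wks = (po - pe) / (1 - pe)                                        # Eq. (6), standard Cohen kappa
    return {"WPA": wpa, "WRE": wre, "WF1": wf1, "WIoU": wiou, "WKs": wks}
```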

4. Methods

The three main phases of this study were deep network pre-training, MTPI condition solving, and pre-trained network transfer learning, as shown in Figure 14.

4.1. Pre-Training of the Deep Learning Networks

Before pre-training the deep networks, the established DA1 was split randomly into three subsets, TrainingSet1, ValidationSet1, and TestSet1 (4:1:1). Among them, TrainingSet1 was used to pre-train the network parameters such as weights and biases; ValidationSet1 was used for periodic tests (e.g., every 100 iterations) of the intermediate and final networks obtained during training; and TestSet1 was used to evaluate the performance and segmentation capabilities of the four networks. Then, the four deep networks were built for pre-training, including Seg-Net, FCN, Mobile-Net v2, and Res-Net 50 (introduced in Section 3.3). As a field UAV image dataset may contain imbalanced sample categories, the training process may be biased towards the large categories, reducing segmentation performance. To reduce the problem of imbalanced categories in training and to improve network accuracy, all four deep networks were trained with a category-weighting method. Suppose L_1 is the set of label images in TrainingSet1 and label(i,1) ∈ L_1 is the i-th label image. The set C(i,c), c = 1, 2, 3, 4, represents all pixels of each vegetation type in label(i,1), where c = 1 means tree pixels, c = 2 shrubs, c = 3 grass, and c = 4 NVA. Then, the required weights can be obtained using Equation (7).
w_c = 1 \Big/ \sum_{i=1}^{M} |C(i,c)|, \quad c = 1, 2, 3, 4 \quad (7)
where |A| denotes the cardinality of set A, and M = |L_1| is the number of label images in TrainingSet1.
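The 4:1:1 split and the class weighting of Equation (7) could be sketched as follows in Python/NumPy (the paper's training was carried out in MATLAB); the final normalization of the weights is an assumption, as Equation (7) only fixes the weights up to scale.

```python
import numpy as np

def split_indices(n, ratios=(4, 1, 1), seed=0):
    # Random 4:1:1 split of sample indices into training/validation/test subsets.
    idx = np.random.default_rng(seed).permutation(n)
    cuts = np.cumsum(np.array(ratios) / sum(ratios) * n).astype(int)[:-1]
    return np.split(idx, cuts)

def class_weights(label_images, num_classes=4):
    # Eq. (7): the weight of class c is the inverse of its total pixel count over all labels.
    counts = np.zeros(num_classes)
    for lab in label_images:                      # lab: H x W array of class ids 0..3
        counts += np.bincount(lab.ravel(), minlength=num_classes)
    w = 1.0 / counts
    return w / w.sum()                            # normalization to sum to 1 (assumption)
```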

4.2. The MTPI Method for the Pre-Trained Network

To improve the controllability and systematicity of transfer learning, the TPI was proposed to evaluate transfer learning conditions. Because the TPI is an evaluation index computed before transfer training, the hardware- and time-consuming TaE process can be avoided, while the controllability and systematicity of transfer learning are improved. Transfer learning under the condition of maximum TPI (the MTPI method) therefore makes the transfer training process more reasonable in terms of training time and new data quantity.
To capture both training speed and final accuracy in the TPI, it was defined as Equation (8).
TPI(x) = TPI_{WKs}(x) / TPI_{TP}(x) \quad (8)
where x = 1, 2, 3, … is the transfer depth (TP) input; x = 0 means that no network layers are frozen, and x = −1 means that all layers are frozen. The x (TP, transfer point) stands for the number of frozen layers in transfer training (counted from the input layer), where a layer refers to an activation layer, e.g., ReLU. TPI_WKs(x) denotes the evaluation of the final accuracy potential, and TPI_TP(x) is the potential evaluation from the perspective of training time cost. As the value domain of TPI_WKs(x) is 0 to 1 (0 to 100%) and the value domain of the time-consumption term (iteration level) is greater than or equal to 1, the value range of the TPI is 0 to 1. Equation (8) also shows that the better the final precision and the lower the time cost, the larger the TPI.
As the training time consumption in deep learning is related to the size of the new dataset used (TDS, transfer dataset size), the TDS needs to be decided according to the feature quantity requiring optimization. Therefore, TPI_TP (the TPI from the TP perspective) was used to represent the time-cost evaluation of a transfer learning condition, expressed as Equation (9).
TPI_{TP}(x) = \left[ \frac{LS(x)}{LS(0)} \times N_{level} \right] + 1 \quad (9)
where LS(x) refers to the size (number of parameters) of the part of the network that will be refined when the TP is x, N_level is the evaluation classification level, e.g., N_level = 10, and [ ] is the rounding function. The value domain of TPI_TP(x) is {1, 2, …, N_level}.
To determine the TDS, the influences of the structural complexity of the network, the task complexity, and the data quality were ignored, and the TDS was considered to have a linear relationship with LS(x), expressed as Equation (10).
TDS(x) = k_t \times LS(x) \quad (10)
where k_t is the scale factor. In the pre-training, the dataset size TDS_pre and the network size LS_pre were demonstrated to enable effective training with good accuracy, as in Equation (11).
TDS_{pre} = k_{pre} \times LS_{pre} \quad (11)
where k_pre is the coefficient of the assumed linear relationship between TDS_pre and LS_pre, TDS_pre is the dataset size used in the network's pre-training, and LS_pre = LS(0) is the network's total size. Compared with the pre-training process, factors such as task difficulty are drastically reduced in transfer training, so k_t ≤ k_pre might be an ideal relationship, e.g., the TDS values used by Y.J. Wang [44], K. He [52], and A. Abdalla [51]. However, in this paper, we consider k_t = k_pre = TDS_pre/LS_pre to be the lowest-error-risk condition for transfer training.
In addition, accuracy is also a key factor in evaluating transfer training, so we consider whether the features to be transferred have good classification characteristics. As the WKs can evaluate the combined classification result over multiple classes, the WKs introduced above was also used in the evaluation of transfer potential. The potential index TPI_WKs(x) (the TPI from the WKs perspective), evaluated from the accuracy perspective when the TP is x, can be expressed as Equation (12).
TPI_{WKs}(x) = WKs(DA_n, x) \quad (12)
where DA_n refers to the dataset used for transfer learning and WKs(DA_n, x) refers to the WKs result calculated by Equation (6) on DA_n when the TP is x. TPI_WKs(x) expresses the expected performance of the transfer training task when the transfer training set is DA_n, with a value range of 0 to 1 (×100%). The larger TPI_WKs(x) is, the easier it is to obtain a good segmentation effect.
Based on Equations (8)–(12), the TPI(x) index describes the transfer potential of a deep network on a dataset when the TP is x. The MTPI (maximum TPI) method was therefore proposed: the TPI values of all layers of the pre-trained network are solved, and the TP0 corresponding to the maximum TPI value is the most suitable TP, expressed as Equation (13).
TP_0 = \arg\max_x (TPI(x)) \quad (13)
In this study, before improving the pre-trained network through transfer training in the DTV environment, the combined performance of the pre-trained network on the new datasets (DA2,3,4) was first measured. Datasets on which the pre-trained network still performed well (e.g., WKs greater than 0.8) were not processed further, and only datasets with mediocre performance (e.g., WKs less than 0.6) were refined by transfer training. Thereafter, the TPI curves (TPIc1,2,3,4) for DA1,2,3,4 were obtained by combining Equations (8)–(13) (TPIc0 being the average of TPIc1,2,3,4), and the MTPI conditions for the DTV dataset were obtained (TDS0 and TP0).
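To make the MTPI procedure of Equations (8)–(13) concrete, the following Python/NumPy sketch computes a TPI value for each candidate TP from per-layer parameter counts and per-TP WKs estimates, then returns TP0 and TDS0. It follows the reconstruction of Equation (9) given above (time cost proportional to the trainable share LS(x)/LS(0)); the variable names and rounding details are illustrative, not the authors' implementation.

```python
import numpy as np

def mtpi_condition(wks_per_tp, params_per_layer, tds_pre, n_level=10):
    # wks_per_tp[x-1]      : estimated WKs when the first x layers are frozen (Eq. 12)
    # params_per_layer[l-1]: parameter count of layer l of the pre-trained network
    # tds_pre              : dataset size used in pre-training (defines k_pre, Eq. 11)
    params_per_layer = np.asarray(params_per_layer, dtype=float)
    ls0 = params_per_layer.sum()                      # LS(0): size of the whole network
    ls = ls0 - np.cumsum(params_per_layer)            # LS(x): parameters left trainable at TP = x
    tpi_tp = np.round(ls / ls0 * n_level) + 1         # Eq. (9): time-cost level per TP
    tpi = np.asarray(wks_per_tp) / tpi_tp             # Eq. (8): transfer potential index
    tp0 = int(np.argmax(tpi)) + 1                     # Eq. (13): TP with the maximum TPI
    k_pre = tds_pre / ls0                             # Eq. (11): k_pre = TDS_pre / LS_pre
    tds0 = int(np.ceil(k_pre * ls[tp0 - 1]))          # Eq. (10) with k_t = k_pre
    return tp0, tds0, tpi
```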

4.3. Transfer of the Pre-Trained Network to the Dry Season Dataset under the MTPI Condition

To verify the effectiveness of the TPI parameter for assessing a transfer learning process, the MTPI conditions (TP0 and TDS0) derived in Section 4.2 were compared and evaluated using the dataset and pre-trained network presented in this work. For the TP dimension, the training accuracy and training loss trends were examined with a fixed transfer dataset size of TDS0 and varying TP (TP = TP0, TP0 ± 1, TP0 ± 5, TP0 ± 10, 0, −1). For the TDS dimension, the training accuracy and loss trends were output with a fixed TP of TP0 and a varying TDS (TDS = 10, 20, 50, 100, 200, 500, 1000 (TDS0), 2000).
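As an illustration of how freezing the network up to a chosen TP can be set up in practice, the sketch below uses PyTorch and torchvision's ImageNet-pretrained ResNet-50 as a stand-in; the paper's networks were built in MATLAB, and the way parameter-bearing layers are counted here is an illustrative assumption rather than the authors' exact TP indexing.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

def freeze_to_transfer_point(model, tp):
    # Freeze the first `tp` parameter-bearing modules; leave the remaining ones trainable.
    seen = 0
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.BatchNorm2d, torch.nn.Linear)):
            trainable = seen >= tp
            for p in module.parameters(recurse=False):
                p.requires_grad = trainable
            seen += 1
    return model

backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
backbone = freeze_to_transfer_point(backbone, tp=37)   # TP0 = 37 in this paper
# Only the unfrozen parameters are passed to the optimizer for transfer training.
optimizer = torch.optim.Adam((p for p in backbone.parameters() if p.requires_grad), lr=1e-4)
```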
To compare the optimization effects of the MTPI method on the pre-trained network under DTV conditions, the network outputs after transfer training were produced under MTPI conditions. The results included segmentation maps of random samples from DA3,4. In addition, the overall statistical results (calculated by Equations (1)–(6)) and the confusion matrices of the datasets were also extracted to facilitate comparison of the final segmentation results for each vegetation category.

5. Conclusions

This paper proposed the MTPI evaluation method for transfer learning, covering both the data quantity and the feature quantity requiring optimization, and tested and validated it in DTV environments. The main conclusions are as follows.
  • Four deep learning networks (Seg-Net, FCN, Mobile-Net v2, and Res-Net 50) were trained and compared; Res-Net 50 had the best segmentation results, with 93.58% accuracy and 88.14% WIoU, and was selected for further transfer learning.
  • To solve for the best transfer learning condition, the TPI of each layer of the pre-trained network on the four datasets was obtained, and the MTPI condition (TDS0 = 1000 and TP0 = 37) was determined.
  • Comparing transfer learning under varying TDS and TP showed that the obtained MTPI condition performed best in terms of dataset and time consumption (accuracy of 91.56% and WIoU of 84.86% with a 90% reduction in the dataset and a 90% reduction in the iterations).
The proposed MTPI approach is informative for applying deep transfer learning in complex, similar scenarios, such as unstructured agricultural or geographic environments. Although the proposed method showed acceptable performance for vegetation segmentation, it needs continuing research in two areas (which will probably be pursued in our following research). Firstly, the MTPI method treats the relationship between the data quantity and the feature quantity of pre-training and transfer training as an equal linear relationship (although the results are acceptable), but the relationship could be much more complex. In addition, the task complexity, data complexity, and model complexity of the transfer learning process also affect the TPI and deserve further study.

Author Contributions

Y.C.: conceptualization, methodology, software, validation, formal analysis, investigation, resources; data curation, writing—original draft preparation, writing—review and editing, visualization; B.Z.: funding acquisition, validation, writing—review and editing, project administration; D.Y., L.C. and L.F.: writing—review and editing, supervision; X.H.: conceptualization, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (No: 52222903 and 41977007), and the China Three Gorges Corporation-funded “Research on Soil Improvement, Plant Selection, and Slope Ecological Restoration Techniques in the Arid Hot Valley Area of the Baihetan Hydropower Station” (No: BHT/0869).

Data Availability Statement

Data are available from the first authors upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Symbols
DTV: Dry Thermal Valley
FCN: Fully Convolutional Networks
GAN: Generative Adversarial Networks
HSV: Hue/Saturation/Value
IoU: Intersection over Union
LD: Layer depths
LSTM: Long Short-Term Memory
MTPI: Maximum Transfer Potential Index method
NVA: Non-vegetated area
Res-Net 50: Residual Network
RGB: Red/Green/Blue
Seg-Net: Semantic Segmentation Networks
TaE: Trial-and-error method
TDS: Transfer dataset size
TP: Transfer point/transfer depth
TPI: Transfer Potential Index
UAV(s): Unmanned aerial vehicle(s)
UAVCSs: UAV-camera systems
WF1: Weighted F1 score
WKs: W-kappa score, weighted Cohen kappa score
WIoU: Weighted IoU
WPA: Weighted pixel accuracy
WRE: Weighted pixel recall

References

  1. Kumar, B.T.; Sambit, B.; Kumar, S.P. Vegetation Extraction from UAV-based Aerial Images through Deep Learning. Comput. Electron. Agric. 2022, 198, 107094. [Google Scholar] [CrossRef]
  2. Yang, M.-D.; Tseng, H.-H.; Hsu, Y.-C.; Tsai, H. Semantic Segmentation Using Deep Learning with Vegetation Indices for Rice Lodging Identification in Multi-date UAV Visible Images. Remote Sens. 2020, 12, 633. [Google Scholar] [CrossRef]
  3. Sadegh, J.; Per, J.; Lars, E.; Jonas, A.; Jonathan, S. Detecting changes in vegetation trends using time series segmentation. Remote Sens. Environ. 2015, 156, 182–195. [Google Scholar]
  4. Allison, L.; Mayank, D.; Ikram, S.T. Assessing alternative methods for unsupervised segmentation of urban vegetation in very high-resolution multispectral aerial imagery. PLoS ONE 2020, 15, e230856. [Google Scholar]
  5. Saheba, B.; Laurence, G.; Bidisha, G. Drone Image Segmentation Using Machine and Deep Learning for Mapping Raised Bog Vegetation Communities. Remote Sens. 2020, 12, 2602. [Google Scholar]
  6. Andreas, K.; Francesc X, P.-B. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar]
  7. Fred, H.; Minsuk, K.; Robert, P.; Duen, H.C. Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers. IEEE Trans. Vis. Comput. Graph. 2019, 25, 2674–2693. [Google Scholar]
  8. Eli, B.; Haonan, T.; Chanae, O.; Cranos, M.W. High-throughput image segmentation and machine learning approaches in the plant sciences across multiple scales. Emerg. Top. Life Sci. 2021, 5, 239–248. [Google Scholar]
  9. Mubashiru, L.O.; Rahul, G. YOLOv5-LiNet: A lightweight network for fruits instance segmentation. PLoS ONE 2023, 18, e282297. [Google Scholar]
  10. Jerrin, S.A.; Rekha, C.; Megalan, L.L. Agricultural intrusion detection (AID) based on the internet of things and deep learning with the enhanced lightweight M2M protocol. Soft Comput. 2023. [Google Scholar] [CrossRef]
  11. Hao, S.; Xiao, C.; Jingyi, S.; Rajagopal, R.; Ng, A.Y. Effective Data Fusion with Generalized Vegetation Index: Evidence from Land Cover Segmentation in Agriculture; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar] [CrossRef]
  12. Zhao, S.; Peng, Y.; Liu, J.; Wu, S. Tomato Leaf Disease Diagnosis Based on Improved Convolution Neural Network by Attention Module. Agriculture 2021, 11, 651. [Google Scholar] [CrossRef]
  13. Venkat, B.; Vince, S.; Dhinesh, K.A.; Sahil, D.; Joseph, L.; Prabhakar, K.; Matt, T.; Donna, D. Detecting and Discriminating between Different Types of Bacteria with a Low-Cost Smartphone Based Optical Device and Neural Network Models; SPIE: Bellingham, WA, USA, 2019. [Google Scholar] [CrossRef]
  14. Chen, J.D.; Zhang, D.F.; Suzauddola, M.D.; Nanehkaran, Y.A.; Sun, Y.D. Identification of plant disease images via a squeeze-and-excitation MobileNet model and twice transfer learning. IET Image Process. 2021, 15, 1115–1127. [Google Scholar] [CrossRef]
  15. Lin, N.C.; Zhao, W.F.; Liang, S.H.; Zhong, M.Y. Real-Time Segmentation of Unstructured Environments by Combining Domain Generalization and Attention Mechanisms. Sensors 2023, 23, 6008. [Google Scholar] [CrossRef]
  16. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  17. Marc, R.; Marco, K. Temporal Vegetation Modelling Using Long Short-Term Memory Networks for Crop Identification from Medium-Resolution Multi-Spectral Satellite Images; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar] [CrossRef]
  18. Bashar, A.; Simon, P.; Georgios, L.; Kollias, S. Using Deep Learning to Predict Plant Growth and Yield in Greenhouse Environments. Acta Hortic. 2020, 1296, 425–432. [Google Scholar] [CrossRef]
  19. Aya, F.; Ben, A.A.; Vincent, B.; Imed, R.F. Forecasting vegetation indices from spatio-temporal remotely sensed data using deep learning-based approaches: A systematic literature review. Ecol. Inform. 2022, 68, 101552. [Google Scholar]
  20. Chen, Z.; Liu, H.; Xu, C.; Wu, X.C.; Liang, B.; Cao, J.; Chen, D.L. Modeling vegetation greenness and its climate sensitivity with deep-learning technology. Ecol. Evol. 2021, 11, 7335–7345. [Google Scholar] [CrossRef] [PubMed]
  21. Debabala, S.; Kumar, P.P.; Pradeep, K.G. Image Classification Using Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN): A Review; Springer: Singapore, 2020; pp. 367–381. [Google Scholar]
  22. He, Z.; Yang, Y.; Yang, G.; Zhang, X.Y. Sentiment Analysis of Agricultural Product Ecommerce Review Data Based on Deep Learning; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar] [CrossRef]
  23. Lu, S.; Song, Z.; Chen, W.; Qian, T.T.; Zhang, Y.Y.; Chen, M.; Li, G. Counting Dense Leaves under Natural Environments via an Improved Deep-Learning-Based Object Detection Algorithm. Agriculture 2021, 11, 1003. [Google Scholar] [CrossRef]
  24. Pu, W.; Wang, Z.; Liu, D.; Zhang, Q.L. Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion. Remote Sens. 2022, 14, 4312. [Google Scholar] [CrossRef]
  25. Yang, W.; Yang, C.; Hao, Z.; Xie, C.Q.; Li, M.Z. Diagnosis of Plant Cold Damage Based on Hyperspectral Imaging and Convolutional Neural Network. IEEE Access 2019, 7, 118239–118248. [Google Scholar] [CrossRef]
  26. Yu, F.; Bai, J.; Jin, Z.; Zhang, H.G.; Yang, J.X.; Xu, T.Y. Estimating the rice nitrogen nutrition index based on hyperspectral transform technology. Front. Plant Sci. 2023, 14, 1118098. [Google Scholar] [CrossRef] [PubMed]
  27. Xing, H.; Chun, X.; Zhe, F.; Duan, Q.Q.; Zhang, D.W.; Jiang, L.H.; Wei, X.; Hong, D.F.; Li, G.Q.; Zeng, X.H.; et al. Hyperspectral Anomaly Detection Using Deep Learning: A Review. Remote Sens. 2022, 14, 1973. [Google Scholar]
  28. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Ayad, A.l.D.; Ye, D.; Omran, A.S.; Santamaría, J.; Mohammed, A.F.; Muthana, A.l.A.; Laith, F. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef] [PubMed]
  29. Ben, S.; Robert, G.; Shashank, S.; Surya, G.; Ari, M. Beyond neural scaling laws: Beating power law scaling via data pruning. Adv. Neural Inf. Process. Syst. 2022, 35, 19523–19536. Available online: https://proceedings.neurips.cc/paper_files/paper/2022/file/7b75da9b61eda40fa35453ee5d077df6-Paper-Conference.pdf (accessed on 2 September 2023).
  30. Khudran, A.; Ethan, M.R.; Edward, C.C.; Terrance, E.B. Automated Big Security Text Pruning and Classification; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar] [CrossRef]
  31. Prateek, K.R. Deep CNNs for large scale species classification. arXiv e-prints. [CrossRef]
  32. Yang, S.; Xie, Z.; Peng, H.; Xu, M.; Sun, M.; Li, P. Dataset Pruning: Reducing Training Data by Examining Generalization Influence. arXiv 2022. [Google Scholar] [CrossRef]
  33. Giuffrida, M.V.; Scharr, H.; Tsaftaris, S.A. ARIGAN: Synthetic Arabidopsis Plants Using Generative Adversarial Network. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 2017; pp. 2064–2071. [Google Scholar] [CrossRef]
  34. Zhu, F.; He, M.; Zheng, Z. Data augmentation using improved cDCGAN for plant vigor rating. Comput. Electron. Agric. 2020, 175, 105603. [Google Scholar] [CrossRef]
  35. Zeng, Q.; Ma, X.H.; Cheng, B.P.; Zhou, E.X.; Pang, W. GANs-Based Data Augmentation for Citrus Disease Severity Detection Using Deep Learning. IEEE Access 2020, 8, 172882–172891. [Google Scholar] [CrossRef]
  36. Liu, B.; Tan, C.; Li, S.; He, J.R.; Wang, H.Y. A Data Augmentation Method Based on Generative Adversarial Networks for Grape Leaf Disease Identification. IEEE Access 2020, 8, 102188–102198. [Google Scholar] [CrossRef]
  37. Geetharamani, G.; Arun Pandian, J. Corrigendum to: ‘Identification of plant leaf diseases using a nine-layer deep convolutional neural network’. Comput. Electr. Eng. 2019, 78, 536. [Google Scholar] [CrossRef]
  38. Zhuang, S.; Wang, P.; Jiang, B. Segmentation of Green Vegetation in the Field Using Deep Neural Networks. In Proceedings of the 2018 13th World Congress on Intelligent Control and Automation (WCICA), Changsha, China, 4–8 July 2018. [Google Scholar] [CrossRef]
  39. Fuentes, A.; Yoon, S.; Kim, S.C.; Park, D.S. A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition. Sensors 2017, 17, 2022. [Google Scholar] [CrossRef]
  40. Yu, R.; Li, S.; Zhang, B.; Zhang, H.Q. A Deep Transfer Learning Method for Estimating Fractional Vegetation Cover of Sentinel-2 Multispectral Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  41. Ma, Z.; Wang, Y.; Zhang, T.; Wang, H.; Jia, Y.; Gao, R.; Su, Z. Maize leaf disease identification using deep transfer convolutional neural networks. Int. J. Agric. Biol. Eng. 2022, 15, 187–195. [Google Scholar] [CrossRef]
  42. Agbaje, A.O.; Tian, J. Plant Disease Recognition Using Transfer Learning and Evolutionary Algorithms. In Proceedings of the TENCON 2022–2022 IEEE Region 10 Conference (TENCON), Hong Kong, China, 1–4 November 2022. [Google Scholar] [CrossRef]
  43. Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar]
  44. Wang, Y.; Li, S.; Teng, F.; Lin, Y.H.; Wang, M.J.; Cai, H.F. Improved Mask R-CNN for Rural Building Roof Type Recognition from UAV High-Resolution Images: A Case Study in Hunan Province, China. Remote Sens. 2022, 14, 265. [Google Scholar] [CrossRef]
  45. Hussain, M.; Bird, J.J.; Faria, D.R. A Study on CNN Transfer Learning for Image Classification, Cham; Springer International Publishing: Cham, Switzerland, 2019; Available online: https://api.semanticscholar.org/CorpusID:52066998 (accessed on 2 September 2023).
  46. Best, N.; Ott, J.; Linstead, E.J. Exploring the efficacy of transfer learning in mining image-based software artifacts. J. Big Data 2020, 7, 59. [Google Scholar] [CrossRef]
  47. Wu, W.; Li, H.; Li, X.; Guo, H.D.; Zhang, L. PolSAR Image Semantic Segmentation Based on Deep Transfer Learning—Realizing Smooth Classification With Small Training Sets. IEEE Geosci. Remote Sens. Lett. 2019, 16, 977–981. [Google Scholar] [CrossRef]
  48. Alexandra, A.; Noramiza, H.; Chikannan, E. Content Based Image Retrieval and Classification Using Speeded-Up Robust Features (SURF) and Grouped Bag-of-Visual-Words (GBoVW); IEEE: Piscataway, NJ, USA, 2013. [Google Scholar] [CrossRef]
  49. Yu, H.; Li, J.; Zhang, L.; Cao, Y.; Yu, X.; Sun, J. Design of lung nodules segmentation and recognition algorithm based on deep learning. BMC Bioinform. 2021, 22, 314. [Google Scholar] [CrossRef]
  50. Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput. Electron. Agric. 2021, 187, 106279. [Google Scholar]
  51. Abdalla, A.; Cen, H.; Wan, L.; Weng, H.; Zhou, W.; He, Y. Fine-tuning convolutional neural network with transfer learning for semantic segmentation of ground-level oilseed rape images in a field with high weed pressure. Comput. Electron. Agric. 2019, 167, 105091. [Google Scholar] [CrossRef]
  52. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification; ICCV: Santiago, Chile, 2015. [Google Scholar] [CrossRef]
  53. Wei, Y.; He, Y.; Liu, G.; Jie, J. Notice of Retraction: Features of Eco-Environmental and Ecological Restoration in the Yuanmou Dry-Hot Valley of Yunnan, China; IEEE: Changsha, China, 2011. [Google Scholar] [CrossRef]
  54. Yan, J.; Zhang, Z.; Shen, Z.; Ou, X.K.; Geng, Y.P.; Yang, M.Y. Review of research on the vegetation and environment of dry-hot valleys in Yunnan. Biodivers. Sci. 2016, 24, 462–474. (In Chinese) [Google Scholar]
  55. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar]
  56. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation; Cornell University Library: Ithaca, NY, USA, 2015. [Google Scholar] [CrossRef]
  57. Muhammed, S.K.; Esat, K.; Fatma, B.Ç.; Ahmet, A. Comparative parotid gland segmentation by using ResNet-18 and MobileNetV2 based DeepLab v3+ architectures from magnetic resonance images. Concurr. Comput. 2023, 35, e7405. [Google Scholar]
  58. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
Figure 1. Segmentation results of the four pre-trained deep learning networks on four random samples. (a). RGB image. (b). Reference label map. (c). Result of segmenting the image in (a) with Seg-Net. (d). Result with FCN. (e). Result with Mobile-Net v2. (f). Result with Res-Net 50. Rows 1 to 4 show four randomly selected samples.
Figure 2. Confusion matrices of the four pre-trained networks on D_A1. (a). Confusion matrix of Seg-Net. (b). Confusion matrix of FCN. (c). Confusion matrix of Mobile-Net v2. (d). Confusion matrix of Res-Net 50.
Figure 3. Confusion matrices of the pre-trained Res-Net 50 on D_A2, D_A3, and D_A4. (a). Confusion matrix on D_A2. (b). Confusion matrix on D_A3. (c). Confusion matrix on D_A4. The red labels indicate that the pre-trained network classifies NVA with high accuracy on all datasets.
Figure 4. Feature outputs (TPI maps) of random samples from D_1 to D_4 at different layer depths (TP) of the pre-trained Res-Net 50. (a). Random samples from D_1. (b). Random samples from D_2. (c). Random samples from D_3. (d). Random samples from D_4. (The label image was resized whenever the output dimensions differed from those of the label.)
Figure 5. Relationship between the TPI curves and the layer depth (TP) of the pre-trained network on D_A1 to D_A4. TPI_c0 is the average of TPI_c1 to TPI_c4, and TP_0 is the maximum point of TPI_c0.
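For readers who want to reproduce the layer-selection step summarized in Figure 5, a minimal sketch is given below. It assumes the per-layer TPI values for the four augmented datasets have already been computed (random placeholders stand in for them here) and simply averages the curves and locates the maximum, as the caption describes.

```python
import numpy as np

# Placeholder per-layer TPI curves for the four augmented datasets; in the paper these
# are measured from the pre-trained Res-Net 50, one value per candidate transfer position (TP).
n_layers = 57
rng = np.random.default_rng(0)
tpi_curves = rng.random((4, n_layers))      # rows play the role of TPI_c1 .. TPI_c4

tpi_c0 = tpi_curves.mean(axis=0)            # TPI_c0: average of the four curves
tp_0 = int(np.argmax(tpi_c0)) + 1           # TP_0: 1-based layer depth with the largest TPI_c0

print(f"Estimated optimal transfer position TP_0 = {tp_0}")
```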
Figure 6. Loss and accuracy trends for transferring the pre-trained network to D_A3 with TP near TP_0. (a). Transfer loss. (b). Transfer accuracy. (The curve for layer depth = 57 coincides with that for layer depth = −1, because the Res-Net 50 used in this study has at most 57 layers.)
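The transfer experiments around TP_0 can be read as freezing the pre-trained layers up to the chosen transfer position and fine-tuning the remainder; that reading, and the use of an ImageNet-pre-trained ResNet-50 from torchvision in place of the paper's own 57-layer Res-Net 50 and weights, are assumptions of the sketch below.

```python
from torchvision import models

TP_0 = 37                                   # transfer position suggested by the MTPI analysis

# ImageNet-pre-trained ResNet-50 (torchvision >= 0.13 API) as a stand-in for the
# paper's pre-trained network; the paper's own weights are not reproduced here.
model = models.resnet50(weights="IMAGENET1K_V1")

# Map the paper's 57-layer depth onto this model by freezing a proportional prefix of
# the ordered parameter tensors; this is only an illustrative approximation.
params = list(model.parameters())
n_freeze = int(len(params) * TP_0 / 57)
for p in params[:n_freeze]:
    p.requires_grad = False

print(sum(p.requires_grad for p in model.parameters()), "parameter tensors remain trainable")
```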
Figure 7. Transfer training process and results for different transfer-set sizes. (a). Loss curves in transfer training. (b). Accuracy curves in transfer training.
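Varying the transfer-set size, as in Figure 7, only requires drawing a random subset of the new-scene data before fine-tuning. A minimal sketch follows, with a small random tensor dataset standing in for the paper's image tiles (an assumption; the real D_A3/D_A4 data are not distributed with this sketch).

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Placeholder dataset: small random "images" with 4-class labels instead of 400 x 400 tiles.
full_set = TensorDataset(torch.randn(2000, 3, 64, 64), torch.randint(0, 4, (2000,)))

TDS = 1000                                  # transfer dataset size suggested by the MTPI analysis
idx = torch.randperm(len(full_set), generator=torch.Generator().manual_seed(0))[:TDS]
transfer_set = Subset(full_set, idx.tolist())
loader = DataLoader(transfer_set, batch_size=8, shuffle=True)

print(len(transfer_set), "samples drawn for transfer training")
```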
Figure 8. Example segmentation results after transfer training. (a). Original RGB image. (b). Manually labeled image. (c). Result segmented by the transferred (MTPI-transfer) network. Row 1 shows a random sample from D_3; Row 2 shows a random sample from D_4.
Figure 9. Confusion matrices of the test results for the two transfer-trained networks. (a). Confusion matrix of the network transferred to D_A3. (b). Confusion matrix of the network transferred to D_A4.
Figure 10. Location of the data collection site and the UAVs (unmanned aerial vehicles) used for image acquisition. (a). The study site on the map. (b). The DJI Phantom 4 v2 UAV. (c). The DJI Matrice 300 UAV. The red dotted line marks the extent of the data collection site on the map.
Figure 11. Examples of grass, shrubs, trees, and NVA in the four datasets (D_n) (400 × 400 pixels; NVA denotes non-vegetation area).
Figure 12. Schematic diagram of the data augmentation methods. (a). Diagram of an original image sample. (b). Y-direction mirroring. (c). X-direction mirroring. (d). X-direction panning. (e). Y-direction panning. (f). Random enhancement/reduction of hue, saturation, and brightness. (g). Image scaling up. (h). Image scaling down. (i). Random rotation. (j). Random noise. (k). Random cropping. (l). A random example (im_i) from D_n. (m). Several augmentation results of (l).
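The augmentations sketched in Figure 12 can be approximated with off-the-shelf torchvision transforms, as in the hedged sketch below; the parameter ranges are illustrative assumptions, and the joint image/label handling needed for segmentation masks is omitted for brevity.

```python
import torch
from torchvision import transforms

# Approximate pipeline covering the Figure 12 operations; all ranges are placeholder choices.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # X-direction mirroring
    transforms.RandomVerticalFlip(p=0.5),                       # Y-direction mirroring
    transforms.RandomAffine(degrees=30,                         # random rotation
                            translate=(0.1, 0.1),               # X/Y panning
                            scale=(0.8, 1.2)),                  # scaling up / down
    transforms.ColorJitter(brightness=0.2, saturation=0.2, hue=0.05),  # colour/saturation/brightness jitter
    transforms.RandomCrop(400, pad_if_needed=True),             # random cropping back to 400 x 400
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.01 * torch.randn_like(t)).clamp(0, 1)),  # random noise
])
# Usage: augmented = augment(pil_image)   # pil_image is a 400 x 400 RGB tile
```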
Figure 13. The four deep learning networks used for vegetation segmentation. (a). Structure of FCN. (b). Structure of Seg-Net. (c). Structure of Mobile-Net v2. (d). Structure of Res-Net 50. Each network layer is labeled N × N(F), giving its size (N × N) and number of filters (F). “Input/(SoftMax+) output” denotes the input layer or the output layer, where a SoftMax layer may precede the output layer. “Crop/dropout/maxP/maxUP” denotes a cropping or resize layer, dropout layer, max-pooling layer, or max-unpooling layer. “Con/DeCon+Normal” denotes a convolution or deconvolution layer followed by a normalization layer. “Con/DeCon+(Normal+) ReLU” denotes a convolution or deconvolution layer, an optional normalization layer, and a ReLU activation layer. “-” indicates a structure identical to the one above it.
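As a reading aid for the N × N(F) notation in Figure 13, the sketch below shows how a block labelled, for example, “3 × 3(64)” could map onto a “Con + Normal + ReLU” unit in PyTorch; the channel counts are examples, not the exact configuration of the paper's networks.

```python
import torch.nn as nn

def conv_normal_relu(in_ch: int, filters: int, kernel: int = 3) -> nn.Sequential:
    """A 'Con + Normal + ReLU' unit: kernel x kernel convolution with `filters` output channels."""
    return nn.Sequential(
        nn.Conv2d(in_ch, filters, kernel_size=kernel, padding=kernel // 2),
        nn.BatchNorm2d(filters),
        nn.ReLU(inplace=True),
    )

block = conv_normal_relu(3, 64)   # e.g. an encoder block labelled "3 x 3(64)"
```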
Figure 14. Technical flowchart of the MTPI transfer method. Blue, green, red, and magenta indicate trees, shrubs, grasses, and non-vegetation areas in the segmentation results.
Table 1. Segmentation results of the four deep learning networks on D_A1.

Deep Networks    WPA     WRE     WF1     WKs     WIoU
Seg-Net          0.7122  0.6984  0.7044  0.5859  0.5816
FCN              0.7364  0.7451  0.7277  0.6244  0.6117
Mobile-Net v2    0.8433  0.8397  0.8426  0.7701  0.7390
Res-Net 50       0.9358  0.9309  0.9358  0.9355  0.8814
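The weighted metrics reported in Tables 1 and 2 are derived from per-class confusion matrices. The sketch below computes a class-frequency-weighted IoU from a toy 4-class confusion matrix; weighting each class by its share of reference pixels is a common convention and is assumed here, not taken from the paper.

```python
import numpy as np

def weighted_iou(cm: np.ndarray) -> float:
    """Class-frequency-weighted IoU from a confusion matrix (rows = reference, cols = prediction)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                     # predicted as class c but not class c
    fn = cm.sum(axis=1) - tp                     # class c pixels predicted as something else
    iou = tp / np.maximum(tp + fp + fn, 1e-12)   # per-class IoU
    weights = cm.sum(axis=1) / cm.sum()          # class frequency in the reference labels
    return float((weights * iou).sum())

# Toy confusion matrix for grass, shrub, tree, and NVA (made-up counts, for illustration only).
cm = np.array([[900,  40,  30,  30],
               [ 50, 800,  80,  70],
               [ 20,  60, 850,  70],
               [ 10,  20,  30, 940]])
print(round(weighted_iou(cm), 4))
```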
Table 2. Segmentation results of the pre-trained network (Res-Net 50) on datasets D_A2, D_A3, and D_A4.

Test Dataset    WPA     WRE     WF1     WKs     WIoU
D_A2            0.9352  0.9288  0.9353  0.9032  0.8803
D_A3            0.4382  0.4443  0.4833  0.2407  0.2529
D_A4            0.1705  0.3464  0.2367  0.0094  0.0984
Table 3. The parameters of UAVs for data acquisition.

UAV                                Matrice 300          Phantom 4 v2
Lens FOV                           45°                  84°
Equivalent focal length            25 mm                35 mm
Positioning mode                   GPS + RTK            GPS/GLONASS
Vertical positioning accuracy      ±0.1 m               ±0.1 m
Horizontal positioning accuracy    ±0.1 m               ±0.3 m
Image acquisition mode             no hover shooting    no hover shooting
Resolution of location             less than 50 mm      less than 50 mm
Flight altitude                    <500 m               <500 m
Flight speed                       5 m/s                6 m/s
Aperture                           Aps-C                f/2.8–f/11
Camera model                       FC2008               FCS400
Shutter                            1/2000               1/2000
Resolution                         3000 × 4000          4800 × 6400
ISO range                          100–6400             100–12,800
Photo format                       JPEG/RAW             JPEG/DNG
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

