Machine Learning-Based Estimation of Daily Cropland Evapotranspiration in Diverse Climate Zones

Du, Changmin; Jiang, Shouzheng; Chen, Chuqiang; Guo, Qianyue; He, Qingyan; Zhan, Cun

doi:10.3390/rs16050730


by Changmin Du ¹, Shouzheng Jiang ¹, Chuqiang Chen ¹, Qianyue Guo ¹, Qingyan He ¹,² and Cun Zhan ¹,*

¹ State Key Laboratory of Hydraulics and Mountain River Engineering & College of Water Resource and Hydro-Power, Sichuan University, Chengdu 610065, China

² Sichuan Academy of Agricultural Machinery Sciences, Chengdu 610066, China

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(5), 730; https://doi.org/10.3390/rs16050730

Submission received: 26 December 2023 / Revised: 9 February 2024 / Accepted: 16 February 2024 / Published: 20 February 2024

(This article belongs to the Special Issue Monitoring Water, Vegetation, and Soil Condition in Farmland Ecosystems: Integration of Multi-Source Remote Sensing)


Abstract:

The accurate prediction of cropland evapotranspiration (ET) is of utmost importance for effective irrigation and optimal water resource management. To evaluate the feasibility and accuracy of ET estimation in various climatic conditions using machine learning models, three-, six-, and nine-factor combinations (V3, V6, and V9) were examined based on the data obtained from global cropland eddy flux sites and Moderate Resolution Imaging Spectroradiometer (MODIS) remote sensing data. Four machine learning models, random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), and backpropagation neural network (BP), were used for this purpose. The input factors included daily mean air temperature (T_a), net radiation (R_n), soil heat flux (G), evaporative fraction (EF), leaf area index (LAI), photosynthetic photon flux density (PPFD), vapor pressure deficit (VPD), wind speed (U), and atmospheric pressure (P). The four machine learning models exhibited significant simulation accuracy across various climate zones, reflected by their global performance indicator (GPI) values ranging from −3.504 to 0.670 for RF, −3.522 to 1.616 for SVM, −3.704 to 0.972 for XGB, and −3.654 to 1.831 for BP. The choice of suitable models and the different input factors varied across different climatic regions. Specifically, in the temperate–continental zone (TCCZ), subtropical–Mediterranean zone (SMCZ), and temperate zone (TCZ), the models of BP_C-V9, SVM_S-V6, and SVM_T-V6 demonstrated the highest simulation accuracy, with average RMSE values of 0.259, 0.373, and 0.333 mm d⁻¹, average MAE values of 0.177, 0.263, and 0.248 mm d⁻¹, average R² values of 0.949, 0.819, and 0.917, and average NSE values of 0.926, 0.778, and 0.899, respectively. In climate zones with a lower average LAI (TCCZ), there was a strong correlation between LAI and ET, making LAI more crucial for ET predictions. 
Conversely, in climate zones with a higher average LAI (TCZ, SMCZ), the significance of the LAI for ET prediction was reduced. This study recognizes the impact of climate zones on ET simulations and highlights the necessity for region-specific considerations when selecting machine learning models and input factor combinations.

Keywords:

cropland evapotranspiration; machine learning; input factor combinations; climate zone; eddy flux; remote sensing

1. Introduction

Evapotranspiration (ET) plays a critical role in the terrestrial water cycle, accounting for approximately 60% of global precipitation consumption. ET encompasses the process of water vapor transitioning from the Earth’s surface to the atmosphere, which includes evaporation from soil or water bodies, transpiration from vegetation, and evaporation of rainfall intercepted by vegetated surfaces [1]. Crop transpiration is intricately related to physiological activities such as crop growth and the formation of photosynthetic products. Simultaneously, evaporation assists in dissipating the heat generated by the increase in near-surface temperature caused by radiation, thereby maintaining an optimal growth environment within the crop system [2]. Generally, more than 90% of the agricultural water is consumed through ET globally [3]. Accurate estimation of ET is beneficial for real-time monitoring of crop water use status, offering a basis for determining irrigation schedules, enhancing water use efficiency, and even predicting yields within agricultural fields [4,5].

Numerous methodologies have emerged to estimate terrestrial ET across various spatial scales, including hydrological modeling [6], empirical approaches [7], remote sensing inversion [8], and data-driven models. The hydrological method, which relies on the water balance principle for basin or sub-basin ET calculations, encounters challenges owing to uncertainties in the input and output data, model structure, initial conditions, and parameter settings, impacting simulation precision [9]. Empirical, semi-empirical, and physical–mathematical formulations based on meteorological data offer alternatives, with selection contingent on data availability, resulting in varied simulation accuracies across geographical locations [10]. Remote sensing for estimating ET presents distinct advantages in terms of accuracy and spatial resolution [11]; however, its limitation lies in the inability to provide continuous temporal values, which might not meet the temporal demands for irrigation and water resource management. Recently, data-driven models have been widely used for estimating ET owing to their remarkable capability of identifying intricate relationships. Machine learning (ML) techniques, which are characterized by their ability to handle complex relationships without prior knowledge or assumptions, have proven to be highly effective. Among them, the random forest (RF) algorithm has gained significant popularity in agricultural applications, such as land cover classification [12], water resources management [13], and crop yield prediction [14]. Its extensive applicability can be attributed to its exceptional accuracy in both classification and regression tasks, with minimal parameter dependencies, efficient processing capabilities, and the ability to handle overfitting problems [15]. In contrast, the support vector machine (SVM) model possesses a globally optimal solution and exhibits remarkable training efficiency. 
These qualities endow the SVM with enhanced robustness, efficiency, and reliability [16]. The SVM focuses on establishing functional relationships between ET and explanatory variables without explicitly considering the underlying biophysical mechanisms [17], making it particularly suitable for short-term ET prediction. For example, Liu et al. [18] explained 71–85% of global ET changes using only five indicators (average daily temperature, relative humidity, wind speed, solar radiation, and NDVI) as input variables for the SVM. Extreme gradient boosting (XGB) is an emerging machine learning algorithm that offers versatility and scalability for modeling small- to medium-sized datasets, making it a popular choice for crop yield predictions because of its flexibility and adaptability [19]. The backpropagation neural network (BP) is typically trained and tested using input variables such as temperature, sunshine hours, and wind speed [20]. Its applicability was demonstrated in a study by Kumar et al. [21], in which the BP model outperformed the traditional method in accurately predicting crop ET. Given the widespread application and promising prospects of these four machine learning algorithms for predicting cropland ET, a comprehensive comparative analysis is essential to assess their strengths, weaknesses, and universality across different input factor combinations and climatic conditions.

Obtaining parameters for the underlying surface of general croplands is challenging, and the availability of reliable meteorological data is often limited. Consequently, researchers have consistently aimed to increase the accuracy of ET predictions using a reduced number of variables. Previous studies have revealed that the dominant factors driving ET vary across different climatic regions, leading to differences in the ET simulation accuracy under various combinations of input factors. Pagano et al. [22] compared the performance of multi-layer perceptron (MLP) and random forest (RF) in predicting daily ET in a citrus orchard typical of the Mediterranean ecosystem, highlighting the substantial influence of soil water content (SWC) and solar radiation (Rs) on ET prediction. Even with the number of input features reduced to just four and a judicious selection of feature combinations, the machine learning models still achieved high accuracy in predicting ET. Chen et al. [23] utilized the fuzzy rough set algorithm (BSFL-FRSA) to discern both individual and multifactorial determinants of ET in evergreen needleleaf forests across three distinct climate zones in North America: the Mediterranean, warm summer continental, and subarctic regions. Their study revealed the predominant factors driving ET and the most crucial combinations of multiple factors. Agricultural ecosystems are predominantly found in the boreal, temperate, subtropical–Mediterranean, and temperate–continental climatic regions [24]. Hence, investigating the primary factors contributing to ET prediction within these climate zones and selecting optimal combinations of input factors can offer valuable insights and facilitate accurate ET assessments.

The objectives of this study were to (1) identify the important input factors driving daily crop ET in different climatic regions; (2) explore the applicability of four machine learning models, RF, SVM, XGB, and BP, in predicting daily crop ET; and (3) evaluate the accuracy of these models using specific combinations of three, six, and nine input factors and recommend an optimal model for each climatic region. This study provides a convenient method to accurately simulate ET in farmlands across diverse climatic zones.

2. Materials and Methods

2.1. Description of the Flux Sites

In this study, 15 eddy covariance (EC) cropland flux towers located in three different climate zones were carefully selected. These sites included representative stations from the temperate–continental climate zone (TCCZ), featuring US-ARM, US-CRT, US-Ne1, US-Ne2, and US-Ne3. The subtropical–Mediterranean climate zone (SMCZ) included representative stations IT-BCi, IT-CA2, US-TW2, US-TW3, and US-TW, whereas the temperate climate zone (TCZ) comprised representative stations BE-Lon, CH-Oe2, DE-Geb, DE-Kli, and FR-Gri. Detailed information for each site is presented in Table 1.

2.2. Flux and Auxiliary Data

This study was based on the analysis of daily EC flux and meteorological information extracted from the FLUXNET Tier 2 dataset (http://fluxnet.fluxdata.org, accessed on 10 November 2023), which includes variables such as R_n (W m⁻²), T_a (°C), soil temperature (T_s, °C), VPD (hPa), sensible heat flux (H, W m⁻²), and latent heat flux (LE, W m⁻²). According to Allen et al. [25], daily ET was derived from LE using the latent heat of vaporization as a function of T_a:

ET = \frac{LE}{2.501 - (2.361 \times 10^{-3}) \times T_{a}}

The soil water content (SWC) was measured at various depths at diverse sites, potentially failing to adequately represent the wetness or dryness of the soil, as soil properties vary across different sites. Here, we utilized the evaporative fraction (EF) to represent the degree of ground dryness and wetness [26,27] because an increase in energy allocation to evaporating water implies a greater potential water supply from the soil. The evaporative fraction (EF) is calculated as follows:

EF = \frac{LE}{LE + H}

(1)

where LE is the latent heat flux and H is the sensible heat flux.
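The two conversions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes that LE and H are daily means in W m⁻², that the denominator of the ET formula is the latent heat of vaporization in MJ kg⁻¹, and that LE is therefore converted to MJ m⁻² d⁻¹ before dividing. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def latent_heat_of_vaporization(ta_c):
    """Latent heat of vaporization (MJ kg^-1) as a function of air temperature (deg C)."""
    return 2.501 - 2.361e-3 * ta_c

def daily_et_mm(le_w_m2, ta_c):
    """Daily ET (mm d^-1) from the daily mean latent heat flux (W m^-2).

    Assumption: LE is a daily mean, so it is first converted to a daily
    energy total (MJ m^-2 d^-1); 1 kg of water per m^2 equals 1 mm.
    """
    le_mj_m2_day = le_w_m2 * SECONDS_PER_DAY / 1e6
    return le_mj_m2_day / latent_heat_of_vaporization(ta_c)

def evaporative_fraction(le_w_m2, h_w_m2):
    """EF = LE / (LE + H); returns None when LE + H is effectively zero."""
    total = le_w_m2 + h_w_m2
    if abs(total) < 1e-9:
        return None
    return le_w_m2 / total
```

Returning None for a vanishing LE + H is a design choice of this sketch; the source does not say how such days were handled.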

The primary source of data for capturing crop phenology information is the leaf area index (LAI). In our study, LAI data were acquired from the MODIS remote sensing product MOD15A2H (https://lpdaac.usgs.gov/products/mod15a2hv006/, accessed on 15 October 2023) with an 8-day interval and 500 m spatial resolution. The original LAI data were preprocessed using the TIMESAT software (version 3.3) to attenuate peak values and eliminate transient, unrealistic fluctuations due to factors such as cloud interference or the presence of snow and ice on the ground [28]. Any gaps in the LAI values were filled using linear interpolation based on available nearby data points over time. Subsequently, cubic spline interpolation was applied to the 8-day LAI data to generate daily data that matched the requirements of our modeling inputs. This approach minimized data redundancy and ensured a consistent and high-quality dataset for our research. The statistical parameters of the environmental variables from the flux tower and LAI data across the different climatic regions are presented in Table 2. The mean T_a varied from 11.43 °C in TCCZ to 16.19 °C in SMCZ; the mean R_n ranged from 86.37 W m⁻² in TCZ to 113.87 W m⁻² in SMCZ; the mean ET ranged from 1.62 mm d⁻¹ in TCZ to 2.32 mm d⁻¹ in SMCZ; and the mean LAI ranged from 0.54 m² m⁻² in TCCZ to 1.91 m² m⁻² in TCZ. Additional statistical characteristics of each variable are provided in Table 2.
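The LAI gap filling and daily interpolation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' TIMESAT-based workflow: the smoothing step is omitted, missing composites are marked with None, days are integer day numbers, and the spline uses natural boundary conditions (the source does not state which boundary conditions were used). All function names are hypothetical.

```python
def fill_gaps_linear(days, values):
    """Fill None entries by linear interpolation between the nearest valid neighbours in time."""
    known = [(d, v) for d, v in zip(days, values) if v is not None]
    filled = []
    for d, v in zip(days, values):
        if v is not None:
            filled.append(v)
            continue
        left = [(kd, kv) for kd, kv in known if kd < d]
        right = [(kd, kv) for kd, kv in known if kd > d]
        if left and right:
            (d0, v0), (d1, v1) = left[-1], right[0]
            filled.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
        else:  # gap at the edge of the record: hold the nearest valid value
            filled.append((left[-1] if left else right[0])[1])
    return filled

def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; m[0] = m[n] = 0
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        for k in range(n - 2, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i + 1]] that contains x.
        i = 0
        while i < n - 1 and x > xs[i + 1]:
            i += 1
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

def to_daily(days, values):
    """Gap-fill 8-day composites, then interpolate them to a daily series."""
    filled = fill_gaps_linear(days, values)
    spline = natural_cubic_spline(days, filled)
    daily_days = list(range(days[0], days[-1] + 1))
    return daily_days, [spline(d) for d in daily_days]
```

A cubic spline can overshoot between knots, so a production workflow would likely clip the daily values to a physically plausible range; that step is left out here.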

2.3. Machine Learning Models

2.3.1. Random Forest (RF)

A flowchart of the implementation of the applied machine learning models is shown in Figure 1. Figure 2 shows a flowchart of the four machine learning algorithms. RF is a supervised ensemble learning algorithm that was initially proposed by [29]. Its primary objective is to generate accurate predictions without overfitting the data. The RF operates as a combination of tree predictors, with each tree depending on the values of the random vectors sampled independently and from the same distribution for all trees within the forest [13]. Previous studies have shown that RF outperforms conventional approaches in estimating ET₀, achieving a significant reduction in the obtained error by approximately half [30]. After training, predictions for an unseen sample x can be made by averaging the predictions from all individual regression trees on x, as follows:

\hat{f} = \frac{1}{B} \sum_{b = 1}^{B} f_{b}(x)

(2)

where B is the number of trees, f_{b} is the function obtained from training the b-th tree, and \hat{f} is the final prediction value.
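The averaging in Equation (2) can be illustrated with a deliberately small bagging sketch. It is not the RF configuration used in the study: each "tree" is a one-split regression stump on a single input, the random feature subsampling of a full random forest is omitted, and the names `fit_stump`, `fit_forest`, and `forest_predict` are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on 1-D inputs by minimising squared error."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None, None, None)  # (sse, threshold, left mean, right mean)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no valid threshold between identical x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2.0, lm, rm)
    _, thr, lm, rm = best
    if thr is None:  # all x identical: fall back to the sample mean
        mean_y = sum(ys) / len(ys)
        return lambda x: mean_y
    return lambda x: lm if x < thr else rm

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train B stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def forest_predict(trees, x):
    """Equation (2): the forest prediction is the mean of the B tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)
```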

2.3.2. Support Vector Machine (SVM)

The SVM is recognized as a classical data-driven technique known for its robust ability to handle complex non-linear relationships between input and output variables [31]. Owing to its strong capacity to solve intricate nonlinear problems, the SVM has been widely applied to simulate both ET₀ [32,33] and ET [34,35]. Additionally, research suggests that using the radial basis function (RBF) to transform the feature space yields highly accurate estimation results [36]. Consequently, in this study, an SVM model based on the RBF function was used to predict the ET. The approximated function is expressed as follows:

y = w^{T} x + b, \quad x, w \in R^{M}

(3)

where M is the dimension of x, w is the weight vector, and b is a bias term. To determine the optimum w and b, the target of the optimization problem can be expressed as follows:

\min_{w, b} \frac{1}{2} \|w\|^{2} = \min_{w, b} \frac{1}{2} w^{T} w

(4)

where w denotes the normal vector of the hyperplane.

2.3.3. Extreme Gradient Boosting (XGB)

The XGB model is an enhanced version of the gradient boosting machines (GBMs) proposed by [37]. Originating from the concept of “boosting”, the XGB model combines predictions from a series of “weak” learners to create a “strong” learner using an additive training process [38]. Recent studies have indicated that the XGB model is a promising alternative method for estimating the daily ET₀ [39]. However, the specific performance of the direct application of XGB for simulating ET remains unclear. The general function for prediction at step t is as follows:

f_{i}^{(t)} = \sum_{k = 1}^{t} f_{k}(x_{i}) = f_{i}^{(t - 1)} + f_{t}(x_{i})

(5)

where x_{i} is the input variable, and f_{t}(x_{i}) and f_{i}^{(t)} are the learner and the prediction at step t, respectively.
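The additive update in Equation (5) can be illustrated with plain gradient boosting on squared error. This sketch is a simplification and not XGB itself: it omits the second-order gradients and regularization terms that XGB adds, uses one-split stumps on a single input as weak learners, and introduces a shrinkage factor `learning_rate` that does not appear in Equation (5). All names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: best single-threshold split of 1-D inputs under squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for thr in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x < thr else rm

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Build the additive model f^(t) = f^(t-1) + learning_rate * f_t."""
    learners = []
    preds = [0.0] * len(xs)  # f^(0) = 0
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        f_t = fit_stump(xs, residuals)
        learners.append(f_t)
        preds = [p + learning_rate * f_t(x) for p, x in zip(preds, xs)]
    return lambda x: sum(learning_rate * f(x) for f in learners)
```

Each round fits the new learner to what the current model still gets wrong, which is why the training error cannot increase from one round to the next in this setting.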

2.3.4. Backpropagation Neural Network (BP)

An artificial neural network (ANN) is a well-established supervised learning method known for its exceptional capability to extract nonlinear features from gathered data, making it a widely used modeling tool [40]. As a result, ANNs have found extensive applications in estimating ET₀ [41,42]. Backpropagation, a gradient descent method, is commonly used among various algorithms proposed for training the ANN method. Backpropagation involves calculating the gradient of the error with respect to the weights for a given input by propagating the error backward from the output layer to the hidden layer and then to the input layer [43]. The backpropagation algorithm was adopted in this study, and a flowchart of the backpropagation algorithm is illustrated in Figure 2. During the error backpropagation process, the weights and biases are modified at each iteration by minimizing an error metric that quantifies the disparity between the produced output and the desired output, which can be expressed as follows:

W_{n + 1} = W_{n} + \Delta W

(6)

where W_{n} and W_{n + 1} represent the weight matrices at iterations n and n + 1, respectively, of the iterative training process, and \Delta W denotes the weight adjustment matrix, which controls the convergence rate and the computational complexity.

The error minimization process was repeated until a satisfactory convergence criterion was met:

E = \sum_{i \in P} (y_{i} - t_{i})^{2}

(7)

where the sum runs over the training samples P, y_{i} is the final output of the ANN model, and t_{i} is the measured output.
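Equations (6) and (7) can be made concrete with a very small network trained by backpropagation. The sketch below is illustrative only and does not reproduce the BP configuration of the study: it assumes one input, one hidden layer of tanh units, a linear output, full-batch gradient descent, and a weight adjustment equal to the negative learning rate times the mean gradient. The function name and all hyperparameter values are hypothetical.

```python
import math
import random

def train_bp(xs, ts, n_hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a 1-input, 1-hidden-layer network; return (predict, error history)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output weights
    b2 = 0.0
    history = []
    n = len(xs)
    for _ in range(epochs):
        g_w1, g_b1 = [0.0] * n_hidden, [0.0] * n_hidden
        g_w2, g_b2 = [0.0] * n_hidden, 0.0
        error = 0.0
        for x, t in zip(xs, ts):
            # Forward pass.
            hid = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            y = sum(w2[j] * hid[j] for j in range(n_hidden)) + b2
            error += (y - t) ** 2          # Equation (7)
            # Backward pass: propagate the output error to the hidden layer.
            delta_out = 2.0 * (y - t)
            for j in range(n_hidden):
                delta_hid = delta_out * w2[j] * (1.0 - hid[j] ** 2)
                g_w2[j] += delta_out * hid[j]
                g_w1[j] += delta_hid * x
                g_b1[j] += delta_hid
            g_b2 += delta_out
        history.append(error)
        # Equation (6): W_{n+1} = W_n + dW, with dW = -lr * mean gradient.
        for j in range(n_hidden):
            w1[j] -= lr * g_w1[j] / n
            b1[j] -= lr * g_b1[j] / n
            w2[j] -= lr * g_w2[j] / n
        b2 -= lr * g_b2 / n

    def predict(x):
        hid = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return sum(w2[j] * hid[j] for j in range(n_hidden)) + b2

    return predict, history
```

The returned `history` holds the value of E at the start of each epoch, so a convergence criterion could be expressed as a threshold on its last entries.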

2.3.5. Model Development

This study developed four machine learning models (RF, SVM, XGB, and BP) to simulate and predict daily cropland ET using daily EC flux and meteorological information. Different variable combinations were used as inputs to explore the predictive potential of the models and to compare the variable combinations. The dataset was split into training and testing sets: at each flux tower site, 80% of the time series data was used for model training and the remaining 20% for testing. This allowed the predictive performance of the machine learning models to be investigated at specific sites and, further, within specific climatic regions. All machine learning model development and statistical computations were conducted using R version 4.0.5 [44]. The four machine learning models utilized in this study, along with their corresponding hyperparameters, can be found in the Supplementary Materials.
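The per-site 80/20 split can be sketched as below. The source does not state whether the split was chronological or random; this sketch assumes a chronological split in which the first 80% of each site's records (already sorted by date) form the training set. The function names and the dictionary layout are hypothetical.

```python
def split_site_series(records, train_fraction=0.8):
    """Split one site's date-ordered records into (train, test) without shuffling."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

def split_by_site(data_by_site, train_fraction=0.8):
    """Apply the same split independently to every site.

    data_by_site maps a site identifier to its list of daily records.
    """
    return {site: split_site_series(rows, train_fraction)
            for site, rows in data_by_site.items()}
```

Splitting within each site, rather than across the pooled data, keeps every site represented in both the training and the testing sets.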

2.4. Evaluating Indicators

The root mean square error (RMSE) and mean absolute error (MAE) were used to check the accuracy of the models, whereas the determination coefficient (R²) and the Nash–Sutcliffe efficiency coefficient (NSE) are measurements of generalizability when comparing the performances of the four machine learning models under different input combinations [45,46,47].

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (x_{i} - y_{i})^{2}}{n}}

(8)

MAE = \frac{\sum_{i = 1}^{n} |x_{i} - y_{i}|}{n}

(9)

R^{2} = \frac{\left[ \sum_{i = 1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y}) \right]^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(10)

NSE = 1 - \frac{\sum_{i = 1}^{n} (x_{i} - y_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(11)

where x_{i} is the predicted value of ET, y_{i} is the measured value of ET, \bar{x} and \bar{y} are the corresponding average values of x_{i} and y_{i}, the subscript i indexes the samples, and n is the number of samples in the dataset. Larger R² and NSE values, along with smaller RMSE and MAE values, signify a better fit in the context of modeling. In addition, a global performance indicator (GPI), which normalizes the four different metrics as one, was used to comprehensively evaluate the model performance [48,49].

GPI_{i} = \sum_{j = 1}^{4} a_{j} (g_{j} - y_{i j})

(12)

where a_{j} is a coefficient (1 for RMSE and MAE and −1 for R² and NSE), g_{j} represents the median of the scaled values of statistical indicator j, and y_{i j} represents the scaled value of statistical indicator j for model i. A higher GPI value implies superior model performance.
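The four indicators and the GPI can be computed as in the following sketch. Two points are assumptions rather than statements from the source: R² is taken as the squared Pearson correlation between predictions and measurements, and the "scaled values" in the GPI are obtained by min-max scaling each indicator across the compared models. All function names are hypothetical.

```python
import math
from statistics import median

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def r2(pred, obs):
    """Squared Pearson correlation between predictions and measurements."""
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov ** 2 / (var_p * var_o)

def nse(pred, obs):
    mo = sum(obs) / len(obs)
    return 1.0 - sum((p - o) ** 2 for p, o in zip(pred, obs)) / sum((o - mo) ** 2 for o in obs)

def gpi(metrics_by_model):
    """GPI per model from {model: {"RMSE": .., "MAE": .., "R2": .., "NSE": ..}}."""
    signs = {"RMSE": 1.0, "MAE": 1.0, "R2": -1.0, "NSE": -1.0}  # a_j
    names = list(metrics_by_model)
    scores = {name: 0.0 for name in names}
    for key, a in signs.items():
        vals = [metrics_by_model[name][key] for name in names]
        lo, hi = min(vals), max(vals)
        # Assumed scaling: min-max to [0, 1] across the compared models.
        scaled = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]
        g = median(scaled)
        for name, y in zip(names, scaled):
            scores[name] += a * (g - y)
    return scores
```

With these sign conventions, a model with below-median errors and above-median R² and NSE receives a positive GPI, which matches the statement that higher GPI values indicate better performance.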

3. Results

3.1. The Overall Performance of Four Machine Learning Models in Simulating ET

In certain regions, obtaining a complete set of input data for ET simulations is a challenge. To reduce the number of input variables while preserving the accuracy of the ET simulation, we selected the variables that were most strongly correlated with ET for the three-factor and six-factor input combinations. Correlation coefficients were first computed separately using the input factor data from the stations in each climate zone; the data from all stations were then merged to produce the overall correlation coefficients, as shown in Table 3. In the merged dataset, R_n, PPFD, and EF were strongly correlated with ET and thus formed the three-factor input combination. T_a, LAI, and VPD were also strongly correlated with ET and extended this input combination to six factors. Using all available input data gave the nine-factor input combination, as detailed in Table 4.
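The correlation-based selection of input factors can be sketched as a simple ranking. The source does not specify which correlation coefficient was used or whether the sign was considered; this sketch assumes the Pearson coefficient and ranks factors by its absolute value. The function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_inputs(columns, target):
    """Return the factor names sorted by |r| with the target, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

def build_combinations(columns, target):
    """Take the top three and top six ranked factors, plus all factors."""
    ranked = rank_inputs(columns, target)
    return {"V3": ranked[:3], "V6": ranked[:6], "V9": ranked}
```

Applying `build_combinations` to the merged data or to the data of a single climate zone would give the global or the zone-specific combinations, respectively.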

The simulation performances of the four models at representative sites across the climate zones are illustrated in Figure 3. The sites were chosen based on the availability of the highest volume of valid data within the respective climate zone. The agreement between the simulated and observed ET demonstrated the capacity of the models to simulate ET with the three input factor combinations. Figure 4 presents the simulation results of ET using the RF, SVM, XGB, and BP models with the three input factor combinations (V3, V6, and V9). For the RF-V3, SVM-V3, XGB-V3, and BP-V3 models, the average RMSE values were 0.738, 0.774, 0.780, and 0.892 mm d⁻¹, respectively. The average MAE values were 0.547, 0.573, 0.576, and 0.627 mm d⁻¹, respectively. Correspondingly, the average R² values were 0.618, 0.622, 0.597, and 0.570, whereas the average NSE values were 0.378, 0.256, 0.272, and −0.161, respectively. According to the results presented in Table 5, RF-V3 exhibited a higher prediction accuracy for ET than the other three-factor models, with a GPI value of −2.376. With the six- and nine-factor input combinations, the SVM performed best: the GPI values for SVM-V6 and SVM-V9 were 0.548 and 0.561, respectively, demonstrating their superior accuracy in simulating ET among the six- and nine-factor models.

The overall simulation results for ET across various sites under three different input combinations are presented in Table 6. Considering all input factor combinations, the average RMSE values for the RF, SVM, XGB, and BP models were 0.527, 0.488, 0.545, and 0.551 mm d⁻¹, respectively. The average MAE values were 0.388, 0.356, 0.396, and 0.390 mm d⁻¹, respectively. Concurrently, the average R² values were 0.771, 0.804, 0.749, and 0.768, whereas the average NSE values were 0.633, 0.659, 0.588, and 0.491, respectively. Consequently, the SVM model outperformed the other models in terms of the overall ET simulation across all climate zones.

3.2. Performance of Four Machine Learning Models in Simulating ET in Different Climatic Regions

Table 3 presents the results of the correlation analysis of ET using data collected from stations in three distinct climate zones. Using the same variable grouping described in Section 3.1, three input factor combinations for these climate zones were derived, and four machine learning models were used to simulate ET in the TCCZ, SMCZ, and TCZ climate zones, as shown in Table 7. Across specific input factor combinations in each climate zone, the SVM model consistently demonstrated superior accuracy in the ET simulation, with average RMSE values of 0.312, 0.387, and 0.460 mm d⁻¹ for TCCZ, SMCZ, and TCZ, respectively. The average MAE values were 0.218, 0.275, and 0.332 mm d⁻¹, respectively. The average R² values were 0.926, 0.824, and 0.833, and the average NSE values were 0.905, 0.774, and 0.783, respectively (Table 8).

Based on the statistical parameters listed in Table 8, the optimal models were identified as SVM_C-V3, BP_C-V6, and BP_C-V9 for three-, six-, and nine-factor combinations in the TCCZ (Table 9), respectively. In contrast, in the SMCZ (TCZ) climate zone, the optimal models were determined to be SVM_S-V3 (SVM_T-V3), SVM_S-V6 (SVM_T-V6), and SVM_S-V9 (SVM_T-V9) for three-, six-, and nine-factor combinations (Table 9), respectively. In summary, the BP model demonstrated superior ET simulation with six- and nine-factor input combinations in the TCCZ climate zone. In all other cases, the SVM model consistently delivered higher accuracy (Figure 5).

3.3. Runtime Analysis of Four Machine Learning Models in ET Simulation

In addition to accuracy, the computational runtime of machine learning models is a crucial consideration in ET simulations. As shown in Figure 6, using data merged from all stations, the runtimes of the RF, SVM, XGB, and BP models ranged from 6.77 to 12.83 s, 1.24 to 1.85 s, 0.37 to 0.46 s, and 4.33 to 7.07 s, respectively. Using data exclusively from stations within distinct climate zones, the runtimes of the RF, SVM, XGB, and BP models ranged from 7.45 to 12.96 s, 1.72 to 2.46 s, 1.60 to 1.87 s, and 3.92 to 6.70 s, respectively. Differentiating the input factor combinations by climate zone had a negligible impact on the runtime of the models. Among the four models, both the RF and XGB models showed an increasing trend in runtime as the number of input factors increased, and this trend was particularly pronounced in the RF model. Conversely, the runtimes of the SVM and BP models fluctuated as the number of input factors increased. Across all models and their respective input factor combinations, RF*-V9 had the longest runtime at 12.96 s, whereas XGB-V3 had the shortest at 0.37 s.

4. Discussion

4.1. Importance of Input Factors for Simulating Evapotranspiration

Based on a correlation analysis of the input variables with ET, R_n was identified as the most crucial input factor for simulating ET, followed by photosynthetic photon flux density (PPFD); both are closely related to energy absorption. The energy required for evaporation is derived from radiation, which establishes R_n as a pivotal driver of ET, especially in areas that are not moisture-limited [50]. In addition, numerous studies have shown that variations in light intensity can affect plant photosynthesis, leaf area morphology, and radiation absorption. Light also directly affects stomatal opening, and the stomata are an important channel for water vapor diffusion [51].

In addition to the energy terms, our study found that LAI played an important role in ET simulation. The correlation between LAI and ET was stronger in the TCCZ, where LAI ranked at the top (Table 4), than in the TCZ or SMCZ. The LAI influences the ground energy reception by affecting the sensible heat flux and radiation. This impact is particularly pronounced when the LAI is low and diminishes as the LAI surpasses a certain threshold. Previous studies have found that the relationships between environmental factors and ET are mediated by leaf area [52], and that the regulatory effect on ET differs significantly below and above an LAI threshold close to 1 (1.2–1.5 m² m⁻²). Once the canopy cover is full, further increases in LAI no longer produce an equivalent increase in canopy cover; therefore, the intercepted energy does not increase significantly, and beyond this threshold, changes in LAI have a diminishing effect on ET [53]. Accordingly, in the TCZ and SMCZ, the correlation between LAI and ET was relatively weaker. A comparison of the average LAI values for the different climatic zones (Table 2) showed that the LAI in the TCZ was greater than that in the SMCZ and TCCZ, which is consistent with this mechanism for the impact of LAI on ET. As an important component of the water cycle, ET involves numerous complex energy exchanges. In particular, evaporation is the movement of water into the air and can readily lead to changes in the air temperature. Conversely, variations in the daily average temperature of agricultural fields can reflect the intensity and rate of energy exchange in the atmosphere. Thus, the daily mean air temperature (T_a) also demonstrates a strong correlation with ET [54].

Although EF was considered a crucial factor for quantifying surface water deficits and the water cycle, and the estimation of ET based on a specific daytime EF is considered a favorable method [27,55], EF did not demonstrate the same universally strong correlation across all climatic regions as R_n and PPFD in the present study. Further investigation of previous research has revealed that EF demonstrates higher sensitivity to land surface moisture conditions in arid regions, whereas it is less sensitive in relatively humid areas [56]. On one hand, this was probably due to the generally humid climate in agricultural areas; on the other hand, it may be influenced by irrigation practices in agricultural areas, which can affect the land surface moisture conditions at flux measurement sites. Therefore, when considering the number of required input factors, these relatively more important factors should be prioritized.

4.2. Optimal Input Factor Combinations and Machine Learning Models for Simulating Evapotranspiration

The four machine learning models used in this study demonstrated relatively good simulation accuracy. After obtaining the input factor combinations from the correlation analysis of ET using data from sites within each respective climate zone, the simulation accuracy of the three- and six-factor combinations of each model was significantly improved, indicating that it is meaningful to consider the differences and impacts of climate zones when simulating cropland daily ET. Given the high simulation accuracy across all the models, each model had suitable scenarios and characteristics. RF models often achieve better simulation accuracy than other models when trained with fewer input variables. However, they also had the longest execution times among the four models [57]. When conducting ET simulations with fewer input variables and when the importance of the time cost is low or not considered, the RF model can be prioritized. The SVM consistently demonstrated better modeling performance for ET than the other three models across most combinations of input variables and various climate zones. In the comparison without distinguishing climate zones and input factor combinations, SVM-V6 and SVM-V9 were the optimal six-factor and nine-factor input combination models. In the comparison where input factor combinations were divided by climate zones, overall, the combined performance of SVM models with three different input factor combinations (SVM*-V3 and SVM*-V6) was the best. Additionally, SVM models exhibited faster single-model runtimes (significantly lower than those of the RF and BP models). The XGB model has a notable advantage in terms of computational time and efficiency over other models [37], which makes it more suitable for addressing real-time prediction problems, even though the simulation accuracy of the XGB model for ET is not particularly outstanding among these machine learning models. 
By contrast, the simulation accuracy of the BP model was unstable across climatic regions, and its generalization ability was relatively average [58]; with very limited or extremely large datasets in particular, the BP model performed less well than the other models. In some cases, the backpropagation algorithm may become stuck at a local optimum [59]. The BP model also had a relatively long single-model runtime, only slightly shorter than that of the RF model. When the dataset size is moderate, however, the BP model may achieve high accuracy. Attaining excellent simulation accuracy with the BP model therefore requires a substantial amount of data to pre-determine the optimal local input variables for each site, which may take significant time and effort. Considering both simulation accuracy and runtime under various conditions, this study suggests that the SVM model is the preferred choice for simulating daily cropland ET.

Comparing the three input factor combinations revealed differences in their adaptability across climatic zones. With the three-factor combination (V3), three of the four models exhibited better performance in the SMCZ. Similarly, with the nine-factor combination (V9), the models showed better performance in the TCCZ (Table 9). The four machine learning models generally performed well with both the V6 and V9 input factor combinations, indicating that predictive accuracy improved as input factors were added. However, V9 was not significantly more accurate than V6, and at some sites it performed worse. Because V9 requires 50% more input factors than V6 (nine versus six) while achieving similar simulation accuracy, this study suggests that V6 is the most economical input factor combination for situations with limited meteorological data.

4.3. Uncertainties

Estimating the uncertainty of EC data is challenging, and missing data can introduce discontinuities in the time series, potentially making it more difficult for machine learning models to predict ET accurately. Given the complexity of the relationships between variables, future research may explore deep learning models for agricultural evapotranspiration prediction. Interpolating MODIS LAI from 8-day periods to daily values using cubic spline interpolation may also introduce uncertainty into the models. Nevertheless, daily LAI interpolated from 8-day data may help improve the simulation accuracy of the machine learning models used in this study: it matches the time scale of the input meteorological data, and it better quantifies the impact of vegetation change on ET, especially in periods when vegetation changes dramatically (generally the rapid development stage of crops). Conducting correlation analysis directly on the original time series dataset means that the effects of the input variables on ET are not independent (ideally, the effect of one variable would be analyzed with the other variables held constant, which is not achievable in reality). This can lead to larger underestimation of the influence of variables ranked lower in the correlation analysis (such as U and P), and the uncertainty in flux tower measurements may exacerbate this underestimation. However, the impact on our selection of the three-factor and six-factor input variables (V3 and V6) is minimal. Regarding the representativeness of the EC sites selected in this study, not all EC flux sites in the different climate zones could be included because of limited data accessibility and openness. Owing to the limited number of sites, not all relevant climatic zones were investigated, although most flux tower stations located in agricultural areas were included. Future work could extend field ET predictions to flux tower sites in climatic zones beyond those covered in this study (e.g., the Boreal climatic zone).
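
The 8-day-to-daily LAI interpolation discussed above can be sketched as follows. This is a minimal illustration, not the authors' code: the text names cubic spline interpolation but not the boundary conditions, so a natural spline (zero second derivative at both ends) is assumed here, and the day-of-year values and LAI values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    Natural boundary conditions are an assumption; the source does not
    state which spline variant was used.
    """
    n = len(xs) - 1                       # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 / h[i] * (ys[i + 1] - ys[i])
                    - 3.0 / h[i - 1] * (ys[i] - ys[i - 1]))
    # Forward sweep of the tridiagonal system for the second-order terms.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution for the polynomial coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the interval containing x, clamped to the valid range.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical 8-day LAI composites (day of year, LAI in m2/m2).
doy_8day = [1, 9, 17, 25, 33]
lai_8day = [0.4, 0.9, 2.1, 3.0, 3.2]

spline = natural_cubic_spline(doy_8day, lai_8day)
daily_lai = [spline(day) for day in range(1, 34)]
```

A spline can overshoot between knots when LAI changes quickly, which is one way the interpolation step may add uncertainty; clipping the daily values to a physically plausible range is a possible safeguard.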

5. Conclusions

This study used meteorological and remote sensing data from diverse agricultural sites in distinct climatic regions to simulate ET using four machine learning models with three different input combinations. The key findings are summarized as follows: (1) R_n, PPFD, LAI, EF, and T_a emerged as pivotal factors influencing daily ET in agricultural areas, and they all exhibited a relatively strong correlation with ET across various climate zones; (2) all four machine learning models yielded satisfactory simulation performance, with the SVM model demonstrating the best simulation performance, particularly when considering the influence of climate zones on ET simulation; (3) the ET prediction accuracy achieved with the three-factor combination (V3) improved when more input factors were included; however, considering both predictive accuracy and input factor efficiency, the V6 input factor combination is recommended as the preferred choice; (4) in the climate zone with a lower average LAI (TCCZ), LAI was more crucial for ET predictions than in the climate zones with a higher average LAI (TCZ, SMCZ), highlighting the need for region-specific considerations when selecting machine learning models and input factor combinations.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs16050730/s1, Table S1: Site codes and corresponding full names of the 15 crop flux tower sites used in this study; Table S2: Hyperparameters of the machine learning models used in this study; Table S3: The average and standard deviation of the Pearson correlation coefficients between input factors and evapotranspiration (ET). At each site, Pearson correlation coefficients were independently calculated.

Author Contributions

Writing—original draft preparation and visualization, C.D.; conceptualization and methodology, S.J.; software and validation, C.C.; software and data curation, Q.G.; resources, Q.H.; formal analysis, investigation, and writing—review and editing, C.Z.; funding acquisition, S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (52279041, 52309055), Fundamental Research Funds for the Central Universities (YJ202259), and the Sichuan Science and Technology Program (2023YFN0024, 2023NZZJ0015).

Data Availability Statement

All data are available upon request.

Acknowledgments

We would like to thank the FLUXNET community, including AmeriFlux, AsiaFlux, and the European Fluxes Database, and all the scientists and technicians who maintain the flux sites and provide crop information. We are also grateful to the Distributed Active Archive Center of Oak Ridge National Laboratory and the Earth Observing System Data for making MODIS data available.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

T_a	daily mean air temperature (°C)
R_n	net radiation (MJ/m² d)
G	soil heat flux (MJ/m² d)
EF	evaporative fraction (W m⁻²/W m⁻²)
LAI	leaf area index (m²/m²)
PPFD	photosynthetic photon flux density (µmol/m² s)
VPD	vapor pressure deficit (kPa)
U	wind speed (m/s)
P	atmospheric pressure (kPa)
V3, V6, V9	three, six, and nine input factor combinations
RMSE	root mean square error
R²	determination coefficient
MAE	mean absolute error
GPI	global performance indicator
NSE	Nash–Sutcliffe efficiency coefficient
RF	random forest
SVM	support vector machine
XGB	extreme gradient boosting
BP	backpropagation neural network
TCCZ	temperate–continental climate zone
SMCZ	subtropical–Mediterranean climate zone
TCZ	temperate climate zone

References

1. Yang, Y.; Roderick, M.L.; Guo, H.; Miralles, D.G.; Zhang, L.; Fatichi, S.; Luo, X.; Zhang, Y.; McVicar, T.R.; Tu, Z. Evapotranspiration on a greening Earth. Nat. Rev. Earth Environ. 2023, 4, 626–641.
2. Sivakumar, M.V.K.; Stefanski, R. Climate and Land Degradation—An Overview. In Climate and Land Degradation; Sivakumar, M.V.K., Ndiang’ui, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 105–135.
3. McCabe, M.F.; Wood, E.F. Scale influences on the remote estimation of evapotranspiration using multiple satellite sensors. Remote Sens. Environ. 2006, 105, 271–285.
4. Foster, T.; Brozović, N. Simulating crop-water production functions using crop growth models to support water policy assessments. Ecol. Econ. 2018, 152, 9–21.
5. Igbadun, H.E.; Tarimo, A.K.; Salim, B.A.; Mahoo, H.F. Evaluation of selected crop water production functions for an irrigated maize crop. Agric. Water Manag. 2007, 94, 1–10.
6. Dong, Q.; Zhan, C.; Wang, H.; Wang, F.; Zhu, M. A review on evapotranspiration data assimilation based on hydrological models. J. Geog. Sci. 2016, 26, 230–242.
7. Srivastava, R.; Panda, R.; Halder, D. Effective crop evapotranspiration measurement using time-domain reflectometry technique in a sub-humid region. Theor. Appl. Climatol. 2017, 129, 1211–1225.
8. Hirschi, M.; Michel, D.; Lehner, I.; Seneviratne, S.I. A site-level comparison of lysimeter and eddy covariance flux measurements of evapotranspiration. Hydrol. Earth Syst. Sci. 2017, 21, 1809–1825.
9. Renard, B.; Kavetski, D.; Kuczera, G.; Thyer, M.; Franks, S.W. Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors. Water Resour. Res. 2010, 46.
10. de Carvalho Alves, M.; de Carvalho, L.G.; Vianello, R.L.; Sediyama, G.C.; de Oliveira, M.S.; de Sá Junior, A. Geostatistical improvements of evapotranspiration spatial information using satellite land surface and weather stations data. Theor. Appl. Climatol. 2013, 113, 155–174.
11. Li, Z.-L.; Tang, R.; Wan, Z.; Bi, Y.; Zhou, C.; Tang, B.; Yan, G.; Zhang, X. A review of current methodologies for regional evapotranspiration estimation from remotely sensed data. Sensors 2009, 9, 3801–3853.
12. Duro, D.C.; Franklin, S.E.; Dubé, M.G. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Remote Sens. Environ. 2012, 118, 259–272.
13. Feng, Y.; Cui, N.; Gong, D.; Zhang, Q.; Zhao, L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 2017, 193, 163–173.
14. Fukuda, S.; Spreer, W.; Yasunaga, E.; Yuge, K.; Sardsud, V.; Müller, J. Random Forests modelling for the estimation of mango (Mangifera indica L. cv. Chok Anan) fruit yields under different irrigation regimes. Agric. Water Manag. 2013, 116, 142–150.
15. Prasad, N.; Patel, N.; Danodia, A. Crop yield prediction in cotton for regional level using random forest approach. Spatial Inf. Res. 2021, 29, 195–206.
16. Hipni, A.; El-shafie, A.; Najah, A.; Karim, O.A.; Hussain, A.; Mukhlisin, M. Daily forecasting of dam water levels: Comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS). Water Resour. Manag. 2013, 27, 3803–3823.
17. Dou, X.; Yang, Y. Modeling evapotranspiration response to climatic forcings using data-driven techniques in grassland ecosystems. Adv. Meteorol. 2018, 2018, 1824317.
18. Liu, M.; Tang, R.; Li, Z.-L.; Yao, Y.; Yan, G. Global land surface evapotranspiration estimation from meteorological and satellite data using the support vector machine and semiempirical algorithm. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 513–521.
19. Pathy, A.; Meher, S.; Balasubramanian, P. Predicting algal biochar yield using eXtreme Gradient Boosting (XGB) algorithm of machine learning methods. Algal Res. 2020, 50, 102006.
20. Liu, X.; Kang, S.; Li, F. Simulation of artificial neural network model for trunk sap flow of Pyrus pyrifolia and its comparison with multiple-linear regression. Agric. Water Manag. 2009, 96, 939–945.
21. Kumar, M.; Raghuwanshi, N.; Singh, R. Artificial neural networks approach in evapotranspiration modeling: A review. Irrig. Sci. 2011, 29, 11–25.
22. Pagano, A.; Amato, F.; Ippolito, M.; De Caro, D.; Croce, D.; Motisi, A.; Provenzano, G.; Tinnirello, I. Machine learning models to predict daily actual evapotranspiration of citrus orchards under regulated deficit irrigation. Ecol. Inf. 2023, 76, 102133.
23. Chen, Y.; Xue, Y.; Hu, Y. How multiple factors control evapotranspiration in North America evergreen needleleaf forests. Sci. Total Environ. 2018, 622, 1217–1224.
24. Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 2018, 5, 1–12.
25. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop evapotranspiration-Guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56. Fao Rome 1998, 300, D05109.
26. Sugita, M.; Brutsaert, W. Daily evaporation over a region from lower boundary layer profiles measured with radiosondes. Water Resour. Res. 1991, 27, 747–752.
27. Li, S.; Kang, S.; Li, F.; Zhang, L.; Zhang, B. Vineyard evaporative fraction based on eddy covariance in an arid desert region of Northwest China. Agric. Water Manag. 2008, 95, 937–948.
28. Jönsson, P.; Eklundh, L. TIMESAT—A program for analyzing time-series of satellite sensor data. Comput. Geosci. 2004, 30, 833–845.
29. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
30. da Silva Júnior, J.C.; Medeiros, V.; Garrozi, C.; Montenegro, A.; Gonçalves, G.E. Random forest techniques for spatial interpolation of evapotranspiration data from Brazilian’s Northeast. Comput. Electron. Agric. 2019, 166, 105017.
31. Dou, X.; Yang, Y. Evapotranspiration estimation using four different machine learning approaches in different terrestrial ecosystems. Comput. Electron. Agric. 2018, 148, 95–106.
32. Gocić, M.; Motamedi, S.; Shamshirband, S.; Petković, D.; Ch, S.; Hashim, R.; Arif, M. Soft computing approaches for forecasting reference evapotranspiration. Comput. Electron. Agric. 2015, 113, 164–173.
33. Kişi, O.; Cimen, M. Evapotranspiration modelling using support vector machines/Modélisation de l’évapotranspiration à l’aide de ‘support vector machines’. Hydrol. Sci. J. 2009, 54, 918–928.
34. Tang, D.; Feng, Y.; Gong, D.; Hao, W.; Cui, N. Evaluation of artificial intelligence models for actual crop evapotranspiration modeling in mulched and non-mulched maize croplands. Comput. Electron. Agric. 2018, 152, 375–384.
35. Yao, Y.; Liang, S.; Li, X.; Chen, J.; Liu, S.; Jia, K.; Zhang, X.; Xiao, Z.; Fisher, J.B.; Mu, Q. Improving global terrestrial evapotranspiration estimation using support vector machine by integrating three process-based algorithms. Agric. For. Meteorol. 2017, 242, 55–74.
36. Fan, J.; Yue, W.; Wu, L.; Zhang, F.; Cai, H.; Wang, X.; Lu, X.; Xiang, Y. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. 2018, 263, 225–241.
37. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; pp. 785–794.
38. Wu, L.; Peng, Y.; Fan, J.; Wang, Y. Machine learning models for the estimation of monthly mean daily reference evapotranspiration based on cross-station and synthetic data. Hydrol. Res. 2019, 50, 1730–1750.
39. Zhou, Z.; Zhao, L.; Lin, A.; Qin, W.; Lu, Y.; Li, J.; Zhong, Y.; He, L. Exploring the potential of deep factorization machine and various gradient boosting models in modeling daily reference evapotranspiration in China. Arabian J. Geosci. 2020, 13, 1–20.
40. Maier, H.R.; Dandy, G.C. Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications. Environ. Modell. Softw. 2000, 15, 101–124.
41. Han, X.; Wei, Z.; Zhang, B.; Li, Y.; Du, T.; Chen, H. Crop evapotranspiration prediction by considering dynamic change of crop coefficient and the precipitation effect in back-propagation neural network model. J. Hydrol. 2021, 596, 126104.
42. Traore, S.; Luo, Y.; Fipps, G. Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages. Agric. Water Manag. 2016, 163, 363–379.
43. Gill, E.J.; Singh, E.B.; Singh, E.S. Training back propagation neural networks with genetic algorithm for weather forecasting. In Proceedings of the IEEE 8th International Symposium on Intelligent Systems and Informatics, Subotica, Serbia, 10–11 September 2010; pp. 465–469.
44. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021. Available online: https://www.R-project.org/ (accessed on 15 October 2023).
45. Aouissi, J.; Benabdallah, S.; Chabaâne, Z.L.; Cudennec, C. Evaluation of potential evapotranspiration assessment methods for hydrological modelling with SWAT—Application in data-scarce rural Tunisia. Agric. Water Manag. 2016, 174, 39–51.
46. Fan, J.; Wu, L.; Zhang, F.; Cai, H.; Ma, X.; Bai, H. Evaluation and development of empirical models for estimating daily and monthly mean daily diffuse horizontal solar radiation for different climatic regions of China. Renew. Sustain. Energy Rev. 2019, 105, 168–186.
47. Wu, Z.; Cui, N.; Hu, X.; Gong, D.; Wang, Y.; Feng, Y.; Jiang, S.; Lv, M.; Han, L.; Xing, L. Optimization of extreme learning machine model with biological heuristic algorithms to estimate daily reference crop evapotranspiration in different climatic regions of China. J. Hydrol. 2021, 603, 127028.
48. Bellido-Jiménez, J.; Estévez, J.; García-Marín, A. Reference evapotranspiration projections in Southern Spain (until 2100) using temperature-based machine learning models. Comput. Electron. Agric. 2023, 214, 108327.
49. Chia, M.Y.; Huang, Y.F.; Koo, C.H. Improving reference evapotranspiration estimation using novel inter-model ensemble approaches. Comput. Electron. Agric. 2021, 187, 106227.
50. Guo, D.; Westra, S.; Maier, H.R. Sensitivity of potential evapotranspiration to changes in climate variables for different Australian climatic zones. Hydrol. Earth Syst. Sci. 2017, 21, 2107–2126.
51. Esmaili, M.; Aliniaeifard, S.; Mashal, M.; Ghorbanzadeh, P.; Mehdi, S.; Gavilan, M.U.; Carrillo, F.F.; Lastochkina, O.; Tao, L. CO2 enrichment and increasing light intensity till a threshold level, enhance growth and water use efficiency of lettuce plants in controlled environment. Notulae Botanicae Horti Agrobot. 2020, 48, 2244–2262.
52. Zhou, L.; Wang, Y.; Jia, Q.; Li, R.; Zhou, M.; Zhou, G. Evapotranspiration over a rainfed maize field in northeast China: How are relationships between the environment and terrestrial evapotranspiration mediated by leaf area? Agric. Water Manag. 2019, 221, 538–546.
53. Suyker, A.E.; Verma, S.B. Interannual water vapor and energy exchange in an irrigated maize-based agroecosystem. Agric. For. Meteorol. 2008, 148, 417–427.
54. Ilic, M.; Jovic, S.; Spalevic, P.; Vujicic, I. Water cycle estimation by neuro-fuzzy approach. Comput. Electron. Agric. 2017, 135, 1–3.
55. Yang, J.; Wang, Y. Estimating evapotranspiration fraction by modeling two-dimensional space of NDVI/albedo and day–night land surface temperature difference: A comparative study. Adv. Water Resour. 2011, 34, 512–518.
56. Liu, Q.; Wang, T.; Han, Q.; Sun, S.; Liu, C.-q.; Chen, X. Diagnosing environmental controls on actual evapotranspiration and evaporative fraction in a water-limited region from northwest China. J. Hydrol. 2019, 578, 124045.
57. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
58. Abyaneh, H.Z.; Nia, A.M.; Varkeshi, M.B.; Marofi, S.; Kisi, O. Performance evaluation of ANN and ANFIS models for estimating garlic crop evapotranspiration. J. Irrig. Drain. Eng. 2011, 137, 280–286.
59. Dai, X.; Shi, H.; Li, Y.; Ouyang, Z.; Huo, Z. Artificial neural network models for estimating regional reference evapotranspiration based on climate factors. Hydrol. Process. An Int. J. 2009, 23, 442–450.

Figure 1. Flowchart of machine learning models applied to evapotranspiration simulation.

Figure 2. Flowchart of the four machine learning algorithms.

Figure 3. Relationship between simulated evapotranspiration using four machine learning models with three input combinations and measurements at different represented sites located in the temperate–continental (TCCZ), subtropical–Mediterranean (SMCZ), and temperate (TCZ) climate zones.

Figure 4. Overall ET simulation accuracy across the 15 EC cropland sites. In the box plot, symbol x represents the mean, the circles represent the 25th and 75th percentile values, and the horizontal line in the center represents the median value.

Figure 5. Accuracy of ET simulation using three different combinations of input factors with the SVM model in three distinct climate zones.

Figure 6. Single-run time for training and simulation with all site data under various input factor combinations. The divisions of input factor combinations were conducted for correlation analysis on evapotranspiration utilizing two distinct datasets: all site data (represented by RF, SVM, XGB, and BP) and site data located in different climatic zones (represented by RF*, SVM*, XGB*, and BP*). For clarity, RF* is defined as the sum of RF_C, RF_S, and RF_T, with this naming convention uniformly applicable to the other models.

Table 1. Site characteristics used in this study.

Site	Latitude (°)	Longitude (°)	Altitude (m)	MAT (°C)	MAP (mm)	Country	Climate Zone	Period
US-ARM	36.6058	−97.4888	314	14.76	843	USA	TCCZ	2003–2012
US-CRT	41.6285	−83.3471	180	10.10	849	USA	TCCZ	2011–2013
US-Ne1	41.1651	−96.4766	361	10.07	790	USA	TCCZ	2001–2013
US-Ne2	41.1649	−96.4701	362	10.08	789	USA	TCCZ	2001–2013
US-Ne3	41.1797	−96.4397	363	10.11	784	USA	TCCZ	2001–2013
IT-BCi	40.5238	14.9574	20	18.00	600	Italy	SMCZ	2004–2014
IT-CA2	42.3772	12.0260	200	14.00	766	Italy	SMCZ	2011–2014
US-TW2	38.1047	−121.6433	−5	15.50	421	USA	SMCZ	2012–2013
US-TW3	38.1152	−121.6469	−9	15.60	421	USA	SMCZ	2013–2014
US-TW	38.1087	−121.6531	−6	15.60	421	USA	SMCZ	2009–2014
BE-Lon	50.5516	4.7461	167	10.00	800	Belgium	TCZ	2004–2014
CH-Oe2	47.2863	7.7343	452	9.80	1155	Switzerland	TCZ	2004–2014
DE-Geb	51.1001	10.9143	162	8.50	470	Germany	TCZ	2001–2014
DE-Kli	50.8931	13.5224	478	7.60	842	Germany	TCZ	2004–2014
FR-Gri	48.8442	1.9519	125	12.00	650	France	TCZ	2004–2014

Note: MAT: mean annual temperature, MAP: mean annual precipitation. Site codes and corresponding full names can be found in the Supplementary Materials.

Table 2. Statistical parameters of flux tower measured environmental variables and remote sensing data during the entire study period at the four sites.

Climate Zone	Variable	X_mean	X_max	X_min	X_sd	X_ku	X_sk
TCCZ	T_a	11.43	36.7	−19.94	11.24	−0.79	−0.32
	VPD	6.72	49.56	0	5.54	3.37	1.47
	P	97.71	101.65	94.77	0.92	1.97	0.99
	U	3.57	13.38	0.81	1.63	1.11	1.04
	R_n	96.83	270.59	−28.87	65.43	−1.22	0.15
	G	0.63	56.57	−54.03	13.57	0.64	0.39
	PPFD	429.64	1049.81	0.31	204.23	−0.59	0.29
	LAI	0.82	6.76	0	0.83	4.8	2.01
	EF	0.54	1	0.2	0.23	−1.13	0.33
	ET	1.64	7.04	0.01	1.41	0.67	1.23
SMCZ	T_a	16.19	29.8	0.08	5.95	−0.63	−0.23
	VPD	8.68	30.78	0.18	5.42	0.88	1.07
	P	101.29	103.29	97.89	1.01	1.31	−1.3
	U	2.98	9.57	0.56	1.8	0.09	0.9
	R_n	113.87	246.63	−20.33	58.95	−1.08	−0.12
	G	3.34	34.12	−19.63	8.04	0.92	0.75
	PPFD	517.56	970.1	31.58	207.45	−1.01	−0.08
	LAI	1.15	3.2	0.09	0.54	−0.05	0.18
	EF	0.66	1	0.2	0.23	−1.11	−0.33
	ET	2.32	7.65	0.02	1.5	0.73	1.04
TCZ	T_a	12.63	29.25	−13.73	6.79	0.00	−0.63
	VPD	5.49	26.51	0.00	3.64	1.17	0.95
	P	99.12	102.46	92.73	1.89	0.27	−1.17
	U	2.17	7.74	0.02	1.05	1.65	1.10
	R_n	86.37	235.35	−63.52	54.70	−0.82	−0.05
	G	4.65	62.27	−34.21	9.89	1.80	0.49
	PPFD	363.85	882.84	−35.16	164.91	−0.75	−0.01
	LAI	1.91	7.64	0.02	1.55	1.58	1.40
	EF	0.62	1.00	0.20	0.20	−0.93	−0.17
	ET	1.62	6.69	0.01	1.15	0.07	0.82

Note: Environmental variables included daily mean air temperature (T_a), net radiation (R_n), soil heat flux (G), evaporative fraction (EF), leaf area index (LAI), photosynthetic photon flux density (PPFD), vapor pressure deficit (VPD), wind speed (U), and atmospheric pressure (P). The statistical information covers X_mean (mean), X_max (maximum), X_min (minimum), X_sd (standard deviation), X_ku (kurtosis), and X_sk (skewness) for each environmental variable.
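
As an illustration of how the six statistics in Table 2 could be computed for one variable, a minimal sketch follows. The estimator choices are assumptions: the negative X_ku entries suggest that excess kurtosis (zero for a normal distribution) is reported, so that convention is used here, with moment-based skewness and a sample standard deviation. The function name `describe` is hypothetical.

```python
import math


def describe(values):
    """Summary statistics in the layout of Table 2 (estimators are assumed)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4 (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "X_mean": mean,
        "X_max": max(values),
        "X_min": min(values),
        "X_sd": math.sqrt(m2 * n / (n - 1)),   # sample standard deviation
        "X_ku": m4 / m2 ** 2 - 3.0,            # excess kurtosis
        "X_sk": m3 / m2 ** 1.5,                # moment-based skewness
    }


# Tiny symmetric example series (not data from the study).
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this symmetric series the skewness is zero and the excess kurtosis is negative, the same sign pattern seen for flat-topped variables such as R_n in Table 2.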

Table 3. Pearson correlation coefficients between input factors and evapotranspiration (ET).

Climate Zone	Items	R_n	PPFD	EF	T_a	LAI	VPD	G	U	P
All sites	r	0.788	0.648	0.591	0.59	0.513	0.394	0.316	−0.09	0.035
All sites	rank	1	2	3	4	5	6	7	8	9
SMCZ	r	0.692	0.601	0.524	0.437	0.513	0.252	0.204	0.461	−0.068
SMCZ	rank	1	2	3	6	4	7	8	5	9
TCZ	r	0.807	0.716	0.506	0.543	0.465	0.487	0.418	−0.069	−0.032
TCZ	rank	1	2	4	3	6	5	7	8	9
TCCZ	r	0.797	0.628	0.638	0.624	0.798	0.375	0.303	−0.215	−0.077
TCCZ	rank	2	4	3	5	1	6	7	8	9
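
The ranks in Table 3 are consistent with ordering the factors by the absolute value of r (for example, U with r = −0.09 ranks above P with r = 0.035 for all sites). Assuming that is the rule, the sketch below derives the V3, V6, and V9 combinations by taking the top three, six, and nine ranked factors; the r values are copied from Table 3, while the function names are hypothetical.

```python
# Pearson r between each input factor and ET, copied from Table 3.
PEARSON_R = {
    "All sites": {"R_n": 0.788, "PPFD": 0.648, "EF": 0.591, "T_a": 0.59,
                  "LAI": 0.513, "VPD": 0.394, "G": 0.316, "U": -0.09, "P": 0.035},
    "SMCZ": {"R_n": 0.692, "PPFD": 0.601, "EF": 0.524, "T_a": 0.437,
             "LAI": 0.513, "VPD": 0.252, "G": 0.204, "U": 0.461, "P": -0.068},
    "TCZ": {"R_n": 0.807, "PPFD": 0.716, "EF": 0.506, "T_a": 0.543,
            "LAI": 0.465, "VPD": 0.487, "G": 0.418, "U": -0.069, "P": -0.032},
    "TCCZ": {"R_n": 0.797, "PPFD": 0.628, "EF": 0.638, "T_a": 0.624,
             "LAI": 0.798, "VPD": 0.375, "G": 0.303, "U": -0.215, "P": -0.077},
}


def ranked_factors(r_by_factor):
    """Order factors from strongest to weakest correlation with ET (by |r|)."""
    return sorted(r_by_factor, key=lambda f: abs(r_by_factor[f]), reverse=True)


def input_combinations(r_by_factor, sizes=(3, 6, 9)):
    """Build the V3/V6/V9 combinations as the top-k ranked factors."""
    ranked = ranked_factors(r_by_factor)
    return {f"V{k}": ranked[:k] for k in sizes}


for zone, r_values in PEARSON_R.items():
    print(zone, input_combinations(r_values)["V6"])
```

Under this assumption the output reproduces the input lists in Tables 4 and 7, including the zone-specific differences such as LAI leading in the TCCZ and U entering the six-factor set in the SMCZ.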

Table 4. Three input factor combinations (V3, V6, and V9) obtained from the correlation analysis of ET using data from all sites, and their pairing with the RF, SVM, XGB, and BP models.

RF	SVM	XGB	BP	Input Data
RF-V3	SVM-V3	XGB-V3	BP-V3	R_n, PPFD, EF
RF-V6	SVM-V6	XGB-V6	BP-V6	R_n, PPFD, EF, T_a, LAI, VPD
RF-V9	SVM-V9	XGB-V9	BP-V9	R_n, PPFD, EF, T_a, LAI, VPD, G, U, P

Table 5. The GPI values of machine learning models for ET simulations at different stations using different input combinations.

Site	RF-V3	RF-V6	RF-V9	SVM-V3	SVM-V6	SVM-V9	XGB-V3	XGB-V6	XGB-V9	BP-V3	BP-V6	BP-V9
US-ARM	−3.320	0.046	0.165	−3.364	−0.144	0.187	−3.579	0.021	0.240	−3.401	−0.021	0.421
US-CRT	−1.349	0.017	0.053	−1.753	0.943	1.052	−1.618	0.369	−0.065	−2.943	0.924	0.166
US-Ne1	−3.504	0.003	0.155	−3.522	0.013	0.047	−3.594	−0.030	0.067	−3.392	0.020	0.372
US-Ne2	−3.071	−0.052	0.095	−3.048	0.134	0.598	−2.938	−0.116	0.052	−3.252	0.367	0.748
US-Ne3	−3.453	−0.022	0.125	−3.283	0.198	0.364	−3.395	−0.105	0.022	−3.070	0.252	0.547
IT-BCi	−0.742	0.320	0.355	−2.120	0.309	0.225	−1.868	0.593	0.539	−3.407	−0.353	−0.703
IT-CA2	−1.547	0.008	−0.041	−1.333	1.076	0.772	−2.360	−0.131	0.112	−2.885	0.931	0.468
US-TW2	−0.537	0.454	0.670	−1.321	1.616	1.491	−1.082	−1.184	−0.886	−2.038	1.831	1.765
US-TW3	−3.083	0.026	0.216	−1.724	0.196	0.125	−2.169	0.414	0.707	−2.029	−1.839	0.800
US-TW	−3.232	−0.034	0.163	−2.670	0.408	0.483	−2.939	−0.019	0.008	−2.798	0.723	0.283
BE-Lon	−1.160	0.215	0.278	−1.150	0.234	0.346	−1.259	−0.251	−0.004	−3.654	0.004	0.166
CH-Oe2	−1.770	0.576	−0.551	−2.636	1.318	1.088	−2.514	0.839	0.972	−1.770	0.576	−0.551
DE-Geb	−3.390	0.021	0.076	−3.117	0.189	0.257	−3.704	−0.112	−0.021	−2.549	0.276	0.273
DE-Kli	−2.325	−0.004	−0.011	−1.793	1.148	0.912	−2.852	0.064	−0.110	−2.290	0.923	0.615
FR-Gri	−3.162	0.269	0.209	−2.892	0.584	0.473	−3.414	−0.190	−0.168	−2.878	0.276	0.576
Mean	−2.376	0.123	0.130	−2.382	0.548	0.561	−2.619	0.011	0.098	−2.824	0.326	0.396

Note: The best model is in bold.
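
The GPI itself is defined in the methods of the paper rather than in this part of the text. As a rough guide to how such values arise, the sketch below follows a commonly used formulation, which is an assumption here: each indicator is min-max scaled across the compared models, and the GPI of a model is the sum over indicators of α × (median of scaled values − scaled value), with α = −1 for indicators where larger is better (R², NSE) and α = +1 otherwise (RMSE, MAE). The example feeds in the V6 column of Table 6 for four models only, so the numbers will not match Table 5, which compares twelve model and input combinations per site.

```python
from statistics import median


def min_max_scale(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def gpi(scores, higher_is_better=("R2", "NSE")):
    """Global performance indicator per model (assumed formulation).

    scores maps an indicator name to a list with one value per model.
    A larger GPI means better overall performance.
    """
    n_models = len(next(iter(scores.values())))
    totals = [0.0] * n_models
    for name, values in scores.items():
        scaled = min_max_scale(values)
        med = median(scaled)
        alpha = -1.0 if name in higher_is_better else 1.0
        for i, y in enumerate(scaled):
            totals[i] += alpha * (med - y)
    return totals


# V6 results from Table 6, in the order RF, SVM, XGB, BP.
models = ["RF", "SVM", "XGB", "BP"]
v6_scores = {
    "RMSE": [0.427, 0.347, 0.429, 0.370],
    "MAE": [0.313, 0.249, 0.313, 0.267],
    "R2": [0.844, 0.894, 0.827, 0.873],
    "NSE": [0.756, 0.863, 0.748, 0.842],
}
gpi_by_model = dict(zip(models, gpi(v6_scores)))
best_model = max(gpi_by_model, key=gpi_by_model.get)
worst_model = min(gpi_by_model, key=gpi_by_model.get)
```

With four indicators each scaled to [0, 1], a GPI of this form is bounded by ±4, which matches the range of values reported in Table 5.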

Table 6. Simulation accuracy of the RF, SVM, XGB, and BP models for ET using the three input factor combinations obtained from the correlation analysis of ET with data from all sites.

Models	Evaluating Indicators	Unit	V3	V6	V9	Mean
RF	RMSE	mm d⁻¹	0.738	0.427	0.415	0.527
	MAE	mm d⁻¹	0.547	0.313	0.304	0.388
	R²	-	0.618	0.844	0.850	0.771
	NSE	-	0.378	0.756	0.765	0.633
SVM	RMSE	mm d⁻¹	0.774	0.347	0.344	0.488
	MAE	mm d⁻¹	0.573	0.249	0.247	0.356
	R²	-	0.622	0.894	0.896	0.804
	NSE	-	0.256	0.863	0.858	0.659
XGB	RMSE	mm d⁻¹	0.780	0.429	0.425	0.545
	MAE	mm d⁻¹	0.576	0.313	0.300	0.396
	R²	-	0.597	0.827	0.824	0.749
	NSE	-	0.272	0.748	0.743	0.588
BP	RMSE	mm d⁻¹	0.892	0.370	0.390	0.551
	MAE	mm d⁻¹	0.627	0.267	0.275	0.390
	R²	-	0.570	0.873	0.860	0.768
	NSE	-	−0.161	0.842	0.793	0.491

Note: Input combinations V3, V6, V9, and mean represent three, six, and nine input factor combinations and average of all combinations, respectively.
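
For readers who want to reproduce the four indicators in Table 6 from paired observed and simulated daily ET, a minimal sketch is given below. It assumes that R² is the squared Pearson correlation between observations and simulations, which would explain why R² can stay positive while NSE turns negative (as for BP with V3); the paper's own formulas should be checked before relying on this. The sample values are illustrative only.

```python
import math


def rmse(obs, sim):
    """Root mean square error, in the units of the data (mm/d for ET)."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error, in the units of the data."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def r_squared(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    mo = sum(obs) / len(obs)
    ms = sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, below 0 is worse than the mean."""
    mo = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / spread


# Illustrative series: the simulation has a constant bias of +0.5 mm/d.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 3.5, 4.5]
metrics = {
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "R2": r_squared(observed, simulated),
    "NSE": nse(observed, simulated),
}
```

In this example the bias leaves the correlation perfect while lowering NSE, which is why the two indicators are reported side by side.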

Table 7. Three combinations obtained from the correlation analysis of ET using data from sites within each climate zone, along with their combination with RF, SVM, XGB, and BP models.

Climate Zone	RF	SVM	XGB	BP	Input Data
SMCZ	RF_S-V3	SVM_S-V3	XGB_S-V3	BP_S-V3	R_n, PPFD, EF
	RF_S-V6	SVM_S-V6	XGB_S-V6	BP_S-V6	R_n, PPFD, EF, LAI, U, T_a
	RF_S-V9	SVM_S-V9	XGB_S-V9	BP_S-V9	R_n, PPFD, EF, LAI, U, T_a, VPD, G, P
TCZ	RF_T-V3	SVM_T-V3	XGB_T-V3	BP_T-V3	R_n, PPFD, T_a
	RF_T-V6	SVM_T-V6	XGB_T-V6	BP_T-V6	R_n, PPFD, T_a, EF, VPD, LAI
	RF_T-V9	SVM_T-V9	XGB_T-V9	BP_T-V9	R_n, PPFD, T_a, EF, VPD, LAI, G, U, P
TCCZ	RF_C-V3	SVM_C-V3	XGB_C-V3	BP_C-V3	LAI, R_n, EF
	RF_C-V6	SVM_C-V6	XGB_C-V6	BP_C-V6	LAI, R_n, EF, PPFD, T_a, VPD
	RF_C-V9	SVM_C-V9	XGB_C-V9	BP_C-V9	LAI, R_n, EF, PPFD, T_a, VPD, G, U, P

Table 8. Statistical parameters between simulated evapotranspiration by machine learning models using different input combinations and actual measurements in different climate zones. (The subscript (C, S, T) in each model indicates the temperate–continental, subtropical–Mediterranean, and temperate climate zones, respectively.)

Models (TCCZ)	RMSE (mm d⁻¹)	MAE (mm d⁻¹)	R²	NSE	Models (SMCZ)	RMSE (mm d⁻¹)	MAE (mm d⁻¹)	R²	NSE	Models (TCZ)	RMSE (mm d⁻¹)	MAE (mm d⁻¹)	R²	NSE
RF_C-V3	0.398	0.287	0.887	0.778	RF_S-V3	0.383	0.294	0.833	0.776	RF_T-V3	0.713	0.508	0.655	0.546
RF_C-V6	0.404	0.288	0.888	0.767	RF_S-V6	0.443	0.338	0.786	0.682	RF_T-V6	0.410	0.291	0.865	0.840
RF_C-V9	0.391	0.278	0.893	0.762	RF_S-V9	0.463	0.357	0.775	0.664	RF_T-V9	0.411	0.291	0.867	0.839
mean	0.398	0.284	0.889	0.769	mean	0.430	0.330	0.798	0.707	mean	0.511	0.363	0.796	0.742
SVM_C-V3	0.325	0.229	0.916	0.893	SVM_S-V3	0.385	0.276	0.820	0.784	SVM_T-V3	0.703	0.493	0.669	0.557
SVM_C-V6	0.323	0.225	0.921	0.903	SVM_S-V6	0.373	0.263	0.819	0.778	SVM_T-V6	0.333	0.248	0.917	0.899
SVM_C-V9	0.288	0.200	0.941	0.920	SVM_S-V9	0.403	0.287	0.834	0.761	SVM_T-V9	0.342	0.254	0.912	0.893
mean	0.312	0.218	0.926	0.905	mean	0.387	0.275	0.824	0.774	mean	0.460	0.332	0.833	0.783
XGB_C-V3	0.376	0.269	0.891	0.848	XGB_S-V3	0.413	0.314	0.785	0.746	XGB_T-V3	0.732	0.520	0.624	0.526
XGB_C-V6	0.371	0.263	0.892	0.849	XGB_S-V6	0.476	0.346	0.744	0.622	XGB_T-V6	0.416	0.303	0.866	0.840
XGB_C-V9	0.361	0.250	0.899	0.840	XGB_S-V9	0.475	0.338	0.721	0.623	XGB_T-V9	0.404	0.289	0.873	0.848
mean	0.369	0.261	0.894	0.846	mean	0.455	0.333	0.750	0.664	mean	0.517	0.371	0.787	0.738
BP_C-V3	0.883	0.425	0.782	0.691	BP_S-V3	0.462	0.318	0.722	0.628	BP_T-V3	0.835	0.495	0.586	0.290
BP_C-V6	0.337	0.234	0.914	0.866	BP_S-V6	0.542	0.404	0.675	0.513	BP_T-V6	0.382	0.278	0.893	0.867
BP_C-V9	0.259	0.177	0.949	0.926	BP_S-V9	0.445	0.338	0.835	0.731	BP_T-V9	0.417	0.278	0.863	0.826
mean	0.493	0.279	0.882	0.828	mean	0.483	0.353	0.744	0.624	mean	0.544	0.350	0.781	0.661
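
The four statistics reported in Table 8 can be computed from paired daily observed and simulated ET values as in the minimal sketch below. It is an illustration under stated assumptions, not the authors' code: the function name `et_metrics` is hypothetical, R² is assumed to be the squared Pearson correlation coefficient (which would explain why R² and NSE differ in the table), and NSE is the standard Nash–Sutcliffe efficiency.

```python
import math


def et_metrics(obs, sim):
    """Return RMSE, MAE, R2, and NSE for paired observed/simulated ET (mm d^-1).

    Assumption: R2 is the squared Pearson correlation, so it ignores bias,
    whereas NSE penalizes any departure from the 1:1 line.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")

    errors = [s - o for o, s in zip(obs, sim)]
    sse = sum(e * e for e in errors)              # sum of squared errors
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n

    obs_mean = sum(obs) / n
    sim_mean = sum(sim) / n
    s_oo = sum((o - obs_mean) ** 2 for o in obs)  # observed variability
    s_ss = sum((s - sim_mean) ** 2 for s in sim)  # simulated variability
    s_os = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))

    r2 = s_os ** 2 / (s_oo * s_ss)                # squared Pearson r
    nse = 1.0 - sse / s_oo                        # Nash-Sutcliffe efficiency
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "NSE": nse}


# Illustrative values only (not data from the study): a constant +0.5 bias.
example = et_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this illustrative case the simulated series is perfectly correlated with the observations, so R² equals 1, while the constant bias lowers NSE to 0.8. This is one reason the two columns in Table 8 are best read together.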

Table 9. The GPI values of the four machine learning models for evapotranspiration simulation using different input combinations across different climate zones.

Climate Zone	Site	RF-V3	RF-V6	RF-V9	SVM-V3	SVM-V6	SVM-V9	XGB-V3	XGB-V6	XGB-V9	BP-V3	BP-V6	BP-V9
TCCZ	US-ARM	−0.386	−0.319	0.675	−1.112	−1.495	0.746	−0.207	−0.471	1.135	0.193	0.261	2.505
	US-CRT	−0.226	−0.268	−0.301	0.312	0.393	0.466	0.003	−0.003	−0.038	−3.508	0.178	0.466
	US-Ne1	−0.744	−0.088	0.832	0.287	−0.115	0.053	−1.398	−0.319	0.339	0.820	−0.739	2.602
	US-Ne2	−0.320	−0.763	−0.087	0.105	0.091	2.124	−1.348	−0.924	−0.189	0.554	0.859	2.652
	US-Ne3	−1.373	−0.982	−0.074	0.455	0.334	1.431	−1.839	−1.676	−0.875	0.167	0.566	2.158
	Mean	−0.610	−0.484	0.209	0.009	−0.158	0.964	−0.958	−0.679	0.074	−0.355	0.225	2.077
SMCZ	IT-BCi	0.170	0.206	−0.149	0.057	0.070	−0.232	−0.374	0.034	0.145	−0.019	−3.778	−1.361
	IT-CA2	0.228	−0.755	−0.898	0.905	0.840	0.981	0.220	−0.455	−0.071	−2.127	−2.151	1.342
	US-TW2	0.467	−1.006	−1.144	0.345	0.361	0.383	−0.208	−2.346	−2.879	−0.361	1.092	0.087
	US-TW3	−0.128	−0.624	−0.666	0.581	−0.106	−0.375	1.830	0.539	0.818	−2.054	−1.265	0.336
	US-TW	−0.332	−0.872	−0.670	0.003	2.447	0.649	−0.075	−0.665	−1.308	0.132	0.476	0.963
	Mean	0.081	−0.610	−0.705	0.378	0.722	0.281	0.279	−0.579	−0.659	−0.886	−1.125	0.274
TCZ	BE-Lon	−1.160	0.215	0.278	−1.150	0.234	0.346	−1.259	−0.251	−0.004	−3.654	0.004	0.166
	CH-Oe2	−2.539	0.039	−0.037	−2.671	1.284	1.054	−2.548	0.805	0.937	−1.804	0.542	−0.585
	DE-Geb	−3.390	0.021	0.076	−3.117	0.189	0.257	−3.704	−0.112	−0.021	−2.549	0.276	0.273
	DE-Kli	−2.325	−0.004	−0.011	−1.793	1.148	0.912	−2.852	0.064	−0.110	−2.290	0.923	0.615
	FR-Gri	−3.162	0.269	0.209	−2.892	0.584	0.473	−3.414	−0.190	−0.168	−2.878	0.276	0.576
	Mean	−2.515	0.108	0.103	−2.325	0.688	0.608	−2.756	0.063	0.127	−2.635	0.404	0.209
All sites	Mean	−1.262	−0.310	−0.261	−0.952	0.665	0.550	−1.276	−0.323	−0.263	−1.750	−0.038	0.553

Note: The best model is in bold.
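
As a reading aid for Table 9, the sketch below picks the model and input combination with the highest mean GPI in each climate zone. The numbers are copied from the "Mean" rows of Table 9; the function name `best_by_zone` is hypothetical, and the sketch assumes that a higher GPI indicates better performance.

```python
# Column order of Table 9.
MODELS = [
    "RF-V3", "RF-V6", "RF-V9",
    "SVM-V3", "SVM-V6", "SVM-V9",
    "XGB-V3", "XGB-V6", "XGB-V9",
    "BP-V3", "BP-V6", "BP-V9",
]

# "Mean" rows of Table 9 (mean GPI per climate zone).
MEAN_GPI = {
    "TCCZ": [-0.610, -0.484, 0.209, 0.009, -0.158, 0.964,
             -0.958, -0.679, 0.074, -0.355, 0.225, 2.077],
    "SMCZ": [0.081, -0.610, -0.705, 0.378, 0.722, 0.281,
             0.279, -0.579, -0.659, -0.886, -1.125, 0.274],
    "TCZ": [-2.515, 0.108, 0.103, -2.325, 0.688, 0.608,
            -2.756, 0.063, 0.127, -2.635, 0.404, 0.209],
}


def best_by_zone(mean_gpi, models):
    """Map each zone to (model, GPI) for the highest mean GPI.

    Assumption: a larger GPI means a better-performing model.
    """
    best = {}
    for zone, values in mean_gpi.items():
        if len(values) != len(models):
            raise ValueError(f"{zone}: expected {len(models)} values")
        idx = max(range(len(values)), key=values.__getitem__)
        best[zone] = (models[idx], values[idx])
    return best


ranking = best_by_zone(MEAN_GPI, MODELS)
```

Under that assumption the selection gives BP-V9 for TCCZ and SVM-V6 for both SMCZ and TCZ, which matches the best-performing models named in the abstract (BP_C-V9, SVM_S-V6, and SVM_T-V6).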

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
