Air Temperature Monitoring over Low Latitude Rice Planting Areas: Combining Remote Sensing, Model Assimilation, and Machine Learning Techniques

Lin, Minghao; Fang, Qiang; Xia, Jizhe; Xu, Chenyang

doi:10.3390/rs15153805


¹ School of Agriculture, Sun Yat-sen University, Shenzhen 518107, China

² Shenzhen Campus, Sun Yat-sen University, Shenzhen 518107, China

³ Department of Urban Informatics, Shenzhen University, Shenzhen 518060, China

⁴ Modern Agricultural Innovation Center, Henan Institute of Sun Yat-sen University, Zhumadian 463000, China

* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(15), 3805; https://doi.org/10.3390/rs15153805

Submission received: 31 May 2023 / Revised: 15 July 2023 / Accepted: 27 July 2023 / Published: 31 July 2023

(This article belongs to the Special Issue Ecological Environment Satellite System: Research and Application)


Abstract


Air temperature (Ta) is essential for studying surface processes and human activities, particularly agricultural cultivation, which is strongly influenced by temperature. Remote sensing techniques that integrate multi-source data can estimate Ta with high accuracy, overcoming the shortcomings of traditional measurements caused by spatial heterogeneity. Based on in situ measurements in Guangdong Province from 2012 to 2018, this study applied three machine learning (ML) models to fused multi-source datasets and evaluated the performance of four data combinations in Ta estimation. Correlations of covariates were compared, focusing on rice planting areas (RA). The results showed that (1) the fusion of multi-source data improved the accuracy of model estimations, with the best performance achieved by the random forest (RF) model with the ERA5 combination (highest R² of 0.956, MAE of 0.996 °C, and RMSE of 1.365 °C); (2) total precipitation (TP), wind speed (WD), normalized difference vegetation index (NDVI), and land surface temperature (LST) were important covariates for long-term Ta estimation; and (3) rice planting improved model performance in estimating Ta, although model accuracy decreased during the summer crop rotation. This study provides a reference for selecting temperature estimation models and covariate datasets, and it offers a case for subsequent ML studies on remote sensing of temperatures over agricultural areas and on the impact of agricultural cultivation on global warming.

Keywords:

remote sensing; air temperature; machine learning; rice planting

1. Introduction

In the context of global warming [1], timely and accurate monitoring of Ta is increasingly important for studying surface energy and the water balance of the land-atmosphere system [2], as well as human production practices such as drought monitoring, urban heat island effects, and hydrological assessments [3,4,5]. In particular, crop development is inevitably challenged by temperature variability, which significantly affects agricultural yields and crop quality [6,7,8,9]. Therefore, long-term, spatially and temporally continuous Ta data are essential for understanding Ta variability in agricultural planting areas. However, the temporal periodicity and spatial extent of agricultural cultivation and production have resulted in spatiotemporal heterogeneity in the Ta of agricultural planting areas [10]. Traditional interpolation of Ta data from meteorological stations may fail to reflect this spatial heterogeneity and the non-linear relationships in the data. Other factors, such as station density, land cover, and elevation (ELE), also strongly influence the accuracy of spatial interpolation, which has hindered accurate agricultural Ta monitoring [11,12,13,14].

The latest remote sensing techniques have integrated multiple sources of remote sensing data based on station observations, resulting in high accuracy in the spatial and temporal estimation of Ta [15,16,17,18,19,20]. For example, Noi et al. combined multiple LST datasets with auxiliary datasets to achieve a highly accurate Ta estimation [16]. Qin et al. successfully estimated long-term near-surface Ta by integrating station, satellite, and reanalysis data [17]. In another study, multiple satellite measurements and ELE data were used to generate daily mean surface Ta over mainland China [15]. Zheng et al. combined remotely sensed data, assimilation data, and geochronological parameters to reconstruct all-weather maximum Ta over long periods for Eurasia [20]. Combining multi-source data, Zeng et al. estimated three spatial scale types of Ta (climate zone, continental, and global scales) [19]. These studies showed that highly correlated multiple-variable fusions could improve temperature estimation accuracy.

Many studies have evaluated meteorological variables for modeling suitability in remote sensing, such as soil moisture [21], precipitation [22], evapotranspiration (ET) [23], and temperature [24,25]. However, most studies have only assessed individual meteorological variables provided by multiple satellite products and reanalysis datasets. The applicability of the various data products and reanalysis datasets differs in terms of algorithms, parameters, data sources, etc., so their utility in different regions could vary [26,27]. Therefore, evaluating more data variables would be conducive to the selection of data sources and environmental parameters. At the same time, few studies have applied the same variables from multiple sources and assessed their different impacts on model performance.

In recent years, ML and deep learning (DL) models have become popular approaches to modeling in remote sensing. They can capture complex relationships among multiple covariates and improve model efficiency while providing better goodness-of-fit and accuracy than traditional semi-empirical models [28]. The RF model is a classic ML model for temperature estimation [29,30,31,32,33], as it reduces the risk of overfitting by combining many weak learners into a robust ensemble [34]. For instance, Liu et al. used the RF model to estimate near-surface Ta in an arid region of northwest China with satellite measurements [35]. As a type of recurrent neural network (RNN), the long short-term memory (LSTM) model has performed well in LST estimation [36,37,38,39,40]. Chung et al. constructed an LSTM model for Ta estimation using different LST data during the cold and hot periods of one year and achieved acceptable performance during cold periods [41]. Yang et al. also evaluated the performance of large-scale LSTM models for Ta [42]. The potential of the artificial neural network (ANN) model for temperature estimation has also been demonstrated [39,43,44,45,46]. Runke et al. applied an ANN to achieve high temperature estimation accuracy in complex mountainous areas [47]. Şahin used an ANN to model monthly average temperatures for 20 cities [48]. However, ML models, particularly DL models, have rarely been applied to estimate Ta in agricultural planting areas.

Rice (Oryza sativa L.) is an internationally vital staple food crop widely cultivated in Asia. Low-latitude areas, particularly the double-season rice areas within the major rice-planting regions of southern China, are strongly affected by extreme temperatures [10]. Although a previous study examined the effects of low-latitude rice planting on regulating the heat island effect of an urban cluster [49], its study area was too small to be representative. Therefore, there is a need to analyze how low-latitude rice planting regulates temperature at larger spatial and temporal scales. In addition, different land cover types directly or indirectly alter surface temperature [50,51]. In agricultural remote sensing, Xin et al. investigated the effect of rice planting on surface temperature in saline areas of northeastern China by integrating several remote sensing images [52]. Chen et al. applied multi-temporal thermal infrared images to describe the effects of interannual crop variability on regional surface temperature [51]. Liu et al. studied the temperature response to rice field expansion in the Sanjiang Plain, and the results showed large seasonal variability [53]. Zhou et al. derived the cooling effect of agriculture on surface temperature in eastern China [54]. However, most existing studies have focused on the effects of crop cultivation on temperature itself; few have examined how crop cultivation affects the accuracy of remote sensing-based temperature estimation models. Understanding the differences in model performance for temperature estimation at different stages of a specific crop type (e.g., rice) is also essential for agricultural remote sensing applications.

Temperature estimation, especially in agricultural planting areas, is moderated by various factors such as ELE, LST, ET, and TP. Multiple remote sensing datasets provide the same temperature-related meteorological variables, such as TP and WD. However, few studies have evaluated the effects of the same variables from multiple sources on temperature estimation. Guangdong Province is the main low-latitude planting area for double-season rice in China and experiences a high incidence of extreme temperatures. It is therefore important to investigate the impact of low-latitude rice cultivation on remote sensing Ta models over agricultural areas. The objectives of this study are to (1) evaluate the impacts of fusing the same variables from multiple datasets on temperature estimation models; (2) apply and compare ML and DL models for temperature estimation in RA; and (3) discuss the effects of rice planting on Ta model estimation at low latitudes and the cooling effect of rice planting. The results of this study are expected to provide a case for selecting the most suitable multi-source dataset for temperature estimation and a scientific basis for applying ML to Ta remote sensing in agricultural areas.

2. Materials and Methods

2.1. Study Area

Guangdong Province covers a total area of approximately 180,000 km² (Figure 1) and lies between latitudes 20°09′N and 25°34′N and longitudes 109°45′E and 117°20′E. The province has diverse land cover types with substantial spatial heterogeneity. Mountains, hills, tablelands, and plains account for 33.7%, 24.9%, 14.2%, and 21.7% of the province’s total land area, respectively, while rivers and lakes account for only 5.5%. Generally, the terrain of Guangdong is high in the north, dominated by mountains and hills, and low in the south, dominated by plains and tablelands. Guangdong Province receives 1745.8 sunshine hours per year on average; the mean annual temperature is 22.3 °C; and the total annual precipitation is as high as 319.4 billion m³. The rice planted in Guangdong Province is double-season rice, and the rice planting area is about 18,000 km², approximately 10% of the area of the province.

2.2. Datasets

2.2.1. In Situ Observation Data

Daily mean Ta observations, measured 1.5 m above the ground, were extracted for 86 meteorological stations in Guangdong Province from the China Meteorological Data Service Center (CMDC, http://data.cma.cn/, accessed on 22 February 2022). Based on the rice planting extent in Guangdong Province, the stations were divided into fifty-eight stations in non-rice planting areas (NA) and twenty-eight stations in RA. In addition, each station’s geographical and temporal parameters, including latitude, longitude, and time, were acquired together with the Ta observations.

2.2.2. DEM Data

We used the 30 m resolution Digital Elevation Model (DEM) data (version SRTMGL1v003G) from the Shuttle Radar Topography Mission (SRTM) to obtain ELE data for Guangdong Province.

2.2.3. Satellite Observation Data

In this study, two moderate resolution imaging spectroradiometer (MODIS) land products for 2012–2018 were downloaded: MOD11A1 and MOD09GA. Previous studies demonstrated that LST is an influential variable for estimating Ta [55]; hence, 1 km daily LST data were obtained from the MOD11A1 product. Daily surface reflectance in bands 1 and 2 from MOD09GA was used to calculate NDVI.

NDVI is one of the most critical satellite-derived parameters reflecting crop growth and nutrient information. NDVI values were calculated from the MOD09GA 1 km daily near-infrared band (B2) and red band (B1) as follows:

NDVI = \frac{NIR(B2) - RED(B1)}{NIR(B2) + RED(B1)}

(1)
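
Equation (1) is applied pixel by pixel. The following Python sketch assumes that bands 2 and 1 have already been read into NumPy arrays and converted to reflectance with the product's scale factor; returning NaN where the denominator is zero is a choice of this sketch, not something stated in the study.

```python
import numpy as np

def ndvi(nir, red):
    """Element-wise NDVI = (NIR - RED) / (NIR + RED).

    `nir` and `red` are reflectance arrays (e.g., MOD09GA band 2 and band 1).
    Pixels whose denominator is zero are returned as NaN.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels keep NaN
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Illustrative reflectance values (already scaled to 0-1)
nir_band = np.array([0.40, 0.30, 0.0])
red_band = np.array([0.10, 0.30, 0.0])
ndvi_values = ndvi(nir_band, red_band)
```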

2.2.4. Data Products and Reanalysis Datasets

Global land surface satellite (GLASS) product data: This study used the GLASS ET, black-sky near-infrared albedo (NIR), and net primary production (NPP) products from 2012 to 2018, with a temporal resolution of 8 days (http://glass.umd.edu/index.html/, accessed on 27 October 2022). The ET and NIR products have a 1 km resolution, whereas the NPP product has a 500 m resolution. The GLASS NPP product is calculated with the eddy covariance-light use efficiency (EC-LUE) model driven by remote sensing data. The GLASS ET product was obtained by merging five process-based algorithms with a Bayesian model averaging method and has shown high reliability and accuracy. The GLASS albedo products cover three spectral ranges (total shortwave, visible, and near-infrared) under actual atmospheric conditions; the near-infrared albedo (NIR) was used in this study.

We used three reanalysis and forcing datasets, namely the fifth-generation reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF Reanalysis v5, ERA5), the Global Land Data Assimilation System (GLDAS), and the China Meteorological Forcing Dataset (CMFD), from which surface pressure (SP), WD, and TP were obtained.

ERA5 data: We obtained the ERA5-Land ‘surface pressure’, ‘total precipitation’, and ‘10 m u/v component of wind’ data distributed by the ECMWF archive (https://cds.climate.copernicus.eu/cdsapp#!/home, accessed on 10 December 2022). The spatial resolution of this dataset is 0.1° × 0.1°. The ‘surface pressure’ data reflect the pressure of the atmosphere on the land surface. The ‘10 m u/v component of wind’ data are the eastward and northward components of the wind at 10 m, which are affected by local vegetation, buildings, etc. The ‘total precipitation’ data are the sum of rainfall and snowfall.

GLDAS data: Drawing on advanced ground- and space-based observation systems, GLDAS provides a series of long-term gridded land surface state and flux parameters. We used the Noah model product in this study, namely the ‘Psurf_f_inst’, ‘Rainf_f_tavg’, and ‘Wind_f_inst’ variables at 0.25° spatial and 3-hourly temporal resolution.

CMFD data: CMFD was developed by the Qinghai-Tibet Plateau Institute of the Chinese Academy of Sciences by interpolating remote sensing products, reanalysis datasets, and in situ observations (https://data.tpdc.ac.cn/home, accessed on 13 December 2022). The dataset contains sea, land, and air meteorological elements with temporal and spatial resolutions of 3 h and 0.1°, respectively. Several studies have shown that CMFD is more accurate than other reanalysis datasets over China. This study used the ‘pressure’, ‘wind speed’, and ‘precipitation rate’ variables from CMFD.

3. Methodology

Figure 2 shows the general framework of the study. First, based on in situ station observations, we fused multi-source datasets (DEM, MODIS, GLASS, and reanalysis datasets) and organized the data into four combinations (GLASS, CMFD, GLDAS, and ERA5) over three spatial extents: all areas of Guangdong Province (AA), RA, and NA. We then applied three ML methods (RF, ANN, and LSTM) and a multiple linear regression (MLR) baseline to estimate long-time-series Ta, and we compared and evaluated the model outputs. The study also investigated the effects of rice planting on Ta estimation and identified the factors that influence Ta.

We combined daily station Ta observations, the stations’ geographical and temporal parameters, and remotely sensed data. All data and their abbreviations used in this study are listed in Table 1.

3.1. Data Preprocessing

The HDF-EOS to GIS conversion tool (HEG) v2.15 was used to crop the raw data, and the cropped data were resampled with the nearest neighbor algorithm to a common pixel size of 0.0125° (x, y) and a daily time scale. Two further points should be noted: (1) ERA5 precipitation data were converted from m/hour to mm/hour; and (2) the WD of ERA5 was calculated from the ‘10 m u/v component of wind’ data.
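
Both ERA5 steps are simple element-wise operations. A minimal NumPy sketch follows; the function names are hypothetical, and averaging hourly wind speed to a daily value is an assumption, since the study states only that all data were brought to a daily time scale.

```python
import numpy as np

def wind_speed(u10, v10):
    """Wind speed (m/s) from the eastward (u) and northward (v) 10 m wind components."""
    return np.hypot(np.asarray(u10, dtype=float), np.asarray(v10, dtype=float))

def m_to_mm(depth_m):
    """Convert precipitation from metres to millimetres (e.g., m/hour -> mm/hour)."""
    return np.asarray(depth_m, dtype=float) * 1000.0

# Hypothetical hourly u/v series (m/s) for one grid cell over one day
rng = np.random.default_rng(0)
u_hourly = rng.normal(2.0, 1.0, 24)
v_hourly = rng.normal(-1.0, 1.0, 24)

# Speed is computed for each hour first and then averaged to a daily value
daily_wd = float(wind_speed(u_hourly, v_hourly).mean())
```

Computing the speed before averaging matters: averaging the u and v components first and then taking the magnitude would underestimate the mean wind speed whenever the wind direction changes during the day.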

After preprocessing, the data were divided into AA, RA, and NA according to the rice planting extent and station distribution in Guangdong Province. According to the reanalysis dataset fused, the data were further divided into four combinations: the GLASS combination (DEM data + MODIS datasets + GLASS datasets), the ERA5 combination (DEM data + MODIS datasets + GLASS datasets + ERA5 reanalysis datasets), the CMFD combination (DEM data + MODIS datasets + GLASS datasets + CMFD reanalysis datasets), and the GLDAS combination (DEM data + MODIS datasets + GLASS datasets + GLDAS reanalysis datasets).
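
The four combinations differ only in which reanalysis source supplies SP, WD, and TP. As an illustration, the feature sets can be declared as below; all column names, including encoding station time as day of year, are hypothetical placeholders for the variables in Table 1.

```python
# Hypothetical column names for the covariates listed in Table 1
GEO_TIME = ["lat", "lon", "doy"]                   # station latitude, longitude, day of year (assumed)
BASE = ["ELE", "LST", "NDVI", "ET", "NIR", "NPP"]  # DEM + MODIS + GLASS covariates
REANALYSIS_VARS = ["SP", "WD", "TP"]               # supplied by ERA5, CMFD, or GLDAS

def build_combinations():
    """Return {combination name: list of predictor columns}."""
    combos = {"GLASS": GEO_TIME + BASE}
    for source in ("ERA5", "CMFD", "GLDAS"):
        # Same base covariates plus the three reanalysis variables from one source
        combos[source] = GEO_TIME + BASE + [f"{v}_{source}" for v in REANALYSIS_VARS]
    return combos

COMBINATIONS = build_combinations()
for name, cols in COMBINATIONS.items():
    print(name, len(cols), cols[-3:])
```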

3.2. Modeling

3.2.1. RF Model

We used the RF model, an ensemble of decision trees whose individual predictions are aggregated (averaged, in the case of regression) to estimate the outcome. Each decision tree is trained on a random subset of the input data, and node splitting is determined using randomly selected attributes [56]. However, the number of decision trees also determines the memory and time required for model training [57]. In addition, the RF model provides the relative importance of the predictors for the response variable. Model performance depends on the choice of hyperparameters, such as the number of trees ‘Num_tree’ and the minimum number of samples required at a leaf node ‘min_samples_leaf’ [43]. A reasonable number of trees improves training efficiency while maintaining the predictive performance of the model, and tuning ‘min_samples_leaf’ helps keep the model robust. After several rounds of tuning, we settled on an optimal ‘min_samples_leaf’ of 5 and an optimal ‘Num_tree’ of 150.
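
A minimal sketch of this RF setup, assuming scikit-learn (the study does not name its software): ‘Num_tree’ maps to `n_estimators` and ‘min_samples_leaf’ to the parameter of the same name. The data are synthetic stand-ins; the 75%/25% split and 10-fold cross-validation follow Section 3.3, and applying cross-validation to the training portion is one plausible reading of that section. The importances are scikit-learn's impurity-based values, which may differ from the measure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data: 1000 samples x 6 covariates (real covariates are listed in Table 1)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))
y = 20 + 3 * X[:, 0] - 2 * X[:, 1] + np.sin(X[:, 2]) + rng.normal(0, 0.5, 1000)

# 75% training / 25% test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 'Num_tree' = 150 and 'min_samples_leaf' = 5, the values reported above
rf = RandomForestRegressor(n_estimators=150, min_samples_leaf=5, random_state=0)

# 10-fold cross-validation on the training portion (one R² value per fold)
cv = KFold(n_splits=10, shuffle=True, random_state=0)
cv_r2 = cross_val_score(rf, X_train, y_train, cv=cv, scoring="r2")

# Final fit on the full training set, then prediction on the held-out 25%
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Impurity-based relative importance in percent (sums to 100)
importance_pct = 100 * rf.feature_importances_
print(f"CV R2 = {cv_r2.mean():.3f}; importance (%) = {np.round(importance_pct, 1)}")
```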

3.2.2. ANN Model

We applied the ANN model, a non-parametric, non-linear model that mimics the information processing of the human brain. The ANN model consists of an input layer, an output layer, and several hidden layers, and its general form can be written as:

Y = f(X, W) + \varepsilon

(2)

where Y denotes the response (dependent) variable of the model, X denotes the predictor (independent) variables, W denotes the weights, and ε denotes the error term.

We used the Levenberg-Marquardt algorithm as the ANN training function. The key hyperparameter of the ANN model is the ‘hidden layer size’, which specifies the number and size of the hidden layers in the network and affects model accuracy. The final ANN model had three hidden layers, each with 10 neurons.
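
For illustration, scikit-learn's `MLPRegressor` can approximate the ANN configuration above. It does not provide Levenberg-Marquardt training, so this sketch substitutes the L-BFGS solver; the synthetic data, input standardization, and iteration limit are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (6 covariates); first 750 rows train, last 250 test
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
y = 20 + 3 * X[:, 0] - 2 * X[:, 1] + np.tanh(X[:, 2]) + rng.normal(0, 0.3, 1000)

# Three hidden layers of 10 neurons each, as in the final ANN above.
# scikit-learn has no Levenberg-Marquardt solver, so L-BFGS is used instead.
ann = make_pipeline(
    StandardScaler(),  # normalize predictors before training
    MLPRegressor(hidden_layer_sizes=(10, 10, 10), solver="lbfgs",
                 max_iter=2000, random_state=0),
)
ann.fit(X[:750], y[:750])
r2_test = ann.score(X[750:], y[750:])  # coefficient of determination on the test rows
print(f"test R2 = {r2_test:.3f}")
```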

3.2.3. LSTM Model

LSTM, a type of RNN that can learn long-term dependencies in time series data, was also used in this study. The LSTM model mitigates the vanishing gradient problem of the traditional RNN model and has been reported to achieve higher accuracy than the convolutional neural network (CNN) model. The LSTM model has a chain-like network structure composed of recurrent block units connected in both directions. Three gates control the cell state in an LSTM unit: the forget gate (f_t), the input gate (i_t), and the output gate (o_t). Each gate consists of a fully connected layer with a sigmoid activation whose output is applied through pointwise multiplication, which controls the storage and deletion of information. Each unit takes the sequence data x_t (t = 1, 2, 3, 4, …) as input; h_{t-1} is the hidden state carried over from the previous time step, h_t is the output hidden state, C_{t-1} is the previous cell state, and C_t is the updated cell state.

This study used the adaptive moment estimation (Adam) optimizer to update the model parameters. The number of iterations for each model was set to 400. After tuning and selection, the hyperparameters ‘minibatch size’, ‘numHiddenUnits’, and ‘InitialLearnRate’ of the LSTM model were adjusted to 512, 16, and 0.002, respectively.
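
To make the gate description concrete, the sketch below implements one forward step of a standard LSTM cell in NumPy, using the common formulation: sigmoid gates f_t, i_t, and o_t, a tanh candidate state, C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t, and h_t = o_t ⊙ tanh(C_t). It uses 16 hidden units to match ‘numHiddenUnits’; the weights are random and untrained, the sequence length and input size are hypothetical, and training with Adam is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    W has shape (4*H, H + D) and b has shape (4*H,), stacking the forget,
    input, candidate, and output blocks in that order.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])                # forget gate
    i_t = sigmoid(z[H:2 * H])            # input gate
    c_tilde = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o_t = sigmoid(z[3 * H:4 * H])        # output gate
    c_t = f_t * c_prev + i_t * c_tilde   # updated cell state C_t
    h_t = o_t * np.tanh(c_t)             # output hidden state h_t
    return h_t, c_t, (f_t, i_t, o_t)

# Hypothetical sequence of 7 daily covariate vectors (D = 10), H = 16 hidden units
D, H, T = 10, 16, 7
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(T, D)):
    h, c, gates = lstm_step(x_t, h, c, W, b)

# A linear read-out layer (untrained here) would map the final h to a Ta estimate
w_out = rng.normal(0, 0.1, size=H)
ta_hat = float(w_out @ h)
```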

3.2.4. MLR Model

MLR is a classical linear regression technique. In contrast to simple linear regression, MLR models the linear relationship between a dependent variable and several independent variables. In this study, the MLR model was developed with Ta as the dependent variable and the covariates as independent variables.
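
The MLR baseline amounts to an ordinary least-squares fit with an intercept. A minimal NumPy sketch on synthetic, noise-free data (so that the known coefficients are recovered up to floating-point error) follows; the helper names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares MLR; returns [intercept, b_1, ..., b_k]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply a fitted MLR: intercept plus a linear combination of covariates."""
    return coef[0] + X @ coef[1:]

# Synthetic, noise-free example with three covariates
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = 21.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2]
coef = fit_mlr(X, y)
ta_hat = predict_mlr(coef, X)
```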

3.3. Model Training and Validation

This study used the RF, ANN, and LSTM models to model the processed data, with MLR introduced as a control. All predictor variables were normalized. A total of 75% of the dataset was selected as the training set, and the remaining 25% was used as the test set. Model performance was tested by 10-fold cross-validation, and model accuracy was evaluated by comparing the predictions with the corresponding station observations. The root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²) were used as evaluation indicators, defined as follows:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(3)

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}|

(4)

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(5)

where y_i and ŷ_i are the i-th observed and predicted values, respectively, ȳ is the mean of the observed values, and n is the number of samples.
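
The three evaluation indicators can be computed directly from paired observations and predictions. In this sketch, R² is implemented as the coefficient of determination, 1 − SS_res/SS_tot, with ȳ the mean of the observations; the four-value example is illustrative only.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, Equation (3)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error, Equation (4)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

obs = np.array([25.0, 27.0, 29.0, 31.0])   # observed Ta (°C), illustrative
pred = np.array([24.0, 27.5, 29.5, 30.0])  # predicted Ta (°C), illustrative
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred))
```

For this example, RMSE ≈ 0.791 °C, MAE = 0.75 °C, and R² = 0.875.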

4. Results

4.1. Model Performance Evaluation

This study applied three ML models (RF, ANN, and LSTM) and a conventional MLR model to estimate Ta in Guangdong Province from 2012 to 2018. Applying ML and DL methods increased the accuracy of long-time-series Ta estimation. The Taylor diagram (Figure 3) shows that the RF model performed best, with a maximum R² of 0.951 and a normalized RMSE below 0.222, likely because the random sampling of data and features during RF training makes it more adaptable and robust than models with a fixed structure [42]. The ANN model was the second-best performer, with a maximum improvement of 0.047 in R² and maximum decreases of 0.349 and 0.405 in MAE and RMSE, respectively, compared with the traditional MLR model. The LSTM model performed worst of the three ML models, with estimation accuracy close to that of the MLR model.

Integrating more relevant variables can effectively improve the estimation of complex temperature patterns. Table 2 shows that the CMFD, GLDAS, and ERA5 combinations, which include more variables, achieved higher accuracy than the GLASS combination. Among them, the combination fusing the ERA5-Land reanalysis dataset performed best; since it has the same spatial resolution as the CMFD combination, its advantage can be attributed to the higher temporal resolution (1 h) of ERA5 [58].

4.2. Relative Importance of Covariates

The RF model was used to compare the relative importance of individual covariates in each combination (Figure 4). NDVI was the most important covariate in all combinations except ERA5, accounting for 25%, 17%, and 19% of the overall importance in the GLASS, CMFD, and GLDAS combinations, respectively. In contrast, LST, another MODIS-derived covariate, was generally less important than NDVI. Among the covariates derived from GLASS products, relative importance followed the order NIR > ET > NPP. When the same variables were compared across reanalysis combinations, TP showed the highest relative importance. In the ERA5 combination, the relative importance of TP exceeded 22%, well above that in the CMFD (15%) and GLDAS (13%) combinations.

4.3. Effects of Rice Planting on Modeling and Temperature Variation

We evaluated the estimation accuracy in RA and NA using the RF model, which had the best overall performance. Figure 5 shows that the scatter points for each dataset combination were evenly distributed along the 1:1 line, confirming the model’s validity. The estimation accuracy of the dataset combinations followed the order ERA5 > GLDAS > CMFD > GLASS, which agrees with the overall performance of the dataset combinations discussed in Section 4.1. In this study, model performance was higher in RA than in NA, which we attribute to the uniform surface vegetation cover in RA. The highest R² reached 0.956, with the lowest MAE of 0.996 °C and the lowest RMSE of 1.365 °C.

We evaluated the annual RMSE of each combination using the RF model and examined the interannual variability of the influence of rice planting on model performance. For all violin plots (Figure 6), RMSE values were concentrated between the 75th percentile and the upper bound. Considering the pattern of temperature variability, we infer that this reflects model errors induced by extreme temperatures each year. In RA, the choice of combination substantially affected the interannual variability of model performance. The error distributions in the RA violin plots had more compact upper and lower bounds than those in AA, especially for the ERA5 combination. Conversely, the violin plots for NA were more consistent across datasets. The ERA5 combination, which had the best model performance, had the lowest medians, ranging from 1.304 °C to 1.338 °C. Notably, despite its poorer model fit, the CMFD combination had a narrower error distribution in RA and better accuracy than GLDAS. This may be because more station data were used to generate the CMFD dataset.
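
This interannual (and monthly) error analysis reduces to computing RMSE within groups. A pandas sketch on synthetic records follows; the column names, the two hypothetical stations, and the choice to compute one RMSE per station-year (whose spread a violin plot would then summarize) are assumptions, since the study does not state how the per-year RMSE samples were formed.

```python
import numpy as np
import pandas as pd

def grouped_rmse(df, by):
    """RMSE between observed and predicted Ta within each group defined by `by`."""
    sq = df.assign(sq_err=(df["ta_obs"] - df["ta_pred"]) ** 2)
    return sq.groupby(by)["sq_err"].mean().pow(0.5).rename("rmse")

# Hypothetical daily test-set records for one RA and one NA station, 2012-2018
rng = np.random.default_rng(7)
dates = pd.date_range("2012-01-01", "2018-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
frames = []
for station, region in [("S_A", "RA"), ("S_B", "NA")]:
    obs = 22 + 6 * np.sin(2 * np.pi * doy / 365.25) + rng.normal(0, 2, len(dates))
    pred = obs + rng.normal(0, 1.3, len(dates))  # synthetic model error
    frames.append(pd.DataFrame({"station": station, "region": region, "date": dates,
                                "ta_obs": obs, "ta_pred": pred}))
df = pd.concat(frames, ignore_index=True)
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

yearly = grouped_rmse(df, ["region", "station", "year"])  # one RMSE per station-year
median_by_region = yearly.groupby(level="region").median()
monthly = grouped_rmse(df, ["region", "month"])           # seasonal error pattern
```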

We examined the seasonal effects of rice planting on Ta estimation by analyzing monthly variability with the ERA5 combination and the RF model (Figure 7). According to the rice cropping calendar in Guangdong Province [59], early rice is submerged and transplanted from early March to late April, grows from mid-April to mid-July, and is harvested in July; late rice is inundated and transplanted from July to mid-August, grows from August to October, and is harvested from mid-October to late November. The results showed a significant improvement in the model’s estimation accuracy during the rice double-cropping periods. However, estimation accuracy followed a V-shaped pattern over the seasons owing to the long high-temperature period at low latitudes, with the lowest accuracy in July, when R² was 0.172 lower than the highest monthly value. Meanwhile, the model accuracy in RA was lower than that in NA in July and August. Considering the rice planting cycle, besides the higher mean temperatures in July and August, this could be attributed to the rotation of double-season rice, which altered the original surface albedo and increased model error.

We assessed the effect of long-term rice planting on regional Ta by comparing the monthly Ta differences between RA and NA based on observations from meteorological stations in Guangdong Province (Figure 8). The box plots show that monthly Ta in RA tended to be lower than in NA, with the median line consistently below 0 °C in all plots. However, RA was at risk of extremely high temperatures in summer, with a maximum temperature difference exceeding 0.5 °C (in July). During winter, the cooling effect of RA became evident, with the minimum temperature differences in November, December, and January exceeding 1.0 °C in magnitude.

In addition, we selected one meteorological station each from RA and NA with similar latitude and altitude to compare temperature differences (Figure 9). Station S59075 (24.295°N, 112.5697°E, altitude 155.2 m) is located in RA, and station S59117 (24.7844°N, 113.2867°E, altitude 116 m) is located in NA. The scatter plot shows a distribution of temperature differences similar to that in Figure 8. In summer, S59075 in RA showed slightly higher temperatures, with the highest temperatures occurring in August, whereas S59117 in NA had higher temperatures in winter.

To further explore the factors influencing regional temperature in rice planting areas, Figure 10 shows the relative importance of the predictor variables for the ERA5 combination in RA and NA. TP was the dominant covariate for Ta estimation in both regions, with a greater influence in RA (maximum relative importance of 21.78%). WD ranked second in NA and also had a strong influence at rice-planting sites (relative importance of 11.42%). NDVI and LST, which are commonly used for temperature estimation, also had considerable relative importance (greater than 10%) in this study and were more influential in NA than in RA.

5. Discussion

The RF model, a classic model for temperature estimation in remote sensing, produced the best results in this study, with the highest R² exceeding 0.95. For sea surface temperature (SST) estimation, the inherent spatial non-linearity and time dependence of marine data enabled LSTM to estimate this property accurately [40]. However, in this study, the LSTM model performed worse than the ANN model and improved R² by only 0.01 compared with the MLR model, which may be because Ta is governed by more complex influencing parameters than LST [60]. The LSTM model, which is better suited to time series estimation, cannot fully capture the spatial properties of some of these parameters. Moreover, Mildrexler et al. indicated that the strength of the relationship between temperature and LST depends on land cover, which may also explain why LSTM, though accurate at the sea surface, performed worse for Ta estimation in the study area [61]. Therefore, although LST and Ta are strongly and spatially significantly correlated, more attention should be paid to model applicability when estimating these parameters. Despite differences in the initial accuracy of the data, fusing more highly correlated factors from multi-source datasets after unifying their resolution significantly improved model estimations. In this study, the best results were obtained for the ERA5 combination, most likely because of the superior spatial and temporal resolution of the ERA5 product, given the initial accuracy of the data products. Therefore, incorporating higher-resolution data products is important for the accuracy of temperature estimation in subsequent studies.
However, although the fused GLDAS reanalysis product had lower spatial and temporal resolutions than ERA5, the GLDAS combination achieved accuracy closer to that of the ERA5 combination in the LSTM and MLR models than in the other models, and its MAE and goodness of fit were even better than those of the theoretically optimal ERA5 combination. This is probably because LSTM and MLR were more sensitive to the linear relationships in the time series data provided by the GLDAS product.

In previous studies, LST and NDVI were two important parameters for Ta estimation [47,62]. In this study, NDVI significantly influenced long-term Ta modeling, with a maximum relative importance of 25% in the GLASS combination. ET and NIR can partly account for local temperature changes caused by altered biophysical processes during long-term land cover conversion [63], and in this study they were among the more important covariates for Ta estimation based on GLASS data. Temperature determines the distribution and variability of vegetation NPP in humid areas [64,65]. However, NPP showed low relative importance for temperature estimation in this study, exceeding 10% (11%) only in the GLASS combination. Given the importance of ELE in estimating Ta [66], ELE data were added to the model and showed considerable importance (15%) in the GLASS combination. Atmospheric circulation is a necessary component of accurate Ta estimation that cannot be ignored [67]. Nastos et al. studied the relationship between mean annual Ta and atmospheric circulation over a wide area of Greece and found an atmospheric circulation index that is highly correlated with Ta [68]. Girjatowicz et al. investigated the response of water temperature to atmospheric circulation along the southern coast of the Baltic Sea and found the correlation to be highest in winter [69]. Guangdong Province is located in the coastal region of Southeast Asia, where atmospheric circulation, such as ocean monsoon circulation and typhoons, strongly influences temperature monitoring in the province and along the surrounding coasts [70]. WD, TP, and SP are important factors in describing the influence of atmospheric circulation on Ta [71,72,73,74]. WD strongly influences land-air interaction and the transfer of energy, water, and momentum [72].
Many previous studies have explained the strong spatial and temporal correlation between TP and temperature [14,75]. SP is also an important climate variable in studies of global atmospheric circulation [74]. Among the three covariates provided by the reanalysis products, SP, although related to temperature, had the least impact on long-term Ta estimation. TP and WD were more important (>10% relative importance). This is probably because the study area is close to the ocean, where the large amount of water vapor transported by the oceanic monsoon circulation affects Ta estimation. Meanwhile, frequent storms in summer and autumn strongly influence daily temperature variations in Guangdong Province [76]. Moreover, the ERA5 combination had the highest model accuracy, and its covariates accounted for the largest share of relative importance (40%). Notably, TP alone contributed more than 22% of the relative importance in the ERA5 combination. Previous studies have shown that precipitation patterns are correlated with seasonal and spatial variations in atmospheric circulation. They have also shown that the precipitation variables provided by ERA5 fit the observed data better and correlate more strongly with them [77,78]. This may be one reason why the ERA5 combination performed best. Therefore, future covariate selection for long-term Ta estimation should give more weight to the precipitation variable provided by ERA5-Land.
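This section does not spell out how relative importance is computed. One common convention, assumed here and not confirmed by the text, proceeds in two steps. First, take raw per-covariate importance scores from the fitted model (for example, RF impurity or permutation importance) and rescale them to percentages. Second, sum the percentages by data source to obtain shares such as the 40% quoted for ERA5. The sketch below shows only that bookkeeping. The raw scores are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict


def relative_importance(raw: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative raw importance scores so they sum to 100 (%)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: 100.0 * score / total for name, score in raw.items()}


def share_by_source(rel: dict[str, float], source_of: dict[str, str]) -> dict[str, float]:
    """Sum the relative importance (%) of covariates that share a data source."""
    shares: defaultdict[str, float] = defaultdict(float)
    for name, pct in rel.items():
        shares[source_of[name]] += pct
    return dict(shares)


# Hypothetical raw scores for illustration only; NOT values from the study.
raw_scores = {"LST": 4.0, "NDVI": 2.0, "ELE": 1.0, "TP": 1.5, "WD": 1.0, "SP": 0.5}
source_of = {"LST": "MODIS", "NDVI": "MODIS", "ELE": "SRTM",
             "TP": "ERA5", "WD": "ERA5", "SP": "ERA5"}

rel = relative_importance(raw_scores)
print(rel)
print(share_by_source(rel, source_of))
```

Permutation importance is often preferred over impurity-based RF importance when covariates differ in scale or cardinality. The normalization step is the same for either.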

Ta modeling accuracy strongly depends on land cover. A uniform ground cover facilitates a consistent exchange of energy between the ground surface and the overlying atmosphere, thus enhancing model estimation accuracy [79]. In this study, model accuracy was generally higher in RA than in NA. In July and August, however, it was lower in RA than in NA. We attribute this to the rotation of early and late rice, which disrupted the otherwise uniform ground cover and increased model error. In contrast, during the winter fallow period RA had higher reflectance and lower variability, and model accuracy there was higher than in NA. This agrees with Cai et al., who found that models developed for the non-growing season outperformed those developed for the growing season [79]. The frequent high and low temperature fluctuations during the rice growing season at low latitudes may also have reduced model accuracy. Previous studies have reported a cooling effect of rice fields and irrigation on regional Ta at middle and high latitudes in China [52,53]. Our results showed that Ta was also generally lower in RA than in NA at low latitudes in Guangdong Province. Ta also varied less during the rice growing periods, suggesting that rice planting at low latitudes has a thermoregulatory effect. However, in the summer months, when most rice was grown, Ta in RA could exceed that in NA (especially in July). One possible reason is that the stronger warming in summer and autumn at low latitudes intensifies greenhouse gas emissions from RA [7,10]. Future studies should therefore treat the cooling effect of rice fields at low latitudes with caution, given the possibility of high summer Ta. A comparative correlation analysis found that ELE contributed little to long-term Ta estimation overall. Its relative importance, however, was much higher in RA than in NA. This could be attributed to the complex geography of NA, such as urban agglomerations that create an urban heat island effect. RA, by contrast, was more spatially homogeneous and thus more sensitive to elevation changes.
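Comparisons such as the RA versus NA contrast in July and August rest on error statistics computed separately for each area class and month. The sketch below is a minimal version. It assumes each validation record carries an area label, a calendar month, and the observed and predicted Ta. The record layout and the sample values are hypothetical and were chosen only to mimic the pattern described above.

```python
import math
from collections import defaultdict


def monthly_rmse(records):
    """Compute RMSE per (area, month).

    records: iterable of (area, month, observed_ta, predicted_ta) tuples.
    Returns {(area, month): RMSE}, in the same units as Ta.
    """
    sq_err = defaultdict(float)  # running sum of squared errors per key
    count = defaultdict(int)     # number of records per key
    for area, month, obs, pred in records:
        key = (area, month)
        sq_err[key] += (pred - obs) ** 2
        count[key] += 1
    return {key: math.sqrt(sq_err[key] / count[key]) for key in sq_err}


# Hypothetical station-day records in °C; not data from the study.
records = [
    ("RA", 1, 14.0, 14.5), ("RA", 1, 15.0, 14.5),
    ("NA", 1, 15.0, 16.0), ("NA", 1, 16.0, 17.0),
    ("RA", 7, 29.0, 31.0), ("RA", 7, 30.0, 28.0),
    ("NA", 7, 28.5, 29.5), ("NA", 7, 29.0, 28.0),
]
for key, value in sorted(monthly_rmse(records).items()):
    print(key, round(value, 2))
```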

6. Conclusions

This study applied several ML models to four data combinations that drew the same variables from different source datasets. The aims were to assess the performance of Ta estimation models and to compare the relative importance of each covariate. The main findings were as follows. (1) Integrating multivariate remote sensing data with data products and reanalysis datasets improved model estimation accuracy. The best performance was achieved by the RF model with the ERA5 combination, with the highest R² reaching 0.956, the lowest MAE of 0.996 °C, and the lowest RMSE of 1.365 °C. (2) TP, WD, NDVI, and LST were key covariates for long-term Ta modeling. TP in the ERA5 combination had the greatest influence of all covariates and an even stronger impact in RA, and it deserves more attention in future studies. ELE was an important covariate for model estimation in RA. (3) Rice planting generally improved model performance for Ta estimation, but model accuracy declined during the summer crop rotation. This study also investigated the effects of low-latitude rice planting on Ta and on its model estimation. Long-term rice planting at low latitudes had a cooling effect on regional temperatures and reduced the variability in temperature differences during the rice planting season. However, the potential risk of high summer temperatures in low-latitude rice fields should also be considered. This study provides a reference for selecting models and dataset variables for Ta estimation. It also serves as a case study for subsequent ML research on agricultural remote sensing of temperature and on the impacts of crop cultivation on global warming.

Author Contributions

C.X. and M.L.: conceptualization, methodology, validation; M.L.: Data curation, writing—original draft preparation, formal analysis, investigation; Q.F.: software, data curation, writing—reviewing and editing; J.X.: supervision, visualization, funding acquisition, writing—reviewing and editing; C.X.: project administration, funding acquisition, resources, writing—reviewing and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (42171400), the Natural Resources of Guangdong (No. [2023]-25), the Natural Science Foundation of Guangdong Province of China (2021A1515011324), and the Henan Institute of Sun Yat-sen University (No. 2021-006).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AA	All areas of Guangdong Province
Adam	Adaptive moment estimation
ANN	Artificial neural network
CMDC	China Meteorological Data Service Center
CMFD	China meteorological forcing dataset
CNN	Convolutional neural network
DEM	Digital Elevation Model
DL	Deep learning
EC-LUE	Eddy covariance-light use efficiency
ECMWF	European Centre for Medium-Range Weather Forecasts
ELE	Elevation
ERA5	ECMWF Reanalysis v5
ET	Evapotranspiration
GLASS	Global land surface satellite
GLDAS	Global land data assimilation system
HEG	HDF-EOS to GeoTIFF conversion tool
LST	Land surface temperature
LSTM	Long short-term memory
MAE	Mean Absolute Error
ML	Machine learning
MLR	Multiple linear regression
MODIS	Moderate Resolution Imaging Spectroradiometer
NA	Non-rice planting areas
NDVI	Normalized difference vegetation index
NIR	Black-sky albedo in the near-infrared band
NPP	Net primary production
R²	Coefficient of determination
RA	Rice planting areas
RF	Random forest
RMSE	Root Mean Square Error
RNN	Recurrent neural network
SP	Surface pressure
SRTM	Shuttle Radar Topography Mission
SST	Sea surface temperature
Ta	Air temperature
TP	Total precipitation
WD	Wind speed

References

1. Hoegh-Guldberg, O.; Jacob, D.; Taylor, M.; Guillén Bolaños, T.; Bindi, M.; Brown, S.; Camilloni, I.A.; Diedhiou, A.; Djalante, R.; Ebi, K.; et al. The Human Imperative of Stabilizing Global Climate Change at 1.5 °C. Science 2019, 365, eaaw6974.
2. Zhu, X.; Duan, S.; Li, Z.; Wu, P.; Wu, H.; Zhao, W.; Qian, Y. Reconstruction of land surface temperature under cloudy conditions from Landsat 8 data using annual temperature cycle model. Remote Sens. Environ. 2022, 281, 113261.
3. Bazrafshan, J. Effect of Air Temperature on Historical Trend of Long-Term Droughts in Different Climates of Iran. Water Resour. Manag. 2017, 31, 4683–4698.
4. Duan, Z.; Tuo, Y.; Liu, J.; Gao, H.; Song, X.; Zhang, Z.; Yang, L.; Mekonnen, D.F. Hydrological Evaluation of Open-Access Precipitation and Air Temperature Datasets Using SWAT in a Poorly Gauged Basin in Ethiopia. J. Hydrol. 2019, 569, 612–626.
5. Sun, T.; Sun, R.; Chen, L. The Trend Inconsistency between Land Surface Temperature and Near Surface Air Temperature in Assessing Urban Heat Island Effects. Remote Sens. 2020, 12, 1271.
6. Asseng, S.; Ewert, F.; Martre, P.; Rötter, R.P.; Lobell, D.B.; Cammarano, D.; Kimball, B.A.; Ottman, M.J.; Wall, G.W.; White, J.W.; et al. Rising Temperatures Reduce Global Wheat Production. Nat. Clim. Chang. 2015, 5, 143–147.
7. Wang, H.; Yang, T.; Chen, J.; Bell, S.M.; Wu, S.; Jiang, Y.; Sun, Y.; Zeng, Y.; Zeng, Y.; Pan, X.; et al. Effects of Free-Air Temperature Increase on Grain Yield and Greenhouse Gas Emissions in a Double Rice Cropping System. Field Crops Res. 2022, 281, 108489.
8. Yang, Z.; Zhang, Z.; Zhang, T.; Fahad, S.; Cui, K.; Nie, L.; Peng, S.; Huang, J. The Effect of Season-Long Temperature Increases on Rice Cultivars Grown in the Central and Southern Regions of China. Front. Plant Sci. 2017, 8, 1908.
9. Zhao, C.; Liu, B.; Piao, S.; Wang, X.; Lobell, D.B.; Huang, Y.; Huang, M.; Yao, Y.; Bassu, S.; Ciais, P.; et al. Temperature Increase Reduces Global Yields of Major Crops in Four Independent Estimates. Proc. Natl. Acad. Sci. USA 2017, 114, 9326–9331.
10. Wang, L.; Hu, F.; Hu, J.; Chen, C.; Liu, X.; Zhang, D.; Chen, T.; Miao, Y.; Zhang, L. Multistage Spatiotemporal Variability of Temperature Extremes over South China from 1961 to 2018. Theor. Appl. Climatol. 2021, 146, 243–256.
11. Xu, Y.; Qin, Z.; Shen, Y. Study on the Estimation of Near-Surface Air Temperature from MODIS Data by Statistical Methods. Int. J. Remote Sens. 2012, 33, 7629–7643.
12. Bostan, P.A.; Heuvelink, G.B.M.; Akyürek, S.Z. Comparison of regression and kriging techniques for mapping the average annual precipitation of Turkey. Int. J. Appl. Earth Obs. Geoinf. 2012, 115–126.
13. McVicar, T.R.; Jupp, D.L. Using covariates to spatially interpolate moisture availability in the Murray–Darling Basin: A novel use of remotely sensed data. Remote Sens. Environ. 2002, 79, 199–212.
14. Price, D.T.; McKenney, D.W.; Nalder, I.A.; Hutchinson, M.F.; Kesteven, J. A comparison of two statistical methods for spatial interpolation of Canadian monthly mean climate data. Agric. For. Meteorol. 2000, 101, 81–94.
15. Chen, Y.; Liang, S.; Ma, H.; Li, B.; He, T.; Wang, Q. An All-Sky 1 km Daily Land Surface Air Temperature Product over Mainland China for 2003–2019 from MODIS and Ancillary Data. Earth Syst. Sci. Data 2021, 13, 4241–4261.
16. Noi, P.T.; Degener, J.; Kappas, M. Comparison of Multiple Linear Regression, Cubist Regression, and Random Forest Algorithms to Estimate Daily Air Surface Temperature from Dynamic Combinations of MODIS LST Data. Remote Sens. 2017, 9, 398.
17. Qin, J.; He, M.; Jiang, H.; Lu, N. Reconstruction of 60-Year (1961–2020) Surface Air Temperature on the Tibetan Plateau by Fusing MODIS and ERA5 Temperatures. Sci. Total Environ. 2022, 853, 158406.
18. Xia, Y.; Hao, Z.; Shi, C.; Li, Y.; Meng, J.; Xu, T.; Wu, X.; Zhang, B. Regional and Global Land Data Assimilation Systems: Innovations, Challenges, and Prospects. J. Meteorol. Res. 2019, 33, 159–189.
19. Zeng, L.; Hu, Y.; Wang, R.; Zhang, X.; Peng, G.; Huang, Z.; Zhou, G.; Xiang, D.; Meng, R.; Wu, W.; et al. 8-Day and Daily Maximum and Minimum Air Temperature Estimation via Machine Learning Method on a Climate Zone to Global Scale. Remote Sens. 2021, 13, 2355.
20. Zheng, M.; Zhang, J.; Wang, J.; Yang, S.; Han, J.; Hassan, T. Reconstruction of 0.05° All-Sky Daily Maximum Air Temperature across Eurasia for 2003–2018 with Multi-Source Satellite Data and Machine Learning Models. Atmos. Res. 2022, 279, 106398.
21. Beck, H.E.; Pan, M.; Miralles, D.G.; Reichle, R.H.; Dorigo, W.A.; Hahn, S.; Sheffield, J.; Karthikeyan, L.; Balsamo, G.; Parinussa, R.M.; et al. Evaluation of 18 Satellite- and Model-Based Soil Moisture Products Using in Situ Measurements from 826 Sensors. Hydrol. Earth Syst. Sci. 2021, 25, 17–40.
22. Guo, B.; Xu, T.; Yang, Q.; Zhang, J.; Dai, Z.; Deng, Y.; Zou, J. Multiple Spatial and Temporal Scales Evaluation of Eight Satellite Precipitation Products in a Mountainous Catchment of South China. Remote Sens. 2023, 15, 1373.
23. Liu, H.; Xin, X.; Su, Z.; Zeng, Y.; Lian, T.; Li, L.; Yu, S.; Zhang, H. Intercomparison and Evaluation of Ten Global ET Products at Site and Basin Scales. J. Hydrol. 2023, 617, 128887.
24. Huang, X.; Han, S.; Shi, C. Evaluation of Three Air Temperature Reanalysis Datasets in the Alpine Region of the Qinghai–Tibet Plateau. Remote Sens. 2022, 14, 4447.
25. Wang, M.; He, C.; Zhang, Z.; Hu, T.; Duan, S.-B.; Mallick, K.; Li, H.; Liu, X. Evaluation of Three Land Surface Temperature Products from Landsat Series Using in Situ Measurements. IEEE Trans. Geosci. Remote Sens. 2023, 61, 22508960.
26. Huang, X.; Han, S.; Shi, C. Multiscale Assessments of Three Reanalysis Temperature Data Systems over China. Agriculture 2021, 11, 1292.
27. Zhao, T.; Fu, C. Applicability Evaluation of Surface Air Temperature from Several Reanalysis Datasets in China. Plateau Meteorol. 2009, 28, 594–606.
28. Wang, C.; Bi, X.; Luan, Q.; Li, Z. Estimation of Daily and Instantaneous Near-Surface Air Temperature from MODIS Data Using Machine Learning Methods in the Jingjinji Area of China. Remote Sens. 2022, 14, 1916.
29. Meyer, H.; Katurji, M.; Appelhans, T.; Müller, M.U.; Nauss, T.; Roudier, P.; Zawar-Reza, P. Mapping Daily Air Temperature for Antarctica Based on MODIS LST. Remote Sens. 2016, 8, 732.
30. Otgonbayar, M.; Atzberger, C.; Mattiuzzi, M.; Erdenedalai, A. Estimation of Climatologies of Average Monthly Air Temperature over Mongolia Using MODIS Land Surface Temperature (LST) Time Series and Machine Learning Techniques. Remote Sens. 2019, 11, 2588.
31. Vulova, S.; Meier, F.; Fenner, D.; Nouri, H.; Kleinschmit, B. Summer Nights in Berlin, Germany: Modeling Air Temperature Spatially with Remote Sensing, Crowdsourced Weather Data, and Machine Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5074–5087.
32. Xu, Y.; Knudby, A.; Ho, H.C. Estimating Daily Maximum Air Temperature from MODIS in British Columbia, Canada. Int. J. Remote Sens. 2014, 35, 8108–8121.
33. Yoo, C.; Im, J.; Park, S.; Quackenbush, L.J. Estimation of Daily Maximum and Minimum Air Temperatures in Urban Landscapes Using MODIS Time Series Satellite Data. ISPRS J. Photogramm. Remote Sens. 2018, 137, 149–162.
34. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
35. Liu, Y.; Ortega-Farías, S.; Tian, F.; Wang, S.; Li, S. Estimation of Surface and Near-Surface Air Temperatures in Arid Northwest China Using Landsat Satellite Images. Front. Environ. Sci. 2021, 9.
36. Jia, H.; Yang, D.; Deng, W.; Wei, Q.; Jiang, W. Predicting Land Surface Temperature with Geographically Weighed Regression and Deep Learning. WIREs Data Min. Knowl. Discov. 2021, 11, e1396.
37. Kartal, S.; Sekertekin, A. Prediction of MODIS Land Surface Temperature Using New Hybrid Models Based on Spatial Interpolation Techniques and Deep Learning Models. Environ. Sci. Pollut. Res. 2022, 29, 67115–67134.
38. Khalil, U.; Azam, U.; Aslam, B.; Ullah, I.; Tariq, A.; Li, Q.; Lu, L. Developing a Spatiotemporal Model to Forecast Land Surface Temperature: A Way Forward for Better Town Planning. Sustainability 2022, 14, 11873.
39. Mukonza, S.S.; Chiang, J.-L. Micro-Climate Computed Machine and Deep Learning Models for Prediction of Surface Water Temperature Using Satellite Data in Mundan Water Reservoir. Water 2022, 14, 2935.
40. Su, H.; Zhang, T.; Lin, M.; Lu, W.; Yan, X.-H. Predicting Subsurface Thermohaline Structure from Remote Sensing Data Based on Long Short-Term Memory Neural Networks. Remote Sens. Environ. 2021, 260, 112465.
41. Chung, J.; Lee, Y.; Jang, W.; Lee, S.; Kim, S. Correlation Analysis between Air Temperature and MODIS Land Surface Temperature and Prediction of Air Temperature Using TensorFlow Long Short-Term Memory for the Period of Occurrence of Cold and Heat Waves. Remote Sens. 2020, 12, 3231.
42. Yang, D.; Zhong, S.; Mei, X.; Ye, X.; Niu, F.; Zhong, W. A Comparative Study of Several Popular Models for Near-Land Surface Air Temperature Estimation. Remote Sens. 2023, 15, 1136.
43. Che, J.; Ding, M.; Zhang, Q.; Wang, Y.; Sun, W.; Wang, Y.; Wang, L.; Huai, B. Reconstruction of Near-Surface Air Temperature over the Greenland Ice Sheet Based on MODIS Data and Machine Learning Approaches. Remote Sens. 2022, 14, 5775.
44. Shwetha, H.R.; Kumar, D.N. Prediction of High Spatio-Temporal Resolution Land Surface Temperature under Cloudy Conditions Using Microwave Vegetation Index and ANN. ISPRS J. Photogramm. Remote Sens. 2016, 117, 40–55.
45. Wang, J.; Deng, Z. Development of MODIS Data-Based Algorithm for Retrieving Sea Surface Temperature in Coastal Waters. Environ. Monit. Assess. 2017, 189, 286.
46. Wu, W.; Tang, X.-P.; Guo, N.-J.; Yang, C.; Liu, H.-B.; Shang, Y.-F. Spatiotemporal Modeling of Monthly Soil Temperature Using Artificial Neural Networks. Theor. Appl. Climatol. 2013, 113, 481–494.
47. Runke, W.; Xiaoni, Y.; Yaya, S.; Chengyong, W.; Baokang, L. Study on Air Temperature Estimation and Its Influencing Factors in a Complex Mountainous Area. PLoS ONE 2022, 17, e0272946.
48. Şahin, M. Modelling of Air Temperature Using Remote Sensing and Artificial Neural Network in Turkey. Adv. Space Res. 2012, 50, 973–985.
49. Chiueh, Y.-W.; Tan, C.-H.; Hsu, H.-Y. The Value of a Decrease in Temperature by One Degree Celsius of the Regional Microclimate—The Cooling Effect of the Paddy Field. Atmosphere 2021, 12, 353.
50. Gogoi, P.P.; Vinoj, V.; Swain, D.; Dash, J.; Tripathy, S. Land use and land cover change effect on surface temperature over Eastern India. Sci. Rep. 2019, 9, 8859.
51. Chen, X.; Gu, X.; Liu, P.; Wang, D.; Mumtaz, F.; Shi, S.; Liu, Q.; Zhan, Y. Impacts of Inter-Annual Cropland Changes on Land Surface Temperature Based on Multi-Temporal Thermal Infrared Images. Infrared Phys. Technol. 2022, 122, 104081.
52. Xin, B.; Yu, L.; Li, G.; Jiao, Y.; Liu, T.; Zhang, S.; Lei, Z. Impact of Saline-Alkali Land Greening on the Local Surface Temperature—A Multiscale Assessment Based on Remote Sensing. Remote Sens. 2022, 14, 4246.
53. Liu, T.; Yu, L.; Zhang, S. Land Surface Temperature Response to Irrigated Paddy Field Expansion: A Case Study of Semi-arid Western Jilin Province, China. Sci. Rep. 2019, 9, 5278.
54. Zhou, D.; Li, D.; Sun, G.; Zhang, L.; Liu, Y.; Hao, L. Contrasting effects of urbanization and agriculture on surface temperature in eastern China. J. Geophys. Res. Atmos. 2016, 121, 9597–9606.
55. Shen, H.; Jiang, Y.; Li, T.; Cheng, Q.; Zeng, C.; Zhang, L. Deep Learning-Based Air Temperature Mapping by Fusing Remote Sensing, Station, Simulation and Socioeconomic Data. Remote Sens. Environ. 2020, 240, 111692.
56. Livingston, F.J. Implementation of Breiman’s Random Forest Machine Learning Algorithm. ECE591Q Mach. Learn. J. Pap. 2005, 1–13.
57. Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep Learning in Environmental Remote Sensing: Achievements and Challenges. Remote Sens. Environ. 2020, 241, 111716.
58. Du, B.; Mao, K.; Bateni, S.M.; Meng, F.; Wang, X.-M.; Guo, Z.; Jun, C.; Du, G. A Novel Fully Coupled Physical–Statistical–Deep Learning Method for Retrieving Near-Surface Air Temperature from Multisource Data. Remote Sens. 2022, 14, 5812.
59. Pan, B.; Zheng, Y.; Shen, R.; Ye, T.; Zhao, W.; Dong, J.; Ma, H.; Yuan, W. High Resolution Distribution Dataset of Double-Season Paddy Rice in China. Remote Sens. 2021, 13, 4609.
60. Amani-Beni, M.; Chen, Y.; Vasileva, M.; Zhang, B.; Xie, G. Quantitative-Spatial Relationships between Air and Surface Temperature, a Proxy for Microclimate Studies in Fine-Scale Intra-Urban Areas? Sustain. Cities Soc. 2022, 77, 103584.
61. Mildrexler, D.J.; Yang, Z.; Cohen, W.B.; Bell, D.M. A forest vulnerability index based on drought and high temperatures. Remote Sens. Environ. 2016, 173, 314–325.
62. Bird, D.N.; Banzhaf, E.; Knopp, J.; Wu, W.; Jones, L. Combining Spatial and Temporal Data to Create a Fine-Resolution Daily Urban Air Temperature Product from Remote Sensing Land Surface Temperature (LST) Data. Atmosphere 2022, 13, 1152.
63. Zhang, Y.; Liang, S. Impacts of Land Cover Transitions on Surface Temperature in China Based on Satellite Observations. Environ. Res. Lett. 2018, 13, 024010.
64. Cui, J.; Wang, Y.; Zhou, T.; Jiang, L.; Qi, Q. Temperature Mediates the Dynamic of MODIS NPP in Alpine Grassland on the Tibetan Plateau, 2001–2019. Remote Sens. 2022, 14, 2401.
65. Sun, J.; Du, W. Effects of Precipitation and Temperature on Net Primary Productivity and Precipitation Use Efficiency across China’s Grasslands. GIScience Remote Sens. 2017, 54, 881–897.
66. Zhang, H.; Immerzeel, W.W.; Zhang, F.; de Kok, R.J.; Gorrie, S.J.; Ye, M. Creating 1-Km Long-Term (1980–2014) Daily Average Air Temperatures over the Tibetan Plateau by Integrating Eight Types of Reanalysis and Land Data Assimilation Products Downscaled with MODIS-Estimated Temperature Lapse Rates Based on Machine Learning. Int. J. Appl. Earth Obs. Geoinf. 2021, 97, 102295.
67. Loginov, S.V.; Ippolitov, I.I.; Kharyutkina, E. The relationship of surface air temperature, heat balance at the surface, and radiative balance at the top of atmosphere over the Asian territory of Russia using reanalysis and remote-sensing data. Int. J. Remote Sens. 2014, 35, 5878–5898.
68. Girjatowicz, J.P.; Świątek, M. Effects of atmospheric circulation on water temperature along the southern Baltic Sea coast. Oceanologia 2019, 61, 38–49.
69. Nastos, P.T.; Philandras, C.M.; Founda, D.; Zerefos, C.S. Air temperature trends related to changes in atmospheric circulation in the wider area of Greece. Int. J. Remote Sens. 2011, 32, 737–750.
70. Ma, C.; Zhao, J.; Ai, B.; Sun, S. Two-Decade Variability of Sea Surface Temperature and Chlorophyll-a in the Northern South China Sea as Revealed by Reconstructed Cloud-Free Satellite Data. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9033–9046.
71. Wang, A.; Zeng, X. Evaluation of multireanalysis products with in situ observations over the Tibetan Plateau. J. Geophys. Res. 2012, 117, D05102.
72. Yang, K.; Guo, X.; He, J.; Qin, J.; Koike, T. On the Climatology and Trend of the Atmospheric Heat Source over the Tibetan Plateau: An Experiments-Supported Revisit. J. Clim. 2011, 24, 1525–1541.
73. Strajnar, B.; Cedilnik, J.; Fettich, A.; Ličer, M.; Pristov, N.; Smerkol, P.; Jerman, J. Impact of two-way coupling and sea-surface temperature on precipitation forecasts in regional atmosphere and ocean models. Q. J. R. Meteorol. Soc. 2019, 145, 228–242.
74. Yadav, R.K. Role of equatorial central Pacific and northwest of North Atlantic 2-metre surface temperatures in modulating Indian summer monsoon variability. Clim. Dyn. 2009, 32, 549–563.
75. Sun, X.; Xie, L.; Semazzi, F.H.; Liu, B. Effect of Lake Surface Temperature on the Spatial Distribution and Intensity of the Precipitation over the Lake Victoria Basin. Mon. Weather Rev. 2015, 143, 1179–1192.
76. Pan, C.; Wang, X.; Liu, L.; Wang, D.; Huang, H. Characteristics of Heavy Storms and the Scaling Relation with Air Temperature by Event Process-Based Analysis in South China. Water 2019, 11, 185.
77. Steinkopf, J.; Engelbrecht, F. Verification of ERA5 and ERA-Interim precipitation over Africa at intra-annual and interannual timescales. Atmos. Res. 2022, 280, 106427.
78. Lim Kam Sian, K.T.; Dosio, A.; Ayugi, B.O.; Hagan, D.F.; Kebacho, L.L.; Ongoma, V. Dominant modes of precipitation over Africa, and their associated atmospheric circulations from observations. Int. J. Climatol. 2023.
79. Cai, Y.; Chen, G.; Wang, Y.; Yang, L. Impacts of Land Cover and Seasonal Variation on Maximum Air Temperature Estimation Using MODIS Imagery. Remote Sens. 2017, 9, 233.

Figure 1. Distribution of RA and meteorological stations in Guangdong Province.

Figure 2. Ta estimation model framework for three areas combining multi-source datasets with different ML models.

Figure 3. Taylor diagrams of modeling results for the different ML models and dataset combinations. (A). GLASS combination. (B). CMFD combination. (C). GLDAS combination. (D). ERA5 combination.
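A Taylor diagram places each model at a point defined by three related statistics. These are the correlation with the observations, the standard deviation of the predictions relative to that of the observations, and the centered root-mean-square difference. The sketch below computes all three using population (1/n) moments. Under that convention, the squared centered RMS difference equals the sum of the two variances minus twice the product of the two standard deviations and the correlation. This law-of-cosines relation is what lets one diagram show all three statistics. Function and variable names are illustrative, and the input values are made up.

```python
import math
from statistics import fmean, pstdev


def taylor_stats(obs, pred):
    """Return the statistics that one model point encodes on a Taylor diagram.

    Population (1/n) moments are used throughout, so that
    crmsd**2 == std_pred**2 + std_obs**2 - 2 * std_pred * std_obs * r.
    """
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(obs)
    mean_obs, mean_pred = fmean(obs), fmean(pred)
    std_obs, std_pred = pstdev(obs), pstdev(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred)) / n
    r = cov / (std_obs * std_pred)
    # Centered RMS difference: RMS of the anomalies after removing each mean.
    crmsd = math.sqrt(
        sum(((p - mean_pred) - (o - mean_obs)) ** 2 for o, p in zip(obs, pred)) / n
    )
    return {"r": r, "std_obs": std_obs, "std_pred": std_pred, "crmsd": crmsd}


# Hypothetical daily Ta values (°C), for illustration only.
obs = [20.0, 22.0, 25.0, 27.0, 30.0]
pred = [20.5, 21.5, 25.5, 27.5, 29.0]
print({name: round(value, 3) for name, value in taylor_stats(obs, pred).items()})
```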

Figure 4. Relative importance ratios for different dataset sources. (A). Relative importance partition of the GLASS combination. (B). Relative importance partition of the CMFD combination. (C). Relative importance partition of the GLDAS combination. (D). Relative importance partition of the ERA5 combination.

Figure 5. Point-density plots of predicted versus in situ observed Ta from the 10-fold cross-validation results. The blue line is the linear regression fitted to the scattered points; the red line is the 1:1 line.
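Figure 5 summarizes 10-fold cross-validation. The caption does not say how samples were assigned to folds. The sketch below therefore assumes simple random folds over all samples; station-wise or year-wise folds are common alternatives that would change the split logic. It also fits the ordinary least-squares line used to compare the scatter against the 1:1 line. All function names are hypothetical.

```python
import random


def kfold_indices(n_samples, k=10, seed=42):
    """Shuffle sample indices once and deal them into k near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=10, seed=42):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = kfold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# A slope near 1 and an intercept near 0 mean the fitted line lies close
# to the 1:1 line.
print(fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Pooling the held-out predictions from all ten folds before fitting the line gives a single out-of-sample scatter, which is what a figure like this typically displays.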

Figure 6. Interannual RMSE variability of the RF model for each dataset combination. The annual RMSE values over the seven years are shown as violin plots. The red lines indicate the quartiles: the upper red line is the 75th percentile and the lower red line is the 25th percentile. The black dashed line represents the median.

Figure 7. Monthly RF model performance of RA and NA under the ERA5 combination.

Figure 8. Box plots of temperature differences between RA and NA.

Figure 9. Scatter plot of the monthly temperature difference between RA meteorological station S59075 and NA meteorological station S59117.

Figure 10. The relative importance of impact factors from the ERA5 combination in RA and NA.

Table 1. Variables used for model estimation from 2012 to 2018.

Data Source	Abbreviation	Variable	Spatial Resolution	Temporal Resolution
In situ stations	Ta	Air temperature	/	Daily
MODIS	NDVI	Normalized difference vegetation index	500 m	Daily
MODIS	LST	Land surface temperature	1000 m	Daily
GLASS	NIR	Black-sky albedo in the near-infrared band	1000 m	8-day
GLASS	ET	Evapotranspiration	1000 m	8-day
GLASS	NPP	Net primary production	500 m	8-day
ERA5	SP	Surface pressure	0.1°	Hourly
ERA5	TP	Total precipitation	0.1°	Hourly
ERA5	WD	10 m u/v components of wind	0.1°	Hourly
GLDAS	SP	Psurf_f_inst	0.25°	3-h
GLDAS	TP	Rainf_f_tavg	0.25°	3-h
GLDAS	WD	Wind_f_inst	0.25°	3-h
CMFD	SP	Pressure	0.1°	Daily
CMFD	TP	Precipitation rate	0.1°	Daily
CMFD	WD	Wind speed	0.1°	Daily

Table 2. Model performance for models combined with different datasets.

Model	Indicator	GLASS	CMFD	GLDAS	ERA5
RF	R²	0.938	0.946	0.948	0.951
RF	MAE (°C)	1.157	1.080	1.067	1.023
RF	RMSE (°C)	1.582	1.486	1.469	1.414
ANN	R²	0.868	0.882	0.888	0.892
ANN	MAE (°C)	1.753	1.664	1.622	1.600
ANN	RMSE (°C)	2.301	2.191	2.130	2.105
LSTM	R²	0.846	0.849	0.854	0.854
LSTM	MAE (°C)	1.906	1.890	1.887	1.888
LSTM	RMSE (°C)	2.486	2.461	2.457	2.450
MLR	R²	0.838	0.844	0.847	0.845
MLR	MAE (°C)	1.974	1.950	1.948	1.949
MLR	RMSE (°C)	2.544	2.514	2.512	2.510
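The three indicators in Table 2 can be computed from paired observed and predicted Ta as shown below. This is a minimal sketch. It assumes R² is the coefficient of determination (1 - SS_res/SS_tot); some studies instead report the squared Pearson correlation, and this section does not state which definition was used. MAE and RMSE are in the units of Ta (°C).

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the units of Ta (°C)."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the units of Ta (°C)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted Ta (°C), for illustration only.
obs = [10.0, 20.0, 30.0]
pred = [11.0, 19.0, 30.0]
print(round(mae(obs, pred), 3), round(rmse(obs, pred), 3), round(r2(obs, pred), 3))
```

RMSE is always at least as large as MAE, and the gap widens when a few large errors dominate. This is one reason Table 2 reports both.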

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
