Estimation of the All-Wave All-Sky Land Surface Daily Net Radiation at Mid-Low Latitudes from MODIS Data Based on ERA5 Constraints

Li, Shaopeng; Jiang, Bo; Peng, Jianghai; Liang, Hui; Han, Jiakun; Yao, Yunjun; Zhang, Xiaotong; Cheng, Jie; Zhao, Xiang; Liu, Qiang; Jia, Kun

doi:10.3390/rs14010033


by Shaopeng Li ¹,²; Bo Jiang ¹,²,*; Jianghai Peng ¹,²; Hui Liang ¹,²; Jiakun Han ¹,²; Yunjun Yao ¹,²; Xiaotong Zhang ¹,²; Jie Cheng ¹,²; Xiang Zhao ¹,²; Qiang Liu ¹,³ and Kun Jia ¹,²

¹ The State Key Laboratory of Remote Sensing Science, Jointly Sponsored by Beijing Normal University and Institute of Remote Sensing and Digital Earth of Chinese Academy of Sciences, Beijing 100875, China

² China and Beijing Engineering Research Center for Global Land Remote Sensing Products, Institute of Remote Sensing Science and Engineering, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China

³ China and Beijing Engineering Research Center for Global Land Remote Sensing Products, College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(1), 33; https://doi.org/10.3390/rs14010033

Submission received: 9 November 2021 / Revised: 14 December 2021 / Accepted: 16 December 2021 / Published: 22 December 2021

(This article belongs to the Special Issue Artificial Intelligence in Remote Sensing of Atmospheric Environment)


Abstract: The surface all-wave net radiation ($R_{n}$) plays an important role in the energy and water cycles, and most studies of $R_{n}$ estimation have been conducted using satellite data. Although Moderate Resolution Imaging Spectroradiometer (MODIS) data are among the most commonly used satellite data sets, they have not been widely used for radiation calculations at mid-low latitudes because of their very low revisit frequency. To improve the daily $R_{n}$ estimation at mid-low latitudes with MODIS data, four models, including three models built with random forest (RF) and different temporal expansion models and one model built with the look-up-table (LUT) method, are developed based on comprehensive in situ radiation measurements collected from 340 globally distributed sites, MODIS top-of-atmosphere (TOA) data, and the fifth-generation European Centre for Medium-Range Weather Forecasts reanalysis (ERA5) data from 2000 to 2017. After validation against the in situ measurements, it was found that the RF model based on the constraint of the daily $R_{n}$ from ERA5 (an RF-based model with ERA5) performed the best among the four proposed models, with an overall validated root-mean-square error (RMSE) of 21.83 Wm⁻², R² of 0.89, and a bias of 0.2 Wm⁻². It also had the best accuracy compared to four existing products (Global LAnd Surface Satellite Data (GLASS), Clouds and the Earth’s Radiant Energy System Edition 4A (CERES4A), ERA5, and FLUXCOM_RS) across various land cover types and different elevation zones. Further analyses illustrated the effectiveness of introducing the daily $R_{n}$ from ERA5 into a “black box” RF-based model for $R_{n}$ estimation at the daily scale; the ERA5 value acts as a physical constraint when the available satellite observations are too limited to provide sufficient information (i.e., when there are fewer than two overpasses per day) or the sky is overcast. Overall, the RF-based model with ERA5 proposed in this study shows satisfactory performance and has strong potential to be used for long-term accurate daily $R_{n}$ global mapping at finer spatial resolutions (e.g., 1 km) at mid-low latitudes.

Keywords: net radiation; energy balance; mid-low latitude; temporal expansion; modeling; random forest; constraint; MODIS; ERA5

1. Introduction

The land surface all-wave net radiation ($R_{n}$) represents the radiative energy balance at the land surface [1]; it drives the main biophysical processes, such as evapotranspiration and photosynthesis [2,3], and is one of the most essential physical parameters describing interactions between the land and atmosphere. Moreover, $R_{n}$ is one of the most important inputs for various land surface process models and parameter calculations (e.g., evapotranspiration (ET)) [4]. Mathematically, $R_{n}$ is the sum of the net shortwave and net longwave radiation [1,5]:

$$R_{n} = R_{ns} + R_{nl} \tag{1}$$

$$R_{n} = R_{s}^{↓} - R_{s}^{↑} + R_{l}^{↓} - R_{l}^{↑} \tag{2}$$

$$R_{n} = (1 - α) R_{s}^{↓} + R_{l}^{↓} - R_{l}^{↑} \tag{3}$$

where $R_{ns}$, $R_{nl}$, $R_{s}^{↓}$, $R_{s}^{↑}$, $R_{l}^{↓}$, and $R_{l}^{↑}$ are the land surface net shortwave radiation (Wm⁻², where downward is defined as positive), net longwave radiation (Wm⁻²), downward shortwave radiation (Wm⁻²), upward shortwave radiation (Wm⁻²), downward longwave radiation (Wm⁻²), and upward longwave radiation (Wm⁻²), respectively, and α is the land surface broadband albedo.
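As a minimal sketch of Equations (2) and (3), the snippet below computes the net radiation from the four components and from the albedo form. The function names and the numerical values are illustrative assumptions, not measurements from this study.

```python
def net_radiation(sw_down, sw_up, lw_down, lw_up):
    """All-wave net radiation (W m^-2) from four components, Equation (2)."""
    return (sw_down - sw_up) + (lw_down - lw_up)


def net_radiation_from_albedo(sw_down, albedo, lw_down, lw_up):
    """Equation (3): net shortwave written as (1 - albedo) * sw_down."""
    return (1.0 - albedo) * sw_down + lw_down - lw_up


# Illustrative (made-up) midday values in W m^-2.
sw_down, sw_up, lw_down, lw_up = 800.0, 160.0, 350.0, 450.0
albedo = sw_up / sw_down  # broadband albedo implied by the two shortwave fluxes

rn_components = net_radiation(sw_down, sw_up, lw_down, lw_up)
rn_albedo = net_radiation_from_albedo(sw_down, albedo, lw_down, lw_up)
print(rn_components, rn_albedo)
```

Both forms agree when the albedo is taken as the ratio of upward to downward shortwave radiation, which is the substitution that links Equation (2) to Equation (3).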

$R_{n}$ can be obtained from direct measurements or from the four radiative components shown in Equation (2) measured at ground stations, which are usually taken as “ground truth” values [6,7]. However, because of the limited length of observing periods and limited spatial distributions due to expensive maintenance costs, applications of ground $R_{n}$ measurements are very limited. An alternative way to obtain $R_{n}$ is from various products, including reanalysis [8] and remotely sensed products [9]. Reanalysis products (e.g., ECMWF Interim Reanalysis from the European Centre for Medium-Range Weather Forecasts [8]) are derived by merging available observations with an atmospheric model [10]; despite their longer temporal spans, they usually have coarse resolutions and are thought to be less accurate than remotely sensed products [11], possibly because of inaccurate cloud information [11,12,13]. Remotely sensed products are generated based on satellite data; however, most existing satellite products have coarse spatial resolutions (e.g., CERES4A [9]). Even the latest product with the finest spatial resolution, Global LAnd Surface Satellite Data (GLASS) [14,15], has a resolution of only 0.05°, and thus the applications of these products are still limited, especially at regional scales. Therefore, more $R_{n}$ products with high accuracy and finer spatial resolutions are still required, for which satellites are the most appropriate data source.

Thus far, many methods have been successfully developed for retrieving surface radiation from satellite data [5,15,16], which take either high-level remotely sensed products (e.g., atmospheric, surface, and cloud parameters) or top of the atmosphere (TOA) satellite observations (e.g., surface reflectance/radiance) as input. Considering error propagation and accumulation, estimating radiation directly from TOA satellite observations is preferable [15,17]. These kinds of methods can be roughly divided into three classes: radiative transfer models (RTMs) [9,18], hybrid models, and empirical/statistical models. RTMs have clear physical mechanisms and have been used to generate popular radiative products such as CERES4A. However, RTMs are complicated and have high requirements in terms of both the quality and quantity of input.

A hybrid model is established by combining simulations from RTMs (e.g., MODTRAN5) with statistical regression methods, which makes it easy to implement in practical applications while retaining a strong physical basis. For example, Wang et al. [19] developed a hybrid model to estimate the clear-sky instantaneous $R_{n}$ ($R_{n\_ins}$) directly from combined visible and shortwave infrared (VSWIR) and thermal infrared (TIR) remote sensing data. Their results demonstrated that the new model outperformed traditional component-based models in $R_{n\_ins}$ estimation, with a root-mean-square error (RMSE) of 70.6 Wm⁻² at seven sites observed by the Surface Radiation Budget Network (SURFRAD). However, a successful and robust hybrid model depends heavily on comprehensive training samples simulated from the RTM by pre-setting various combinations of surface and atmospheric conditions. Indeed, several scenarios are very difficult to simulate, such as cases with a very large solar zenith angle or cases where cloud structure is taken into account [19]. To improve this, an empirical model was recently successfully developed at high latitudes by Chen et al. [15], in which comprehensive $R_{n}$ ground measurements were collected to replace simulated values of $R_{n}$ in the hybrid model in order to estimate the daily $R_{n}$ ($R_{n\_daily}$) from MODIS TOA observations. Then, by considering the influence of the daytime length on $R_{n\_daily}$, a model based on the length ratio of daytime (LRD) was built with an artificial neural network (ANN). The validation results indicated the effectiveness of this new model: its estimates had a spatial distribution similar to that of the CERES4A product but with more detail (1 km) and higher accuracy, with overall RMSEs of 23.66 Wm⁻² and 15.04 Wm⁻² when using instantaneous MODIS TOA observations at daytime and nighttime, respectively. However, this model was only built for high latitudes, and more efforts are needed to make it globally applicable. Compared to other satellite sensors, MODIS, a sensor on polar-orbiting satellites with 36 bands between 0.405 and 14.385 µm, including 20 visible and shortwave infrared bands (at 250 m and 500 m) and 16 thermal infrared bands (at 1 km), has been preferred for radiation estimation in previous studies, such as for $R_{n\_ins}$ [19], $R_{n\_daily}$ [15], instantaneous $R_{ns}$ [17], daily $R_{ns}$ [19], etc. Hence, it is feasible to develop a similar empirical model that also takes MODIS TOA data as input for $R_{n\_daily}$ estimation at mid-low latitudes.

However, one of the most essential issues to be addressed is how to obtain $R_{n\_daily}$ at mid-low latitudes from limited MODIS instantaneous observations. The revisit frequency of MODIS at high latitudes is ~6 to 20 times per day [15], much higher than at mid-low latitudes, where most regions are observed only 2–4 times per day and some tropical regions only once. The information for characterizing variations in $R_{n}$ throughout a single day at mid-low latitudes is therefore very likely inadequate. To address this issue, most previous studies have developed a temporal expansion model, such as the theoretical sinusoidal model [5,20], empirical models (e.g., the quadratic polynomial regression method [21]), and the look-up-table (LUT) method [19]. However, most of these models have to be used with the assumption that the atmospheric and surface conditions remain unchanged throughout the day [19], which is not in accord with reality [22]. Another solution is to add some necessary information from other data sources when making the estimations, such as various parameters from remotely sensed or reanalysis products. For example, Zhang et al. [23] employed the aerosol optical depth (AOD) and cloud optical depth (COD) products from MODIS in a cost function to constrain their retrieval of atmospheric parameters and obtain more accurate instantaneous $R_{s}^{↓}$ estimations. Another approach is to apply estimates of the same parameter from a physically based model as constraints on estimates from empirical models built with machine-learning methods, which are regarded as “black box” calculation processes lacking any physical basis and which can easily produce output with large uncertainties [24]. For instance, Xu et al. [21] applied the daily solar radiation from Modern-Era Retrospective Analysis for Research and Applications (MERRA) to correct the corresponding values, which were systematically underestimated when using MODIS’s atmospheric products.

Hence, inspired by these studies, $R_{n\_daily}$ estimates from the newly released long-term fifth-generation European Centre for Medium-Range Weather Forecasts reanalysis product (ERA5) [25] are introduced as a physical constraint in a newly proposed empirical model for $R_{n\_daily}$ estimation at mid-low latitudes from MODIS TOA data in this study. ERA5 is among the newest generation of reanalysis products, providing high-quality reanalysis of global atmospheric, oceanic, and land-surface fields at hourly time steps with a spatial resolution of 25 km [26]. ERA5 was generated based on the Integrated Forecasting System (IFS), and benefits from a decade of developments in model physics, core dynamics, and data assimilation with the 12-hourly 4DVar using information from a substantial volume of improved observations [27]. Specifically, the radiation components in ERA5 were generated from RTMs which simulate the attenuation caused by various atmosphere and surface conditions [28]; when validated against ground-based measurements, they perform better and have a significantly lower bias than those from ERA-Interim and MERRA version 2 (MERRA2), especially for inland regions [27].

The major purpose of this study is to propose a new empirical model to estimate $R_{n\_daily}$ at mid-low latitudes directly from MODIS TOA observations by introducing $R_{n\_daily}$ from ERA5 as a constraint. For comparison, three other $R_{n\_daily}$ estimation models, one without ERA5 $R_{n\_daily}$, one with a traditional temporal expansion model, and another based on the LUT method, are also established at mid-low latitudes. Afterwards, all four proposed models are evaluated with the ground measurements and compared with other products providing $R_{n\_daily}$.

The remainder of this paper is organized as follows. Section 2 introduces the data used in this study and their pre-processing. Section 3 presents the development of the $R_{n\_daily}$ estimation models. The results of the model validation and comparison along with analysis are given in Section 4. Section 5 and Section 6 provide discussion and conclusions, respectively.

2. Data and Pre-Processing

In this study, in situ radiation measurements collected from more than 300 stations at mid-low latitudes together with MODIS products and ERA5 reanalysis data were used for model development. For comparison, $R_{n\_daily}$ from four newly released products, namely GLASS [14,15], FLUXCOM [29], CERES4A [9], and ERA5 [25], were used. More details are given below.

2.1. Ground Measurements

As stated above, in situ radiation measurements from 340 globally distributed sites at mid-low latitudes (60°S–60°N) in thirteen measuring networks were collected for the period 2000–2017 (see Table 1). Some sites provided $R_{n}$ measurements directly, while most sites provided four radiative components ($R_{s}^{↓}$, $R_{s}^{↑}$, $R_{l}^{↓}$, and $R_{l}^{↑}$). Since $R_{s}^{↓}$ was needed for model evaluation, this study only used samples from the sites that could provide $R_{n}$ and $R_{s}^{↓}$ simultaneously. According to the study of Jia et al. [11], the uncertainty of the measured $R_{n}$ in most networks at mid-low latitudes is relatively small and can be neglected.

Figure 1 shows the geographical distribution of all sites with various land cover types defined by the International Geosphere–Biosphere Programme (IGBP), and also presents the climate zones defined by the Köppen–Geiger climate classification [39] to which these sites belong. The elevations of the sites range from −7 m to 4698 m above sea level. Therefore, the comprehensive in situ measurements collected in this study, spanning areas within 60°S to 60°N, represent a variety of climatic conditions, geographic environments, and ecosystem conditions. More information can be found in [14].

2.1.1. Radiation Measurements

In order to ensure the quality of the in situ measurements, all the $R_{n}$ and $R_{s}^{↓}$ measurements were manually checked and further selected, with only measurements labelled as high quality by their releasers being used. In this study, $R_{n}$ measurements at both instantaneous and daily scales as well as daily $R_{s}^{↓}$ measurements are used. Because the observation frequencies differ among sites, the “instantaneous” scale is defined as half an hour, following the study of Wang et al. [19]; that is, the average of all available $R_{n}$ values from fifteen minutes before to fifteen minutes after MODIS passes over the pixel of the corresponding site (whose sampling interval is shorter than 30 min) is taken as the in situ $R_{n\_ins}$ value. The $R_{n\_daily}$ and $R_{s}^{↓}$ values are calculated only if at least one observation is available in every half hour of the day, as performed in the study of Jiang et al. [14]. After matching with the MODIS observations and manually checking and removing any records missing band observations, 164,654 samples at instantaneous scales and 564,967 samples at daily scales were collected, which were then divided into training and validation datasets. To ensure similar distributions of the $R_{n}$ characteristics of the two datasets, all instantaneous and daily samples for each site were first sorted by their observing time; then, one sample out of every five was taken for validation and the remaining four were used as training samples, which helps to ensure the same physical range of $R_{n}$ for training and validation samples. This procedure was repeated sequentially for each site until all sites were processed. Therefore, 80% of the samples were used for modeling (131,563 instantaneous samples and 452,098 daily samples), and the other 20% for independent validation (33,091 instantaneous samples and 112,869 daily samples).
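The per-site sampling scheme described above can be sketched as follows. This is an illustration only: the tuple layout, the function name `split_by_site`, and the choice of holding out the last sample of each block of five are assumptions, since the text does not state which of the five is taken.

```python
from collections import defaultdict


def split_by_site(samples, step=5):
    """Split samples into training and validation sets, site by site.

    `samples` is an iterable of (site_id, time, value) tuples. Within each
    site the samples are sorted by time and one out of every `step` goes to
    validation; the rest go to training.
    """
    by_site = defaultdict(list)
    for sample in samples:
        by_site[sample[0]].append(sample)

    train, valid = [], []
    for site_id in sorted(by_site):
        ordered = sorted(by_site[site_id], key=lambda s: s[1])
        for i, sample in enumerate(ordered):
            # Assumption: the last sample of every block of `step` is held out.
            (valid if i % step == step - 1 else train).append(sample)
    return train, valid


# Toy data: two hypothetical sites with ten time-ordered samples each.
samples = [(site, t, 100.0 + t) for site in ("A", "B") for t in range(10)]
train, valid = split_by_site(samples)
print(len(train), len(valid))
```

Because the hold-out is interleaved in time, the validation set covers the same seasons and the same range of values as the training set at every site, which is the stated purpose of the procedure.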

Furthermore, considering the discrepancy in the spatial resolution of MODIS (1 km) and the ERA5 reanalysis data (25 km), the spatial homogeneity of all sites within a 25 × 25 km² region was examined from three aspects: variations in the land cover type, topography, and vegetation coverage; the results are shown in Figure 2. For land cover (Figure 2a), the proportion of land cover within a 25 × 25 km² grid that is the same as that of the site was identified according to the MODIS Land Cover Climate Modeling Grid (CMG) (MCD12C1) product, which has a resolution of 5 km (https://lpdaac.usgs.gov/products/mcd12c1v006/, accessed on 30 July 2021), and calculated for each of the 340 sites; then, the proportions of all sites were divided into five ranges (0–20%, 20–40%, 40–60%, 60–80%, and 80–100%). It can be seen in Figure 2a that for more than 65% of the samples, the land cover around the site is more than 60% consistent with that of the site. Figure 2b shows the variations in elevation (represented by the standard deviation) within a 25 × 25 km² region of each site; the elevations were provided by the National Aeronautics and Space Administration (NASA) Shuttle Radar Topographic Mission (SRTM), with a spatial resolution of 90 m. Most sites (>75%) had a relatively flat surface, with an elevation standard deviation of less than 20 m. The vegetation coverage of each site is represented by the normalized difference vegetation index (NDVI), obtained from GLASS at a 5 km spatial resolution [40]. Similarly, the variation in NDVI within the 25 × 25 km² region of each site was calculated; the results are shown in Figure 2c. The vegetation coverage within this region is relatively uniform for more than 85% of the sites, which have an NDVI standard deviation of less than 0.1. Overall, most sites in this study were homogeneous at the 25 km scale.

2.1.2. Clearness Index Calculation

To evaluate model performance under various sky conditions, the daily clearness index (CI), which is usually used for representing atmosphere transmittance [41], was applied in this study. The daily CI is defined as the ratio of the daily $R_{s}^{↓}$ to extraterrestrial radiation ($R_{se}$):

$$CI = \frac{R_{s}^{↓}}{R_{se}} \tag{4}$$

In this study, the in situ daily $R_{s}^{↓}$ measurements are used for the CI calculations, and $R_{se}$ (Wm⁻²) is calculated using Equation (5) from [42]:

$$R_{se} = \frac{1440 G_{sc} d_{r}}{π} \left( ω_{s} \sin(φ) \sin(δ) + \cos(φ) \cos(δ) \sin(ω_{s}) \right) \tag{5}$$

$$d_{r} = 1 + 0.033 \cos\left(\frac{2π \, doy}{365}\right) \tag{6}$$

$$δ = 0.409 \sin\left(\frac{2π \, doy}{365} - 1.39\right) \tag{7}$$

$$ω_{s} = \arccos(-\tan(φ) \tan(δ)) \tag{8}$$

$$\mathrm{rad} = \frac{π}{180} \times (\mathrm{decimal\ degrees}) \tag{9}$$

where $G_{sc}$ is the solar constant (0.0820 MJm⁻²·min⁻¹), $d_{r}$ is the inverse relative distance from the Earth to the Sun, $ω_{s}$ is the sunset hour angle (rad), $φ$ is the latitude (rad), $δ$ is the solar declination (rad), and $doy$ is the day of the year. In this study, the daily CI at each site is calculated. Additionally, $d_{r}$ and the latitude ($φ$, °) of each site are used for model development.
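The clearness index calculation of Equations (4)–(9) can be sketched in a few lines. The code assumes the standard FAO-56 form of the extraterrestrial radiation, which includes the $\cos(φ)\cos(δ)\sin(ω_{s})$ term, and converts the result from MJ m⁻² day⁻¹ (the unit implied by $G_{sc}$) to Wm⁻² by multiplying by 10⁶/86,400; the function names, the clamping of the arccos argument, and the example inputs are illustrative assumptions.

```python
import math

G_SC = 0.0820                      # solar constant, MJ m^-2 min^-1
MJ_PER_DAY_TO_W = 1.0e6 / 86400.0  # 1 MJ m^-2 day^-1 expressed in W m^-2


def extraterrestrial_radiation(lat_deg, doy):
    """Daily mean extraterrestrial radiation (W m^-2), Equations (5)-(9)."""
    phi = math.radians(lat_deg)                                   # Eq. (9)
    d_r = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)     # Eq. (6)
    delta = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)  # Eq. (7)
    # Safeguard added here (not in the text): keep the arccos argument in
    # [-1, 1]. Between 60S and 60N it never leaves that range anyway.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    w_s = math.acos(x)                                            # Eq. (8)
    r_se = (1440.0 * G_SC * d_r / math.pi) * (
        w_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(w_s)
    )  # MJ m^-2 day^-1
    return r_se * MJ_PER_DAY_TO_W


def clearness_index(rs_down_daily, lat_deg, doy):
    """Equation (4): daily downward shortwave over extraterrestrial radiation."""
    r_se = extraterrestrial_radiation(lat_deg, doy)
    return rs_down_daily / r_se if r_se > 0 else float("nan")


# Made-up example: 200 W m^-2 daily mean at the equator on day 80.
ci = clearness_index(200.0, 0.0, 80)
print(round(ci, 3))
```

A quick sanity check is the worked example in FAO-56 (20°S, day 246), for which the extraterrestrial radiation is about 32.2 MJ m⁻² day⁻¹.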

2.2. Remotely Sensed Products

2.2.1. MODIS Products for Modeling

Theoretically, the two MODIS sensors on the Terra and Aqua satellites funded by NASA can provide approximately four overpasses per day for most places globally [5]; the four times are 10:30 am, 1:30 pm, 10:30 pm, and 1:30 am at local time. To provide a better illustration, the average overpass frequency of MODIS during the daytime for a randomly selected day (the 240th day of 2005) at a spatial resolution of 5 km was calculated and is shown in Figure 3; it should be noted that this is the average frequency in a 5 × 5 km grid. It can be seen that the overpass frequency during the daytime for most regions at mid-low latitudes is only 1–2 times a day, which is much less than at high latitudes. In this study, both the MOD and MYD series products provided by Terra and Aqua, respectively, were used, similar to the study of Wang et al. [19], including the TOA reflectance and radiance data from MOD/MYD021KM, the corresponding geolocation and view geometry information from MOD/MYD03, and the cloud mask from MOD/MYD35 from 2000 to 2017. Table 2 lists all of the MODIS products used. All MODIS products were extracted according to the locations of the 340 sites and converted from coordinated universal time (UTC) to local time (LT) to match the in situ measurements. In addition, the Z-score normalization (mean = 0, standard deviation = 3) method was applied to the MODIS TOA observations to exclude any abnormal values before modeling.

2.2.2. Products Providing $R_{n\_daily}$ for Comparison

GLASS

The GLASS $R_{n\_daily}$ product is generated by combining the results from two algorithms. In regions at mid-low latitudes, $R_{n\_daily}$ is mainly estimated based on the daily $R_{s}^{↓}$ and other ancillary information (i.e., meteorological factors) under various underlying surface types and daytime lengths [14]. Jiang et al. [43] evaluated the GLASS daytime $R_{n}$ ($R_{n\_daytime}$) product generated with a similar algorithm and concluded that it performed better than CERES. Specifically, GLASS had an RMSE of 51.24 Wm⁻² in the validation at global sites, compared to 54.96 Wm⁻² for CERES. For regions at high latitudes, $R_{n}$ is estimated based on the LRD model applied to MODIS TOA data [15]. After weighted fusion, a long-term seamless GLASS $R_{n\_daily}$ product with a spatial resolution of 0.05° from 2000 to 2018 was produced. Guo et al. [44] applied the GLASS $R_{n\_daily}$ for ET estimation and obtained better accuracy than with $R_{n\_daily}$ from MERRA2. In this study, the GLASS $R_{n\_daily}$ values were extracted according to the location and observation time of each validation sample for comparison. Specifically, we extracted the GLASS product values at the pixel whose center is closest to the longitude and latitude of each site and which corresponds to the observation time at the site.
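The nearest-pixel extraction described above can be sketched for a regular latitude/longitude grid. The grid layout (global extent, row 0 at 90°N, column 0 at 180°W), the function names, and the example coordinates are assumptions for illustration; the actual product files may be tiled or ordered differently.

```python
import math


def nearest_pixel(lat, lon, resolution=0.05, lat_top=90.0, lon_left=-180.0):
    """Return (row, col) of the grid cell whose center is closest to a site.

    On a regular grid, the cell that contains the point is also the cell
    with the nearest center.
    """
    n_rows = round(180.0 / resolution)
    n_cols = round(360.0 / resolution)
    row = math.floor((lat_top - lat) / resolution)
    col = math.floor((lon - lon_left) / resolution)
    # Clamp so points exactly on the southern or eastern edge stay in range.
    row = min(max(row, 0), n_rows - 1)
    col = min(max(col, 0), n_cols - 1)
    return row, col


def pixel_center(row, col, resolution=0.05, lat_top=90.0, lon_left=-180.0):
    """Latitude and longitude of the center of cell (row, col)."""
    return (lat_top - (row + 0.5) * resolution,
            lon_left + (col + 0.5) * resolution)


# Hypothetical site coordinates on a 0.05 degree grid.
row, col = nearest_pixel(39.98, 116.38)
center_lat, center_lon = pixel_center(row, col)
print(row, col, center_lat, center_lon)
```

The same lookup applies to the other gridded products by changing `resolution` (for example, 1° for a coarser grid), under the same layout assumption.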

CERES4A

Because of its high accuracy and reliable quality, the CERES4A series of radiation products has been widely used as a reference in previous studies [6,45,46]. The surface radiative fluxes of CERES4A are generated using Fu–Liou’s radiative transfer model in tandem with CERES’s TOA measurements and high-quality meteorological data sets from different reanalysis projects [9,47,48]. CERES4A provides the all-sky $R_{n\_daily}$ by adding the four radiative components from its Level 3 products at 1° spatial resolution from 2000 to present. The CERES4A $R_{n\_daily}$ values were extracted according to the location and observation time of each validation sample for comparison, with a specific extraction process consistent with GLASS.

FLUXCOM_RS

FLUXCOM is a remote sensing product [29] consisting of two data sets, FLUXCOM_RS and FLUXCOM_RS+METEO, generated using ground measurements, remotely sensed data, and reanalysis data with a machine learning algorithm from 2001 to 2010 and 2001 to 2015, respectively. For FLUXCOM_RS, the fluxes are estimated exclusively from MODIS satellite data; thus, its spatial resolution is 0.0833° and its temporal resolution is eight days. For FLUXCOM_RS+METEO, the additional meteorological information from the reanalysis products is added as input; thus, its spatial resolution is 0.5° and its temporal resolution is daily. According to Jung et al. [29], the spatial patterns of the $R_{n\_daily}$ values from the two FLUXCOM datasets both agree well with the one from CERES4A, yielding an R² of 0.96, and they both show similar large-scale variations. However, the energy fluxes from FLUXCOM_RS perform better in heterogeneous regions than those from FLUXCOM_RS+METEO because of the former’s higher spatial resolution. Two radiative components, $R_{ns}$ and $R_{nl}$, at the eight-daily scale are provided by FLUXCOM_RS; hence, $R_{n\_daily}$ from FLUXCOM_RS was calculated as the sum of $R_{ns}$ and $R_{nl}$ (Equation (1)) and then extracted according to the site locations and observation time for comparison, with a specific extraction process consistent with GLASS.

2.3. ERA5 Reanalysis Products

ERA5 is the updated version of the previous generation ERA-Interim [8], and is the fifth generation of European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis products. It has a higher time resolution of 1 h, a finer spatial resolution of 31 km (0.28125° × 0.28125°), and a larger number of vertical levels (137). ERA5 parameters are recognized as performing well and have been widely used [49,50]. The $R_{n\_daily}$ values from ERA5 are calculated by adding the four radiative components, which have an hourly resolution, and aggregating them into daily means. Similarly, the $R_{n\_daily}$ values from ERA5 were also extracted with an extraction process consistent with GLASS. Additional meteorological factors, including the 2 m air temperature (T, °C), surface pressure (SP, Pa), total cloud cover (CC, %), total column cloud liquid water (TCL, gm⁻²), relative humidity (RH, %), and total column water vapor (TCW, gm⁻²) were also extracted at each site at a spatial resolution of 25 km and used as ancillary information in this study. In addition, the CI of ERA5 ($CI_{ERA5}$) and the ratio of $R_{n\_ins}$ to $R_{n\_daily}$ ($C_{d} = \frac{R_{n\_ins}}{R_{n\_daily}}$, defined according to the study of Carmona et al. [51]), which is used to express the linear statistical relationship between $R_{n\_ins}$ and $R_{n\_daily}$ in a certain region, were calculated and applied in this study.
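A minimal sketch of the two derived quantities follows: the daily mean obtained from hourly net radiation, and the ratio $C_{d}$ as defined in the text. The function names and the hourly values are made up for illustration and do not come from ERA5.

```python
def daily_mean(hourly_values):
    """Average 24 hourly values (W m^-2) into one daily mean."""
    if len(hourly_values) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(hourly_values) / 24.0


def ratio_cd(rn_ins, rn_daily):
    """C_d = R_n_ins / R_n_daily, following the definition in the text."""
    return rn_ins / rn_daily


# Made-up day: -50 W m^-2 during 14 night hours, 400 W m^-2 for 10 day hours.
hourly_rn = [-50.0] * 7 + [400.0] * 10 + [-50.0] * 7
rn_daily = daily_mean(hourly_rn)
c_d = ratio_cd(400.0, rn_daily)
print(rn_daily, round(c_d, 3))
```

The example shows why a single daytime value overstates the daily mean: negative nighttime net radiation pulls the daily value well below the instantaneous one, so $C_{d}$ is greater than 1 here.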

3. Methodology

A flowchart of this study is shown in Figure 4. First, the optimal combinations of MODIS bands most appropriate for modeling were determined. Then, four empirical models for $R_{n\_daily}$ estimation at mid-low latitudes were established, according to the statistical relationships between the selected MODIS TOA data and the in situ $R_{n\_daily}$ and $R_{n\_ins}$ values, using the random forest (RF) and LUT methods. Afterwards, the performances of the four new models were evaluated and further analyses were conducted with the best one; the results from this model were then compared with other products and preliminarily used for mapping.

3.1. $R_{n\_daily}$ Estimation Model Development

Because of the abundant bands and different overpass times of the MODIS observations, the most appropriate combinations of the TOA bands and their overpass times should be determined first for modeling. To do this, correlations between the in situ $R_{n\_ins}$ and $R_{n\_daily}$ values under various conditions were analyzed. Afterwards, by taking the optimal combinations of MODIS TOA data as the major input, three RF-based models, namely an RF-based instantaneous model (RF-based ins model), an RF-based model, and an RF-based model with ERA5, and a LUT-based model were built. Specifically, the RF-based ins model was established by regressing the MODIS TOA data to the in situ $R_{n\_ins}$ values directly with RF; the estimated $R_{n\_ins}$ values were then converted into $R_{n\_daily}$ values using a temporal expansion method. The RF-based model was established by regressing the MODIS TOA data to the in situ $R_{n\_daily}$ values directly with RF; the RF-based model with ERA5 was similar to the RF-based model but added the $R_{n\_daily}$ of ERA5 as input. The LUT-based model contained a group of conditional models built by regressing the MODIS TOA data to the in situ $R_{n\_daily}$ values under different cases defined by combining the various observation geometric information and sky conditions obtained from the MOD/MYD03 and MOD/MYD35 products. Table 3 shows the detailed information of the four models. Note that each MODIS observation yields one estimated $R_{n\_daily}$ value; thus, the final $R_{n\_daily}$ value is the average of all estimated $R_{n\_daily}$ values in a single day, the number of which is determined by the MODIS observation frequency. To compare the four models objectively, the intersection of all independent validation samples for the four models (No. of samples = 23,826) was used. In this study, all models were implemented on a PC with the Microsoft Windows 10 operating system, an Intel Core 3.20 GHz processor, and 32 GB of memory.
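The averaging of the per-overpass estimates into one daily value can be sketched as below. The record layout (site, date, estimate), the function name, and the numbers are hypothetical and only illustrate the grouping step.

```python
from collections import defaultdict


def average_daily_estimates(estimates):
    """Average the per-overpass daily estimates for each (site, date) pair."""
    grouped = defaultdict(list)
    for site, date, value in estimates:
        grouped[(site, date)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Hypothetical estimates (W m^-2), one per MODIS daytime overpass.
estimates = [
    ("site_A", "2005-08-28", 120.0),  # first overpass of the day
    ("site_A", "2005-08-28", 130.0),  # second overpass of the day
    ("site_B", "2005-08-28", 95.0),   # only one overpass available
]
daily = average_daily_estimates(estimates)
print(daily)
```

With a single overpass the "average" is just that one estimate, which is the situation in which an external constraint such as the ERA5 daily value carries the most weight.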

3.1.1. MODIS TOA Data Selection for Modeling

According to previous studies [15], when LRD is greater than around 0.3, instantaneous daytime MODIS observations provide a better estimation of $R_{n\_daily}$ at high latitudes; however, it is uncertain whether such a relationship holds at mid-low latitudes. Hence, the determination coefficient ($R^{2}$, see Section 3.2 for its computational definition) between the in situ $R_{n\_ins}$ and its corresponding $R_{n\_daily}$ at each hour in each day, as well as its variation with different factors including latitude, month, and land cover type, was examined using all collected site measurements, not only those that successfully matched with MODIS; the results are shown in Figure 5. In Figure 5a, all sites and their correlation results are divided into six groups using 10° latitude intervals from 0° to 60°N. Additionally, these correlation results are divided into twelve monthly periods and sixteen land cover types, as shown in Figure 5b,c. The results indicate that strong correlations (warm colors) between the in situ $R_{n\_ins}$ and $R_{n\_daily}$ values mainly appear from 10:00 to 15:00 (local time) for nearly all land cover types at mid-low latitudes throughout the year. Hence, the MODIS TOA observations during the daytime are used for $R_{n\_daily}$ estimation at mid-low latitudes, with most samples between 10:00 and 14:00.

Meanwhile, according to previous studies, surface shortwave radiation is closely related to the MODIS TOA reflectance from bands 1–7 [17,19,52], while the longwave radiation is mostly represented by the MODIS TOA radiance from bands 27–36 [53,54,55]; water vapor, which has a significant impact on solar radiation, is related to band 19 [19], and bands 21, 24, and 25 are affected by both solar illumination and the Earth’s infrared emission [19]. Thus, MODIS bands 1–5, 7, 19, 21, 24, 25, and 27–36 were selected for modeling in this study. Note that MODIS band 6 was not used due to its poor quality [56].

Afterwards, the

R_{n_d a i l y}

estimation model was established using all training samples with the following mathematical expression:

R_{n} = f(TOA_{ref}, TOA_{rad}, SZA, VZA, RAA, height, d_{r}, φ)

(10)

where

R_{n}

is

R_{n_i n s}

for the RF-based ins model and

R_{n_d a i l y}

for the RF-based model,

T O A_{r e f}

is the TOA reflectance from MODIS bands 1–5, 7, and 19,

T O A_{r a d}

is the TOA radiance from MODIS bands 21, 24, 25, and 27–36; VAA, SAA, SZA, VZA, and the height are from MOD/MYD03 (Table 2); RAA is the relative azimuth angle, calculated as the absolute value of the difference between VAA and SAA; and

d_{r}

is calculated according to Equation (6).
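As a minimal sketch of how the predictors in Equation (10) could be assembled for a single MODIS observation, the Python snippet below computes RAA from VAA and SAA and concatenates the inputs in the order of Equation (10). The function names, the argument layout, the folding of RAA into 0°–180°, and the reading of φ as latitude are illustrative assumptions, not the authors' implementation.

```python
def relative_azimuth(vaa_deg, saa_deg):
    """RAA as |VAA - SAA|, folded into [0, 180] degrees (the folding is an assumption)."""
    raa = abs(vaa_deg - saa_deg) % 360.0
    return 360.0 - raa if raa > 180.0 else raa


def build_features(toa_ref, toa_rad, sza, vza, vaa, saa, height, d_r, lat):
    """Return one predictor vector ordered as in Equation (10).

    toa_ref: 7 TOA reflectances (bands 1-5, 7, 19)
    toa_rad: 13 TOA radiances (bands 21, 24, 25, 27-36)
    lat:     stands in for phi here (assumed to be the latitude)
    """
    if len(toa_ref) != 7 or len(toa_rad) != 13:
        raise ValueError("expected 7 reflectance and 13 radiance values")
    raa = relative_azimuth(vaa, saa)
    return list(toa_ref) + list(toa_rad) + [sza, vza, raa, height, d_r, lat]
```

With these inputs each sample has 26 predictors: 20 band values plus SZA, VZA, RAA, height, d_r, and φ.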

3.1.2. Modeling with Random Forest

RF, proposed by Breiman et al. [57], is one of the most popular machine learning methods. It is based on the construction of a multitude of regression or classification trees. Each tree is grown using randomly selected features and samples drawn from the whole training set. The samples are then successively split into two branches until the terminal nodes of the tree satisfy the stopping criteria set by the user. For classification, the RF output is the majority vote over all decision trees; for regression, it is the average of the values predicted by the individual trees. In addition, the under- or over-fitting issues that often appear in machine learning can be mitigated by tuning some RF hyperparameters (i.e., the number of trees in the forest, the maximum depth of each tree, the minimum number of samples required to split a node, and the minimum number of samples per leaf) [50]. RF has become a popular technique in remote sensing-related studies such as parameter estimation [17,50,52], parameter prediction [58], classification [59], and variable importance determination [22]. The performance of RF depends on the setting of hyperparameters such as “n-estimators”, “max depth”, “max features”, “min samples split”, and “min samples leaf”. With the other hyperparameters fixed, each hyperparameter was varied over a certain range to observe the corresponding change in the accuracy of the RF model; four hyperparameters were found to be critical in this study, each varying within a certain range (Table 4). The optimal RF model was then determined as the combination of these hyperparameters that gave the smallest validation RMSE. In this study, the RF method was implemented in Python using the RandomForestRegressor class in the Scikit-learn toolbox, on a Microsoft Windows 10 PC with a 3.2 GHz Intel Core processor and 32 GB of memory [60].
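The hyperparameter selection described above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic, the search ranges are hypothetical (Table 4 lists the ranges actually explored), and a plain hold-out split stands in for the validation samples used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid, train_test_split

# Synthetic stand-in data; the real predictors would be those of Equation (10).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical search ranges for the four critical hyperparameters.
grid = ParameterGrid({
    "n_estimators": [20, 50],
    "max_depth": [6, 10],
    "min_samples_split": [3, 5],
    "min_samples_leaf": [2, 8],
})

# Keep the combination with the smallest validation RMSE.
best_rmse, best_params = float("inf"), None
for params in grid:
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params

print(best_params, round(best_rmse, 3))
```

An exhaustive grid is used here for brevity; the one-at-a-time traversal described in the text would first narrow the ranges before the combinations are compared.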

1. RF-based ins model

It is more common to estimate the instantaneous surface radiation from instantaneous satellite observations [5,20,51,61]. Then, the daytime or daily value is usually calculated from a temporal expansion model, such as the sinusoidal curve proposed by Bisht et al. [5], as shown in Equation (11):

R_{n\_daytime} = \frac{N \cdot R_{n\_ins}}{\pi \cdot \sin\left(\pi \cdot \frac{t_{overpass} - t_{rise}}{t_{set} - t_{rise}}\right)}

(11)

where N is the empirical coefficient and

t_{o v e r p a s s}

,

t_{r i s e}

, and

t_{s e t}

are the MODIS overpass, sunrise, and sunset times, respectively. To determine the optimal N value, values from 1 to 2 at intervals of 0.1 were tested; with N set to 1.7, the error of the estimated

R_{n_d a y t i m e}

was the smallest. The sunrise and sunset times were calculated following the method of Doggett et al. [62]. Afterwards, the estimated

R_{n_d a y t i m e}

from Equation (11) was converted into

R_{n_d a i l y}

with a statistical model (e.g., a linear or quadratic polynomial regression) built based on the ground measurements. Through experiments, the optimal combination of hyperparameters for the RF-based ins model was found to be 50 for “n-estimators”, 9 for “max depth”, 5 for “min samples split” and 8 for “min samples leaf”.
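Equation (11) can be written as a short Python function. This is a minimal sketch: the function name and the use of decimal local hours are assumptions, and the sunrise and sunset times are passed in rather than computed with the method of Doggett et al. [62].

```python
import math


def daytime_net_radiation(rn_ins, t_overpass, t_rise, t_set, n=1.7):
    """Equation (11): expand instantaneous Rn (W m^-2) to a daytime value.

    Times are decimal hours in local time; n is the empirical coefficient N.
    """
    if not t_rise < t_overpass < t_set:
        raise ValueError("overpass must fall between sunrise and sunset")
    phase = math.pi * (t_overpass - t_rise) / (t_set - t_rise)
    return n * rn_ins / (math.pi * math.sin(phase))
```

For an overpass exactly midway between sunrise and sunset the sine term equals 1, so the result reduces to N · R_ins / π; the same instantaneous value observed earlier or later in the day maps to a larger daytime value.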

As mentioned above, the sinusoidal model was applied by assuming that atmospheric conditions are stable throughout each day; that is, the surface radiation is assumed to be primarily determined by SZA and to follow a sinusoidal curve throughout the daytime [63]. However, no significant correlation was found between the instantaneous and daily atmospheric conditions on the same day by Wang et al. [22], and cloudy skies are more likely to appear than clear skies, especially in the afternoon [20,64]. Thus, the performance of sinusoidal models will likely deteriorate under mixed cloudy or overcast skies.

2. RF-based model

To avoid error accumulation occurring during temporal expansion, the RF-based model was built by taking

R_{n_d a i l y}

as the output according to Equation (10), which means that one

R_{n_d a i l y}

estimate was calculated from each MODIS TOA observation, and the final

R_{n_d a i l y}

value for a single day is the average of all available estimates during that day. Through experiments, the optimal combination of hyperparameters for the RF-based model was determined to be 50 for “n-estimators”, 10 for “max depth”, 3 for “min samples split” and 7 for “min samples leaf”.
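The averaging of the per-overpass estimates into one daily value can be sketched as below. The record layout (site identifier, date, estimate) and the function name are hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean


def daily_mean_estimates(records):
    """Average all per-overpass daily-Rn estimates that share a site and date.

    records: iterable of (site_id, date, estimate) tuples, estimate in W m^-2.
    """
    grouped = defaultdict(list)
    for site_id, date, estimate in records:
        grouped[(site_id, date)].append(estimate)
    return {key: mean(values) for key, values in grouped.items()}
```

A day with a single overpass simply keeps that one estimate, which is the situation where an additional constraint is most useful.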

3. RF-based model with ERA5

To reduce the uncertainties of the RF-based model, especially when the available MODIS TOA observations are very limited (only 1–2 revisits each day) during the daytime, the surface

R_{n_d a i l y}

from ERA5 was introduced into the RF-based model as a physical constraint, with the mathematical expression

R_{n\_daily} = f(TOA_{ref}, TOA_{rad}, SZA, VZA, RAA, height, d_{r}, φ, R_{n\_ERA5})

(12)

where

R_{n_E R A 5}

is the

R_{n_d a i l y}

from ERA5. Through experiments, the optimal combination of hyperparameters for the RF-based model with ERA5 was 50 for “n-estimators”, 12 for “max depth”, 4 for “min samples split” and 8 for “min samples leaf”.

3.1.3. Look-Up Table (LUT-Based) Model

For comparison,

R_{n_d a i l y}

was also estimated from MODIS TOA observations based on the typical LUT method, using ground measurements instead of simulations from an RTM. Following Wang et al. [19], the viewing geometry (SZA, VZA, and RAA) and sky conditions (determined by the cloud mask) at each MODIS overpass time were taken as the key factors for case division, and each factor was divided into bins, as shown in Table 5. Then, a group of conditional models determined by the combination of these four factors was built by regressing the MODIS TOA data to the

R_{n_d a i l y}

as

R_{n\_daily} = f(TOA_{ref}, TOA_{rad}, height, d_{r}, φ)

(13)

The cloud mask data obtained from MOD/MYD35 indicates the sky condition when the two MODIS sensors overpass, with “clear” defined as when the clear-sky confidence levels are “confident clear” or “probably clear” and “cloudy” defined as when the clear-sky confidence levels are “cloudy” or “uncertain clear” [23,52].

In this study, each conditional model was built with the multilayer perceptron (MLP) method [65], which is one of the basic building blocks of an ANN. MLP has been widely used in radiation estimation because of its superior performance in regression [66,67]. A three-layer network consisting of an input layer, a hidden layer with the rectified linear unit (ReLU) transfer function [68,69], and an output layer with a linear function was constructed for each conditional model. The optimal model was determined by the minimum validation RMSE after testing different numbers of neurons in the hidden layer. The MLP was implemented with the Keras interface in Python.

Table 5 (bin values of the viewing geometry):

SZA: 20°, 30°, 40°, 50°, 60°, 70°, 85°

VZA: 10°, 20°, 30°, 40°, 55°, 70°

RAA: 30°, 60°, 90°, 120°, 180°

Theoretically, 504 conditional models (7 SZA bins × 6 VZA bins × 6 RAA bins × 2 sky conditions) should be built; however, only 312 models were established (153 for clear-sky and 159 for cloudy-sky) in this work because a minimum of 80 training samples was required. Figure 6 presents the proportion of all samples classified by each bin of the four key factors (Table 5). It can be seen that most of the matched observations have clouds in Figure 6a, and the SZA of most samples is concentrated at 20° to 70° in Figure 6b.
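The case division can be illustrated with a small lookup sketch. Several points are assumptions made for this example only: the listed Table 5 values are treated as upper bin limits, the six RAA limits are a hypothetical choice that matches the six RAA bins stated in the text, and all function names are invented.

```python
from bisect import bisect_left
from collections import Counter

# Upper bin limits in degrees. SZA and VZA use the values listed for Table 5;
# the RAA limits are a hypothetical six-bin choice made for this sketch.
SZA_EDGES = [20, 30, 40, 50, 60, 70, 85]
VZA_EDGES = [10, 20, 30, 40, 55, 70]
RAA_EDGES = [30, 60, 90, 120, 150, 180]
MIN_SAMPLES = 80  # minimum training samples required per conditional model


def bin_index(value, edges):
    """Index of the first bin whose upper limit is >= value, or None if out of range."""
    i = bisect_left(edges, value)
    return i if i < len(edges) else None


def case_key(sza, vza, raa, is_clear):
    """Key identifying the conditional model for one MODIS observation."""
    idx = (
        bin_index(sza, SZA_EDGES),
        bin_index(vza, VZA_EDGES),
        bin_index(raa, RAA_EDGES),
    )
    if None in idx:
        return None
    return idx + ("clear" if is_clear else "cloudy",)


def trainable_cases(observations):
    """Return the case keys that reach MIN_SAMPLES observations."""
    counts = Counter(case_key(*obs) for obs in observations)
    counts.pop(None, None)  # drop observations that fall outside every bin
    return {key for key, n in counts.items() if n >= MIN_SAMPLES}
```

Each retained key would then get its own regression model; cases below the sample threshold are skipped, which is why fewer than the theoretical 504 models are built.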

3.2. Model Performance Evaluation

In this study, four statistical indices were used to evaluate model performance: the determination coefficient (R²), RMSE, bias, and relative root mean square error (rRMSE), given respectively as:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(e_{i} - o_{i})}^{2}}{\sum_{i = 1}^{n} {(o_{i} - \bar{o})}^{2}}

(14)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(o_{i} - e_{i})}^{2}}

(15)

Bias = \frac{1}{n} \sum_{i = 1}^{n} (e_{i} - o_{i})

(16)

rRMSE = \frac{RMSE}{\mathrm{mean}(X)}

(17)

where

e_{i}

is the estimated

R_{n}

from the proposed model,

o_{i}

is the corresponding measured

R_{n}

, and

X

represents all site measurements of

R_{n}

.
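For reference, Equations (14)–(17) can be computed with a few lines of plain Python. This is a sketch rather than the authors' code; in particular, mean(X) is taken here over the same measurements that are passed in, which is an assumption about how X is chosen.

```python
import math


def evaluate(estimated, observed):
    """Return R2, RMSE, bias, and rRMSE following Equations (14)-(17)."""
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    sq_err = sum((e - o) ** 2 for e, o in zip(estimated, observed))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sq_err / n)
    return {
        "R2": 1.0 - sq_err / sq_dev,
        "RMSE": rmse,
        "bias": sum(e - o for e, o in zip(estimated, observed)) / n,
        "rRMSE": rmse / mean_obs,  # mean(X) taken over the supplied measurements
    }
```

Note that bias is defined as estimate minus measurement, so a positive value indicates overestimation.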

4. Results and Analysis

4.1. Proposed Model Performance Evaluation

Among the four proposed models, the RF-based ins model was the only one that could not derive

R_{n_d a i l y}

directly, instead yielding

R_{n_i n s}

. Figure 7 shows the training and validation accuracy of the RF-based ins model, which yields RMSEs of 78.06 and 81.74 Wm⁻², biases of −0.08 and −0.34 Wm⁻², and R² values of 0.84 and 0.83, respectively. Wang et al. [19] applied a hybrid model to estimate clear-sky

R_{n_i n s}

, and the direct validation against SURFRAD yielded an RMSE of 70.6 Wm⁻² (number of samples = 1522), while a similar validation was also conducted on the all-sky

R_{n_i n s}

estimates from the RF-based ins model and yielded a smaller RMSE of 67.7 Wm⁻² (number of samples = 11,268). Therefore, the results here and in the previous study demonstrate the limitations of using simulated values.

Next, the estimated

R_{n_i n s}

values were aggregated into

R_{n_d a y t i m e}

according to Equation (11). To obtain

R_{n_d a i l y}

from

R_{n_d a y t i m e}

, the relationship between the two parameters was examined first based on the in situ measurements, the results of which are given in Figure 7c. After multiple experiments, a linear regression model (Equation (18)) was built and then applied to convert

R_{n_d a y t i m e}

to

R_{n_d a i l y}

:

R_{n\_daily} = 0.58 \times R_{n\_daytime} - 33.5

(18)
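As a worked example of Equation (18), the conversion can be expressed as a one-line Python function; the function name and the default-argument layout are illustrative assumptions.

```python
def daytime_to_daily(rn_daytime, slope=0.58, intercept=-33.5):
    """Equation (18): linear conversion from daytime to daily net radiation (W m^-2)."""
    return slope * rn_daytime + intercept
```

For instance, a daytime value of 300 Wm⁻² maps to 0.58 × 300 − 33.5 = 140.5 Wm⁻², and the negative intercept means that small daytime values yield negative daily values.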

For a better comparison, the

R_{n_d a i l y}

estimates from the RF-based ins model were compared with those obtained from the other three models (i.e., the RF-based model, RF-based model with ERA5, and the LUT-based model). Figure 8 presents the overall training and validation accuracies of the four models; the four rows from top to bottom, each with three plots, are for the RF-based ins model, RF-based model, RF-based model with ERA5, and LUT-based model, respectively. Specifically, the four plots in the left column (Figure 8a,d,g,j) are the training results of the four models (note that Figure 8a shows the training results of Equation (18)). The four plots in the middle column (Figure 8b,e,h,k) show the independent validation accuracies of the four models with their corresponding independent validation samples (see Table 3). The four plots in the right column (Figure 8c,f,i,l) present the validation accuracies of the four models against the intersection validation samples for intercomparison.

Generally speaking, the performance of the three models deriving

R_{n_d a i l y}

directly (i.e., the RF-based model, RF-based model with ERA5, and LUT-based model) was similar, and better than that of the RF-based ins model in terms of training, validation, and intercomparison validation results without overfitting. Specifically, the RF-based model with ERA5 performed the best among the four models, with both the independent and intercomparison validation accuracy (Figure 8h,i) yielding the largest R² values of 0.89 and 0.91, the smallest RMSEs of 21.83 and 18.86 Wm⁻², and biases of 0.20 and 0.28 Wm⁻². These results also demonstrate that the performance of the RF-based model could be effectively improved by introducing the ERA5

R_{n_d a i l y}

values, which reduced the RMSE by 1.04 and 1.07 Wm⁻² and the bias by 0.05 and 0.08 Wm⁻² relative to the RF-based model (Figure 8e,f). By comparison, the LUT-based model performed only slightly worse than the RF-based model in terms of the independent and intercomparison validation accuracies, with R² values of 0.87 and 0.90, RMSEs of 23.71 and 20.28 Wm⁻², and biases of 0.09 and 0.56 Wm⁻² (Figure 8k,l), respectively, which might have resulted from the insufficient number of samples collected under some conditions for modeling. The RF-based ins model performed the worst among the four proposed models. It is speculated that the predominance of cloudy-sky samples in this study degraded the performance of the sinusoidal model used to convert

R_{n_i n s}

to

R_{n_d a y t i m e}

, which further resulted in errors propagating from

R_{n_d a y t i m e}

to

R_{n_d a i l y}

.

Based on these results, it can be concluded that (1) the application of the RF-based model is likely constrained by the small number of overpasses of polar-orbiting satellites at mid-low latitudes, even though its performance in this respect was acceptable in this study; (2) the RF-based model with ERA5 can effectively improve estimation accuracy based on the physical constraints imposed by

R_{n_d a i l y}

from ERA5; (3) the sinusoidal curve model performs poorly when sky conditions are not clear; and (4) the LUT-based method might be unreliable in some cases even though it had comparable performance here. Because of its good performance, the RF-based model with ERA5 was analyzed further, as discussed in Section 4.2.

4.2. Further Analysis with RF-Based Model with ERA5

A more comprehensive evaluation was conducted with the RF-based model with ERA5 by comparing its

R_{n_d a i l y}

values with those obtained from other products (GLASS, CERES4A, ERA5 and FLUXCOM_RS) under various conditions.

4.2.1. Comparison with Other Products at the Site Scale

First, the spatial distribution of the independent validation accuracy in the

R_{n_d a i l y}

estimations from the RF-based model with ERA5 at all sites is shown in Figure 9. It can be seen that the uncertainties for the

R_{n_d a i l y}

estimations over the regions between 30° and 60° latitude (30°N/S–60°N/S) are smaller than those near the equator (30°S–30°N), with RMSE values mostly between 15 and 25 W·m⁻² (Figure 9a). Meanwhile, the distribution of rRMSE (Figure 9b), which could eliminate the influence of the sample size on the RMSE, is generally uniform globally, with values mostly between 0.15 and 0.35. In addition, the distribution of the bias values (Figure 9c) is erratic, although mostly within ±10 W·m⁻², and no significant overestimation or underestimation can be found in any region at mid-low latitudes. These results demonstrate the robustness of the RF-based model with ERA5, as shown by the relatively uniform distribution of random errors and systematic errors over the globe.

Meanwhile, the

R_{n_d a i l y}

values from the four products (GLASS, CERES4A, ERA5, and FLUXCOM_RS) were also validated against the ground measurements at mid-low latitudes. Because of the discrepancies in temporal resolution (8-day for FLUXCOM_RS) and span, the number of validation samples used for product intercomparison was 8380. Figure 10b–e present the validation results for each of the four products, and Figure 10a shows them for the RF-based model with ERA5. Based on Figure 10, the

R_{n_d a i l y}

estimates from the RF-based model with ERA5 (Figure 10a) have the best validation accuracies, with the smallest RMSE of 22.97 Wm⁻² and the largest R² of 0.88, while those of GLASS (Figure 10b) are second best, with an RMSE of 26.04 Wm⁻². The performance of the other three products (CERES4A, ERA5, and FLUXCOM_RS) is slightly worse although they are close to each other, with RMSE values of ~30 Wm⁻². Moreover, it is encouraging to see that the

R_{n_d a i l y}

values estimated by the RF-based model with ERA5 are more accurate than the ones from ERA5, as demonstrated by the former having a ~7 Wm⁻² smaller RMSE. In addition, overestimations of

R_{n_d a i l y}

by CERES4A were also found in this study, in accordance with the results of previous studies [6,11].

Furthermore, the performances of the five

R_{n_d a i l y}

estimates for nine land cover types (croplands <CRO>, grasslands <GRA>, deciduous broadleaf forests <DBF>, mixed forests <MF>, open shrublands <OSH>, evergreen broadleaf forests <EBF>, evergreen needleleaf forests <ENF>, woody savannas <WSA> and closed shrublands <CSH>) and six ranges of elevation (<200 m, 200–400 m, 400–600 m, 600–1000 m, 1000–1500 m, and >1500 m) were compared against their respective validation samples; the results represented by RMSEs are shown in Figure 11. The RF-based model with ERA5 performed the best across the nine land cover types and six elevation zones, approaching the smallest RMSE values; the GLASS product again performed second best. In particular, almost all products performed poorly for the EBF and DBF land cover classes (Figure 11a), in keeping with the results of previous studies [14,17]. However, the results from the newly proposed model are satisfactory and are significantly improved compared to that of ERA5, with the RMSE reduced by ~5 W·m⁻². It has been pointed out that surface elevation is one of the most crucial factors in surface radiation estimation because of its impact on controlling atmospheric mass [70]. The performance of the RF-based model with ERA5 appears to be stable and remains optimal at all elevation groups (Figure 11b), with the smallest RMSE; it is much better than even that of ERA5, especially at high elevation zones (>1500 m). Therefore, all the above results demonstrate the effectiveness of the newly proposed model (RF-based model with ERA5) in both robustness and in accuracy.

To examine the model’s ability in variation characterization, the long term

R_{n_d a i l y}

values from the RF-based model with ERA5 and GLASS at three SURFRAD sites (SF_DRA <36.63°N, 116.02°W, BSV>, SF_FPK <48.31°N, 105.1°W, GRA>, and SF_SXF <43.73°N, 96.62°W, OSH>) were taken as examples, as shown in Figure 12. The two

R_{n_d a i l y}

estimates both captured the variations in the in situ

R_{n_d a i l y}

(black dots) very well; however, those from the RF-based model with ERA5 (red line) were better than those of GLASS (blue line), with smaller RMSE values for all three sites. The smaller RMSEs may possibly be due to the

R_{n_d a i l y}

values from RF-based model with ERA5 being closer to the in situ measurements, especially at very high values. However, the two data sets both had a tendency to overestimate

R_{n_d a i l y}

at low values at the three sites.

In summary, the estimated

R_{n_d a i l y}

from the newly proposed RF-based model with ERA5 has superior performance to the other four products in terms of direct validation accuracy, robustness, and temporal variation characterization.

4.2.2. $R_{n\_daily}$ Mapping Using the RF-Based Model with ERA5

Figure 13a shows a further application of the RF-based model with ERA5 to real MODIS images and ERA5

R_{n_d a i l y}

data to map

R_{n_d a i l y}

at mid-low latitudes on the 220th day of 2017. The spatial distribution in

R_{n_d a i l y}

on this day was roughly as expected, with high values appearing in the Northern Hemisphere during summer and low values in the Southern Hemisphere during winter. For further illustration, magnified images over a smaller region (24°–36°N, 75°–93°E) from the RF-based model with ERA5, GLASS, ERA5, and CERES4A are presented in Figure 13b–e. The result from the RF-based model with ERA5 is comparable to those of the others and most similar to that of ERA5, although with more detail thanks to its higher spatial resolution. It seems that the distribution of

R_{n_d a i l y}

from GLASS (Figure 13c) is less smooth in this region and has higher values than that of the RF-based model with ERA5 (Figure 13b). A significant difference among the four images appears in the northern part of the region (33°–36°N, 81°–87°E), where GLASS has lower values (blue color) than the other three, possibly because of the presence of clouds. More verification should be conducted in the future.

Overall, the newly proposed model (RF-based model with ERA5) has strong potential to be used for high-resolution global

R_{n_d a i l y}

mapping at mid-low latitudes owing to its stable performance, easily-collected input, and high production efficiency.

5. Discussion of the RF-Based Model with ERA5

More discussion about the development of the RF-based model with ERA5 is provided in this section. Based on all the validation results after incorporating the

R_{n_d a i l y}

from ERA5 as input (which can compensate when the information from instantaneous satellite observations is insufficient), the superior performance of the RF-based model was fully demonstrated. However, it is also common to take various parameters, especially meteorological variables from other data sources, as ancillary information in modeling, as in previous studies [14,71]. Hence, another four experiments were conducted with the RF-based model by introducing different combinations of climatic and atmospheric parameters from ERA5. Then, the estimated

R_{n_d a i l y}

values were validated and compared with the ones from the RF-based model and the RF-based model with ERA5. Explanations of the variables in Table 6 were introduced in Section 2.3, with most selected according to previous studies. The RF-based model is labelled “Original” for convenience, and detailed information about all the experiments and validation results is given in Table 6. Meanwhile, to evaluate the improved accuracy of the RF-based model with ERA5 compared to ERA5, we also evaluated the accuracy of ERA5 using the same samples in Table 6, generating an RMSE of 29.1 Wm⁻² and a bias of −1.32 Wm⁻².

The RF-based model with ERA5 (the last row in Table 6) performed the best among all the models, with the smallest RMSE of 21.83 Wm⁻². It is speculated that the radiative components simulated in ERA5 by the RTM (in which radiative attenuation is determined via complex atmospheric and surface parameters as well as various assimilated meteorological conditions) contain much more information than the single or combined parameters. Moreover, when taking the ERA5

R_{n_d a i l y}

as a constraint in the RF-based model, the RF method can adaptively adjust the weights between the spectral information provided by MODIS and the features of the ERA5

R_{n_d a i l y}

product at different overpass times to obtain optimal estimates by minimizing their differences from the ground measurements.

In addition, as the most important input, the influence of the MODIS TOA observations on the performance of the RF-based model with ERA5 was analyzed in terms of the overpass time, sensor, revisit frequency, and sky condition. Based on the independent validation samples, the accuracy of the estimated

R_{n_d a i l y}

from the RF-based model with ERA5 under various cases was compared. Figure 14 presents the performance of the RF-based model with ERA5 as a function of overpass time and latitude. The overpass time is divided into five bins from 9:00 to 14:00 in one-hour intervals, and the latitude from 0° to 60°N is divided into six bins using 10° intervals. Note that the

R_{n_d a i l y}

discussed here is the direct output of the new model. The validation rRMSEs are mainly smaller than 0.3 over most regions below 50°N. The RMSE values are larger over the equatorial regions, from 0° to 20°N, which may be due to the lower number of samples collected for this area. It also seems that the new model has no special requirements regarding the overpass time of the MODIS TOA data during the daytime.

Moreover, there is only a slight difference between the results when using MODIS TOA data from only the Terra or Aqua satellite, as shown in Figure 15. It can be seen that the new model yields better performance when the MODIS TOA data in the morning are added (Figure 15a), which accords with the results of the study of Wang et al. [19], likely due to the influence of clouds [64].

However, the influence of the amount of available MODIS TOA data on the performance of the RF-based model with ERA5 is significant. Table 7 gives the validation accuracy of the final

R_{n_d a i l y}

when using one to three MODIS TOA data points per day as input for the RF-based model and RF-based model with ERA5. For the RF-based model, the estimation accuracy improves as the number of MODIS TOA observations during the daytime increases, which again illustrates the importance of including information about variations in the daily condition for

R_{n_d a i l y}

estimation. Meanwhile, for the RF-based model with ERA5, its estimation accuracy improves the most when only one MODIS TOA observation per day is used as input, where the RMSE value decreases from 24.17 Wm⁻² to 22.41 Wm⁻² relative to the RF-based model; however, the improvements are smaller with an increase in MODIS TOA data. Therefore, it is recommended to introduce the

R_{n_d a i l y}

values from ERA5 into any RF-based model when the number of MODIS TOA observations is fewer than two per day.

Finally, the daily sky condition represented by CI was used to examine the effects of

R_{n_d a i l y}

from ERA5 in the newly-proposed model. The CI is divided into seven bins (<0.2, 0.2–0.3, 0.3–0.4, 0.4–0.5, 0.5–0.6, 0.6–0.7, and >0.7), and the validation accuracy of the final

R_{n_d a i l y}

estimation in each bin is shown in Figure 16. The sky conditions are defined as overcast (CI < 0.3), clear (CI > 0.7), and cloudy (CI ∈ [0.3, 0.7]). Furthermore, the differences in the rRMSEs of the RF-based model with or without ERA5 as a function of the number of MODIS TOA observations used as input for the different CI bins are also examined in Figure 17. The two results demonstrate that the introduction of

R_{n_d a i l y}

from ERA5 into the RF-based model is more effective under overcast skies (CI < 0.3) and when only one MODIS TOA observation is available as input.
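The CI thresholds quoted above translate directly into a small classification helper. The sketch below uses invented function names, and its handling of values that fall exactly on a bin boundary (for example, CI = 0.2 going to the 0.2–0.3 bin) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Inner boundaries of the seven CI bins: <0.2, 0.2-0.3, ..., 0.6-0.7, >0.7.
INNER_EDGES = [0.2, 0.3, 0.4, 0.5, 0.6]


def sky_condition(ci):
    """Daily sky condition from CI, using the thresholds given in the text."""
    if ci < 0.3:
        return "overcast"
    if ci > 0.7:
        return "clear"
    return "cloudy"  # CI in [0.3, 0.7]


def ci_bin(ci):
    """Bin index 0-6; boundary values go to the upper bin, except 0.7 (assumption)."""
    if ci > 0.7:
        return 6
    return bisect_right(INNER_EDGES, ci)
```

Grouping validation samples by `ci_bin` and by the number of available MODIS observations would reproduce the kind of stratification examined in Figures 16 and 17.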

Overall, it is very effective to take the ERA5

R_{n}

as a physical constraint in the RF-based model for

R_{n_d a i l y}

estimation at mid-low latitudes, and it is most effective when the available number of the MODIS TOA observations in a single day is fewer than two and the daily sky condition is mainly overcast.

6. Conclusions

R_{n}

is a key parameter representing the available radiation budget at the land surface. As it also regulates most biological and physical processes, an accurate estimation of

R_{n}

is of great significance for various applications. To improve the

R_{n}

estimation accuracy using a limited amount of polar-orbiting satellite data at mid-low latitudes, we proposed to estimate

R_{n_d a i l y}

at mid-low latitudes directly from MODIS TOA data, along with comprehensive in situ measurements collected from 340 globally distributed sites from 2000 to 2017, using three RF-based models (named the RF-based ins model, RF-based model, and RF-based model with ERA5) and one LUT-based model. Specifically, the RF-based ins model only calculated

R_{n_i n s}

, which was then converted into

R_{n_d a i l y}

using a temporal expansion model (i.e., the theoretical sinusoidal curve and linear regression model), while the RF-based model and LUT-based model delivered

R_{n_d a i l y}

directly. The RF-based model with ERA5 was developed by introducing

R_{n_d a i l y}

from ERA5 as a physical constraint into the RF-based model. After validation against ground measurements, the RF-based model with ERA5 performed the best among the four proposed models, yielding an overall validated R² of 0.89, RMSE of 21.83 Wm⁻², and bias of 0.2 Wm⁻².

Further evaluation was conducted on the

R_{n_d a i l y}

estimated from the RF-based model with ERA5 by comparing them with corresponding values from four existing products (GLASS, CERES4A, ERA5, and FLUXCOM_RS) under various conditions. The results demonstrate the superior performance of the newly proposed model in terms of high accuracy (R² = 0.88, RMSE = 22.97 Wm⁻², and bias = 1.2 Wm⁻²) and robustness across various land cover types and elevation zones. Moreover, this newly proposed model has great potential for mapping long term global

R_{n_d a i l y}

at mid-low latitudes at finer resolution, which could provide more detail and be more useful for practical applications.

Based on this study, all the results illustrate the effectiveness of introducing a physical constraint (here,

R_{n_d a i l y}

from ERA5) into machine-learning RF-based models in order to reduce the uncertainty in

R_{n_d a i l y}

estimations when the available satellite observations are insufficient to characterize the variations in

R_{n}

at the daily resolution, especially when the MODIS overpass frequency is less than twice a day or for overcast days. Compared to the traditional temporal expansion model, which is applied with the assumption that the atmospheric condition remains unchanged for an entire day, this proposed model is more reasonable as it does not make any such assumptions. Therefore, the proposed model has strong potential to be used for regional studies at a 1 km spatial resolution at mid-low latitudes.

However, several issues have not been fully considered with respect to this proposed RF-based model with ERA5, even though it achieved a satisfactory result at mid-low latitudes. For instance, spatially adjacent effects such as horizontal photon transport due to cloud heterogeneity and rapid changes of cloud on

R_{n_i n s}

[72], as well as the effect of topography, which has different influences on direct radiation, diffuse radiation, and irradiance obstructed and reflected by nearby terrain, were not considered. Efforts to address these issues are currently underway.

Author Contributions

S.L. and B.J. wrote the paper; B.J. conceived and designed the experiments; S.L. performed the experiments and results analysis; J.P., H.L. and J.H. processed data and provided valuable suggestions; Y.Y., X.Z. (Xiaotong Zhang), J.C., X.Z. (Xiang Zhao), Q.L. and K.J. supervised the writing. All authors provided edits to the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (Grant No. 42090012 and 41971291) and National Key Research and Development Program of China (2020YFA0608704).

Data Availability Statement

Data available on request.

Acknowledgments

We are thankful for the eddy covariance data acquired by the FLUXNET community, in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program (DE-FG02-04ER63917)), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS-Siberia, USCCC. The authors acknowledge the financial support to the eddy covariance data harmonization provided by CarboEuropeIP, FAO-GTOS-TCO, Ileaps, Max Planck Institute for Biogeochemistry, National Science Foundation, University of Tuscia, Université Laval, Environment Canada and US Department of Energy and the datasets development and technical support from Berkeley Water Center, Lawrence Berkeley National Laboratory, Microsoft Research Science, Oak Ridge National Laboratory, University of California, Berkeley and the University of Virginia. The authors would also like to thank other radiation measurements providers (listed in Table 1). The authors also acknowledge the data support from “National Earth System Science Data Center, National Science and Technology Infrastructure of China” (http://www.geodata.cn, accessed on 30 July 2021). Finally, the authors would like to thank the anonymous reviewers for helping to improve the quality of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Liang, S.; Wang, D.; He, T.; Yu, Y. Remote sensing of earth’s energy budget: Synthesis and review. Int. J. Digit. Earth 2019, 12, 737–780.
Alados, I.; Foyo-Moreno, I.; Olmo, F.; Alados-Arboledas, L. Relationship between net radiation and solar radiation for semi-arid shrub-land. Agric. For. Meteorol. 2003, 116, 221–227.
Kalthoff, N.; Fiebig-Wittmaack, M.; Meißner, C.; Kohler, M.; Uriarte, M.; Bischoff-Gauß, I.; Gonzales, E. The energy balance, evapo-transpiration and nocturnal dew deposition of an arid valley in the andes. J. Arid Environ. 2006, 65, 420–443.
Dickinson, R.E.; Oleson, K.W.; Bonan, G.; Hoffman, F.; Thornton, P.; Vertenstein, M.; Yang, Z.-L.; Zeng, X. The community land model and its climate statistics as a component of the community climate system model. J. Clim. 2006, 19, 2302–2324.
Bisht, G.; Venturini, V.; Islam, S.; Jiang, L. Estimation of the net radiation using modis (moderate resolution imaging spectroradiometer) data for clear sky days. Remote Sens. Environ. 2005, 97, 52–67.
Jia, A.; Jiang, B.; Liang, S.; Zhang, X.; Ma, H. Validation and spatiotemporal analysis of ceres surface net radiation product. Remote Sens. 2016, 8, 90.
Shi, Q.; Liang, S. Characterizing the surface radiation budget over the tibetan plateau with ground-measured, reanalysis, and remote sensing data sets: 1. Methodology. J. Geophys. Res. Atmos. 2013, 118, 9642–9657.
Dee, D.P.; Uppala, S.M.; Simmons, A.J.; Berrisford, P.; Poli, P.; Kobayashi, S.; Andrae, U.; Balmaseda, M.A.; Balsamo, G.; Bauer, P.; et al. The era-interim reanalysis: Configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc. 2011, 137, 553–597.
Kratz, D.P.; Gupta, S.K.; Wilber, A.C.; Sothcott, V.E. Validation of the ceres edition-4a surface-only flux algorithms. J. Appl. Meteorol. Climatol. 2020, 59, 281–295.
Decker, M.; Brunke, M.A.; Wang, Z.; Sakaguchi, K.; Zeng, X.; Bosilovich, M.G. Evaluation of the reanalysis products from gsfc, ncep, and ecmwf using flux tower observations. J. Clim. 2012, 25, 1916–1944.
Jia, A.; Liang, S.; Jiang, B.; Zhang, X.; Wang, G. Comprehensive assessment of global surface net radiation products and uncertainty analysis. J. Geophys. Res. Atmos. 2018, 123, 1970–1989.
Wild, M.; Folini, D.; Schär, C.; Loeb, N.; Dutton, E.; König-Langlo, G. The global energy balance from a surface perspective. Clim. Dyn. 2012, 40, 3107–3134.
Trenberth, K.; Fasullo, J. Simulation of present-day and twenty-first-century energy budgets of the southern oceans. J. Clim. 2010, 23, 440–454.
Jiang, B.; Zhang, Y.; Liang, S.; Wohlfahrt, G.; Arain, A.; Cescatti, A.; Georgiadis, T.; Jia, K.; Kiely, G.; Lund, M.; et al. Empirical estimation of daytime net radiation from shortwave radiation and ancillary information. Agric. For. Meteorol. 2015, 211–212, 23–36.
Chen, J.; He, T.; Jiang, B.; Liang, S. Estimation of all-sky all-wave daily net radiation at high latitudes from modis data. Remote Sens. Environ. 2020, 245, 111842.
Wang, D.; Liang, S.; He, T.; Shi, Q. Estimating clear-sky all-wave net radiation from combined visible and shortwave infrared (vswir) and thermal infrared (tir) remote sensing data. Remote Sens. Environ. 2015, 167, 31–39.
Wu, H.; Ying, W. Benchmarking machine learning algorithms for instantaneous net surface shortwave radiation retrieval using remote sensing data. Remote Sens. 2019, 11, 2520.
Loeb, N.G.; Doelling, D.R.; Wang, H.; Su, W.; Nguyen, C.; Corbett, J.G.; Liang, L.; Mitrescu, C.; Rose, F.G.; Kato, S. Clouds and the earth’s radiant energy system (ceres) energy balanced and filled (ebaf) top-of-atmosphere (toa) edition-4.0 data product. J. Clim. 2018, 31, 895–918.
Wang, D.; Liang, S.; He, T.; Shi, Q. Estimation of daily surface shortwave net radiation from the combined modis data. IEEE Trans. Geosci. Remote Sens. 2015, 53, 5519–5529.
Bisht, G.; Bras, R.L. Estimation of net radiation from the modis data under all sky conditions: Southern great plains case study. Remote Sens. Environ. 2010, 114, 1522–1534.
Xu, X.; Du, H.; Zhou, G.; Mao, F.; Li, P.; Fan, W.; Zhu, D. A method for daily global solar radiation estimation from two instantaneous values using modis atmospheric products. Energy 2016, 111, 117–125.
Silber, I.; Verlinde, J.; Wang, S.-H.; Bromwich, D.H.; Fridlind, A.M.; Cadeddu, M.; Eloranta, E.W.; Flynn, C.J. Cloud influence on era5 and amps surface downwelling longwave radiation biases in west antarctica. J. Clim. 2019, 32, 7935–7949.
Zhang, Y.; He, T.; Liang, S.; Wang, D.; Yu, Y. Estimation of all-sky instantaneous surface incident shortwave radiation from moderate resolution imaging spectroradiometer data using optimization method. Remote Sens. Environ. 2018, 209, 468–479.
Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716.
Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The era5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
Martens, B.; Schumacher, D.L.; Wouters, H.; Muñoz-Sabater, J.; Verhoest, N.E.C.; Miralles, D.G. Evaluating the land-surface energy partitioning in era5. Geosci. Model Dev. 2020, 13, 4159–4181.
Urraca, R.; Huld, T.; Gracia-Amillo, A.; Martinez-de-Pison, F.J.; Kaspar, F.; Sanz-Garcia, A. Evaluation of global horizontal irradiance estimates from era5 and cosmo-rea6 reanalyses using ground and satellite-based data. Sol. Energy 2018, 164, 339–354.
Babar, B.; Graversen, R.; Boström, T. Solar radiation estimation at high latitudes: Assessment of the cmsaf databases, asr and era5. Sol. Energy 2019, 182, 397–411.
Jung, M.; Koirala, S.; Weber, U.; Ichii, K.; Gans, F.; Camps-Valls, G.; Papale, D.; Schwalm, C.; Tramontana, G.; Reichstein, M. The fluxcom ensemble of global land-atmosphere energy fluxes. Sci. Data 2019, 6, 74.
Phillips, T.J.; Klein, S.A.; Ma, H.Y.; Tang, Q.; Xie, S.; Williams, I.N.; Santanello, J.A.; Cook, D.R.; Torn, M.S. Using arm observations to evaluate climate model simulations of land-atmosphere coupling on the us southern great plains. J. Geophys. Res. Atmos. 2017, 122, 524–548.
Zo, I.-S.; Jee, J.-B.; Kim, B.-Y.; Lee, K.-T. Baseline surface radiation network (bsrn) quality control of solar radiation data on the gangneung-wonju national university radiation station. Asia-Pac. J. Atmos. Sci. 2017, 53, 11–19.
Yao, Y.; Liang, S.; Li, X.; Hong, Y.; Fisher, J.B.; Zhang, N.; Chen, J.; Cheng, J.; Zhao, S.; Zhang, X. Bayesian multimodel estimation of global terrestrial latent heat flux from eddy covariance, meteorological, and satellite observations. J. Geophys. Res. Atmos. 2014, 119, 4521–4545.
Fu, B.; Li, S.; Yu, X.; Yang, P.; Yu, G.; Feng, R.; Zhuang, X. Chinese ecosystem research network: Progress and perspectives. Ecol. Complex. 2010, 7, 225–233.
Xin, L.; Shaomin, L.; Mingguo, M.; Qing, X.; Qinhuo, L.; Rui, J.; Tao, C.; Weizhen, W.; Yuan, Q.; Hongyi, L. Hiwater: An integrated remote sensing experiment on hydrological and ecological processes in the heihe river basin. Adv. Earth Sci. 2012, 27, 481–498.
Yi, C.; Ricciuto, D.; Li, R.; Wolbeck, J.; Xu, X.; Nilsson, M.; Aires, L.; Albertson, J.D.; Ammann, C.; Arain, M.A. Climate control of terrestrial carbon exchange across biomes and continents. Environ. Res. Lett. 2010, 5, 034007.
Tóta, J.; Fitzjarrald, D.R.; Staebler, R.M.; Sakai, R.K.; Moraes, O.M.; Acevedo, O.C.; Wofsy, S.C.; Manzi, A.O. Amazon rain forest subcanopy flow and the carbon budget: Santarém lba-eco site. J. Geophys. Res. Biogeosci. 2008, 113.
Swap, R.J.; Annegarn, H.J.; Suttles, J.T.; King, M.D.; Platnick, S.; Privette, J.L.; Scholes, R.J. Africa burning: A thematic analysis of the southern african regional science initiative (safari 2000). J. Geophys. Res. Atmos. 2003, 108.
Augustine, J.A.; DeLuisi, J.J.; Long, C.N. Surfrad–a national surface radiation budget network for atmospheric research. Bull. Am. Meteorol. Soc. 2000, 81, 2341–2358.
Peel, M.C.; Finlayson, B.L.; McMahon, T.A. Updated world map of the köppen-geiger climate classification. Hydrol. Earth Syst. Sci. 2007, 11, 1633–1644.
Xiao, Z.; Liang, S.; Tian, X.; Jia, K.; Yao, Y.; Jiang, B. Reconstruction of long-term temporally continuous ndvi and surface reflectance from avhrr data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5551–5568.
Iziomon, M.G.; Mayer, H.; Matzarakis, A. Empirical models for estimating net radiative flux: A case study for three mid-latitude sites with orographic variability. Astrophys. Space Sci. 2000, 273, 313–330.
Irmak, S.; Irmak, A.; Jones, J.; Howell, T.; Jacobs, J.; Allen, R.G.; Hoogenboom, G. Predicting daily net radiation using minimum climatological data. J. Irrig. Drain. Eng. 2003, 129, 256–269.
Wang, Y.; Jiang, B.; Liang, S.; Wang, D.; He, T.; Wang, Q.; Zhao, X.; Xu, J. Surface shortwave net radiation estimation from landsat tm/etm+ data using four machine learning algorithms. Remote Sens. 2019, 11, 2847.
Guo, X.; Yao, Y.; Zhang, Y.; Lin, Y.; Jiang, B.; Jia, K.; Zhang, X.; Xie, X.; Zhang, L.; Shang, K.; et al. Discrepancies in the simulated global terrestrial latent heat flux from glass and merra-2 surface net radiation products. Remote Sens. 2020, 12, 2763.
Brown, P.T.; Caldeira, K. Greater future global warming inferred from earth’s recent energy budget. Nature 2017, 552, 45–50.
Yu, Y.; Shi, J.; Wang, T.; Letu, H.; Yuan, P.; Zhou, W.; Hu, L. Evaluation of the himawari-8 shortwave downward radiation (swdr) product and its comparison with the ceres-syn, merra-2, and era-interim datasets. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 519–532.
Fu, Q.; Liou, K.N. Parameterization of the radiative properties of cirrus clouds. J. Atmos. Sci. 1993, 50, 2008–2025.
Wielicki, B.A.; Barkstrom, B.R.; Harrison, E.F.; Lee, R.B.; Smith, G.L.; Cooper, J.E. Clouds and the earth’s radiant energy system (ceres): An earth observing system experiment. Bull. Am. Meteorol. Soc. 1996, 77, 853–868.
Tarek, M.; Brissette, F.P.; Arsenault, R. Evaluation of the era5 reanalysis as a potential reference dataset for hydrological modelling over north america. Hydrol. Earth Syst. Sci. 2020, 24, 2527–2544.
Babar, B.; Luppino, L.T.; Boström, T.; Anfinsen, S.N. Random forest regression for improved mapping of solar irradiance at high latitudes. Sol. Energy 2020, 198, 81–92.
Carmona, F.; Rivas, R.; Caselles, V. Development of a general model to estimate the instantaneous, daily, and daytime net radiation with satellite data on clear-sky days. Remote Sens. Environ. 2015, 171, 1–13.
Ying, W.; Wu, H.; Li, Z.-L. Net surface shortwave radiation retrieval using random forest method with modis/aqua data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2252–2259.
Cheng, J.; Liang, S. Global estimates for high-spatial-resolution clear-sky land surface upwelling longwave radiation from modis data. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4115–4129.
Zhou, W.; Shi, J.; Wang, T.; Peng, B.; Zhao, R.; Yu, Y. Clear-sky longwave downward radiation estimation by integrating modis data and ground-based measurements. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 450–459.
Wenhui, W.; Shunlin, L.; Augustine, J.A. Estimating high spatial resolution clear-sky land surface upwelling longwave radiation from modis data. IEEE Trans. Geosci. Remote Sens. 2009, 47, 1559–1570.
Xiong, X.; Angal, A.; Twedt, K.A.; Chen, H.; Link, D.; Geng, X.; Aldoretta, E.; Mu, Q. Modis reflective solar bands on-orbit calibration and performance. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6355–6371.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Yu, P.-S.; Yang, T.-C.; Chen, S.-Y.; Kuo, C.-M.; Tseng, H.-W. Comparison of random forests and support vector machine for real-time radar-derived rainfall forecasting. J. Hydrol. 2017, 552, 92–104.
Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random forests for land cover classification. Pattern Recognit. Lett. 2006, 27, 294–300.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 2012, 12, 2825–2830.
Verma, M.; Fisher, J.; Mallick, K.; Ryu, Y.; Kobayashi, H.; Guillaume, A.; Moore, G.; Ramakrishnan, L.; Hendrix, V.; Wolf, S.; et al. Global surface net-radiation at 5 km from modis terra. Remote Sens. 2016, 8, 739.
Doggett, L.; Kaplan, G. An almanac for computers. Bull. Am. Astron. Soc. 1976, 8, 299.
Zhou, Y.; Yan, G.; Zhao, J.; Chu, Q.; Liu, Y.; Yan, K.; Tong, Y.; Mu, X.; Xie, D.; Zhang, W. Estimation of daily average downward shortwave radiation over antarctica. Remote Sens. 2018, 10, 422.
King, M.D.; Platnick, S.; Menzel, W.P.; Ackerman, S.A.; Hubanks, P.A. Spatial and temporal distribution of clouds observed by modis onboard the terra and aqua satellites. IEEE Trans. Geosci. Remote Sens. 2013, 51, 3826–3852.
Bishop, C. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1995.
Kim, H.-Y.; Liang, S. Development of a hybrid method for estimating land surface shortwave net radiation from modis data. Remote Sens. Environ. 2010, 114, 2393–2402.
Xu, J.; Jiang, B.; Liang, S.; Li, X.; Wang, Y.; Peng, J.; Chen, H.; Liang, H.; Li, S. Generating a high-resolution time-series ocean surface net radiation product by downscaling j-ofuro3. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2794–2809.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the ICML, Haifa, Israel, 21–24 June 2010.
He, T.; Liang, S.; Wang, D.; Shi, Q.; Goulden, M.L. Estimation of high-resolution land surface net shortwave radiation from aviris data: Algorithm development and preliminary results. Remote Sens. Environ. 2015, 167, 20–30.
Sianturi, Y.; Marjuki; Sartika, K. Evaluation of era5 and merra2 reanalyses to estimate solar irradiance using ground observations over indonesia region. In Proceedings of the International Energy Conference Astechnova 2019, Yogyakarta, Indonesia, 30–31 October 2019.
Jiang, H.; Lu, N.; Huang, G.; Yao, L.; Qin, J.; Liu, H. Spatial scale effects on retrieval accuracy of surface solar radiation using satellite data. Appl. Energy 2020, 270, 115178.

Figure 1. Geographical distribution of the 340 sites with various land cover types and climate zones.

Figure 2. The proportion of sites within 25 km (a) with the probability of having the same land cover as the sites, (b) the variations in elevation measured in the standard deviation, and (c) the variations in vegetation coverage measured in the standard deviation in NDVI.

Figure 3. Spatial distribution of the MODIS overpassing frequency during daytime on the 240th day of 2005.

Figure 4. Flowchart of this study.

Figure 5. The $R^{2}$ between the in situ $R_{n\_ins}$ and $R_{n\_daily}$ for each of the 24 h in one day at the mid-low latitudes in the Northern Hemisphere for (a) latitudes from 0 to 60° (10° interval), (b) twelve months from January to December, and (c) sixteen land cover types.

Figure 6. The distributions of the training samples classified by (a) clear and cloudy sky-condition at MODIS overpass time; (b) SZA; (c) VZA; and (d) RAA.

Figure 7. The overall results of the (a) training accuracy and (b) independent validation accuracy of $R_{n\_ins}$ from the RF-based ins model, and (c) the scatter plot between in situ $R_{n\_daytime}$ and $R_{n\_daily}$ measurements. Note that the color bar represents data density.

Figure 8. The overall results of the training accuracy (a,d,g,j), independent validation accuracy (b,e,h,k), and the validation accuracy against the common validation samples (c,f,i,l) for the RF-based ins model, RF-based model, RF-based model with ERA5, and LUT-based model (from top to bottom), respectively. Note that the color bar represents data density.

Figure 9. Spatial distribution of the validation results of the $R_{n\_daily}$ estimations from the RF-based model with ERA5 at each site for (a) RMSE (Wm⁻²), (b) rRMSE, and (c) bias (Wm⁻²), respectively.

Figure 10. Comparison of the validation accuracy in $R_{n\_daily}$ from (a) the RF-based model with ERA5 and four products: (b) GLASS, (c) CERES4A, (d) ERA5, and (e) FLUXCOM_RS, respectively. The red line is the 1:1 line. Note that the color bar represents data density.

Figure 11. The performance of the RF-based model with ERA5 $R_{n}$ for various (a) land cover types and (b) elevation groups.

Figure 12. Time series of the $R_{n\_daily}$ from the RF-based model with ERA5, GLASS, and in situ measurements at three sites: (a) SF_DRA (36.63°N, 116.02°W, BSV); (b) SF_FPK (48.31°N, 105.1°W, GRA); and (c) SF_SXF (43.73°N, 96.62°W, OSH).

Figure 13. (a) The spatially explicit global $R_{n\_daily}$ on day 220 of 2017 at mid-low latitudes, and a comparison of the magnified images within the black box in (a) for (b) the RF-based model with ERA5, (c) GLASS, (d) CERES4A, and (e) ERA5.

Figure 14. The performance of the RF-based model with ERA5 at different overpass times and latitudes in terms of (a) RMSE (Wm⁻²), (b) bias (Wm⁻²), (c) R², and (d) rRMSE.

Figure 15. The validation results in $R_{n\_daily}$ from the RF-based model with ERA5 using only the MODIS TOA data from (a) Terra and (b) Aqua. Note that the color bar represents data density.

Figure 16. The comparison in validation accuracy, represented by rRMSE, between the $R_{n\_daily}$ estimated from the RF-based model (blue line) and the RF-based model with ERA5 (red line) under various daily average atmospheric conditions, represented by CI.

Figure 17. The difference in the validated rRMSE of $R_{n\_daily}$ between the RF-based model with and without the ERA5 $R_{n\_daily}$, taking different numbers of MODIS TOA observations as input under different sky conditions, represented by CI. The difference was calculated as the rRMSE of the RF-based model with ERA5 minus that of the RF-based model. The legend gives the number of MODIS TOA observations used in one day.

Table 1. Information about the thirteen measuring networks.

Abbreviation	No. of Sites	Time Span	Instrument	Temporal Resolution	URL
ARM ¹	33	2001–2017	Kipp&Zonen Pyrgeometer	1 min	[30]
AsiaFlux	26	2001–2015	Kipp&Zonen, CM-6F	30 min	[7]
BSRN ²	7	2001–2017	Eppley, PIR/Kipp&Zonen CG4	1 or 3 min	[31]
CEOP ³	8	2008–2009	Eppley PIR, CG4	30 min	[32]
CEOP_Int	5	2002–2019	QMN101	30 min	\
CERN ⁴	1	2007–2014	-	30 min	[33]
ChinaFlux	3	2003–2016	-	30 min	\
GAME.ANN	2	2001–2003	EKO MS0202F	30 min	\
HiWATER	16	2012–2012	CNR-4	10 min	[34]
LaThuile ⁵	227	2001–2017	Kipp&Zonen CNR-1, etc.	30 min	[35]
LBA-ECO ⁶	4	2001–2006	REBS Q*7.1	1 h	[36]
SAFARI ⁷	1	2001–2017	Kipp&Zonen Pyrgeometer	30 min	[37]
SURFRAD	7	2001–2017	Eppley, PIR	3 min	[38]

¹ ARM: Atmospheric Radiation Measurement [30], ² BSRN: Baseline Surface Radiation Network [31], ³ CEOP: Coordinated Enhanced Observation Network of China [32], ⁴ CERN: Chinese Ecosystem Research Network [33], ⁵ LaThuile: Global Fluxnet (LaThuile dataset) [35], ⁶ LBA-ECO: Large Scale Biosphere-Atmosphere Experiment [36], ⁷ SAFARI: Southern African Regional Science Initiative Project [37].

Table 2. The MODIS products used in this study.

MODIS Product	Temporal Resolution	Spatial Resolution	Parameters Used
MOD/MYD02	5 min	1 km	1 km_RefSB,1 km_Emissive
MOD/MYD03	5 min	1 km	SolarZenith(SZA),SolarAzimuth (SAA), SensorZenith(VZA),SensorAzimuth (VAA), Height
MOD/MYD35	5 min	1 km	Cloud Mask

Table 3. Detailed information on the four proposed models.

Model	Description	No. of Training Samples	No. of Validation Samples	No. of Inter-Comparison Samples
RF-based ins model	To estimate $R_{n\_ins}$ from MODIS TOA observations, and then convert it to $R_{n\_daily}$ with the sine model	95,026	23,826	23,826
RF-based model	To estimate $R_{n\_daily}$ from MODIS TOA observations	452,098	112,869	23,826
RF-based model with ERA5	To estimate $R_{n\_daily}$ from MODIS TOA observations by introducing the $R_{n\_daily}$ from ERA5	452,098	112,869	23,826
LUT-based model	To estimate $R_{n\_daily}$ from MODIS TOA observations with different conditional models for various conditions	452,098	112,869	23,826

Table 4. Value ranges and step intervals of the hyper-parameters tuned in RF.

Hyper-Parameter	Range	Interval
n-estimators	30–100	10
max depth	2–15	1
min samples split	2–10	1
min samples leaf	2–10	1
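The search space in Table 4 can be written out as a small grid. The following is a minimal sketch, not the authors' tuning code: the scikit-learn style parameter names (`n_estimators`, `max_depth`, `min_samples_split`, `min_samples_leaf`) and the exhaustive enumeration are assumptions, since this section gives only the ranges and intervals.

```python
from itertools import product

# Ranges and intervals copied from Table 4 (both end points included).
# The underscore parameter names follow scikit-learn and are an assumption.
PARAM_GRID = {
    "n_estimators": list(range(30, 101, 10)),    # 30, 40, ..., 100
    "max_depth": list(range(2, 16)),             # 2, 3, ..., 15
    "min_samples_split": list(range(2, 11)),     # 2, 3, ..., 10
    "min_samples_leaf": list(range(2, 11)),      # 2, 3, ..., 10
}


def iter_candidates(grid):
    """Yield one dict per hyper-parameter combination in the grid."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


candidates = list(iter_candidates(PARAM_GRID))
n_candidates = len(candidates)
```

Each candidate dict could be passed as keyword arguments to `sklearn.ensemble.RandomForestRegressor`, or `PARAM_GRID` could be handed to `sklearn.model_selection.GridSearchCV`. An exhaustive search over this grid is large, so a randomized search may be a more practical choice.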

Table 5. Specification of the viewing geometry bin and TOA sky condition.

View Geometry/Sky Condition	Values
SZA	20°, 30°, 40°, 50°, 60°, 70°, 85°
VZA	10°, 20°, 30°, 40°, 55°, 70°
RAA	30°, 60°, 90°, 120°, 180°
Cloud mask	Clear, cloudy
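A look-up of the conditional model for one observation can be sketched from the bins in Table 5. This sketch assumes each listed angle is the upper edge of its bin (the first bin starting at 0°); this section does not say whether the values are bin edges or bin centres, and the function names are hypothetical.

```python
from bisect import bisect_left

# Values copied from Table 5. Treating them as upper bin edges is an assumption.
SZA_EDGES = [20, 30, 40, 50, 60, 70, 85]
VZA_EDGES = [10, 20, 30, 40, 55, 70]
RAA_EDGES = [30, 60, 90, 120, 180]


def bin_index(value, upper_edges):
    """Return the index of the first bin whose upper edge is >= value.

    Returns None when the value lies outside [0, last edge].
    """
    if value < 0 or value > upper_edges[-1]:
        return None
    return bisect_left(upper_edges, value)


def lut_key(sza, vza, raa, cloudy):
    """Build a (SZA bin, VZA bin, RAA bin, sky condition) look-up key."""
    indices = (
        bin_index(sza, SZA_EDGES),
        bin_index(vza, VZA_EDGES),
        bin_index(raa, RAA_EDGES),
    )
    if None in indices:
        return None  # geometry outside the tabulated range
    return indices + ("cloudy" if cloudy else "clear",)


# Number of distinct conditional models implied by this binning.
n_conditions = len(SZA_EDGES) * len(VZA_EDGES) * len(RAA_EDGES) * 2
```

Under this reading, an observation with SZA 25°, VZA 5°, and RAA 100° under clear sky maps to the key `(1, 0, 3, "clear")`, and a dictionary keyed this way could hold one fitted model per condition.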

Table 6. The validation results for various models built based on the RF-based model.

Input	RMSE (Wm⁻²)	Bias (Wm⁻²)
Original	22.87	0.25
Original + $C_{d}$	22.63	0.33
Original + T + SP + CC + TCL + RH + TCW	22.60	0.25
Original + $C_{d}$ + T + SP + CC + TCL + RH + TCW	22.40	0.34
Original + $CI_{ERA5}$	22.39	0.20
Original + $CI_{ERA5}$ + T + SP + CC + TCL + RH + TCW	22.27	0.23
Original + $R_{n}$ (ERA5)	21.83	0.20

Table 7. The validation accuracy of the final $R_{n\_daily}$ from the RF-based model and RF-based model with ERA5 with different numbers of MODIS TOA data as inputs.

Overpass Times	Bias (Wm⁻²), RF-Based Model	Bias (Wm⁻²), RF-Based Model with ERA5	RMSE (Wm⁻²), RF-Based Model	RMSE (Wm⁻²), RF-Based Model with ERA5
One	0.8	0.68	24.17	22.41
Two	0.01	−0.13	22	21.43
Three	−0.66	−0.29	21.92	21.55
Average	0.25	0.2	22.87	21.83
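The bias, RMSE, and rRMSE reported in the tables and figure captions can be computed as in the sketch below. The definitions are the usual ones and are assumptions here, because this section does not state them: bias is the mean of estimate minus observation, and rRMSE is RMSE divided by the mean observation, expressed in percent.

```python
import math


def validation_metrics(estimated, observed):
    """Return (bias, rmse, rrmse_percent) for paired daily values in Wm^-2.

    Assumed definitions: bias = mean(est - obs),
    rRMSE = 100 * RMSE / mean(obs).
    """
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [est - obs for est, obs in zip(estimated, observed)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mean_obs = sum(observed) / n
    rrmse = 100.0 * rmse / mean_obs
    return bias, rmse, rrmse


# Toy values for illustration only; these are not data from the study.
bias, rmse, rrmse = validation_metrics([103.0, 204.0], [100.0, 200.0])
```

The rRMSE difference described in the Figure 17 caption would then be the third value for the RF-based model with ERA5 minus the third value for the RF-based model, computed over the same validation samples.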

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, S.; Jiang, B.; Peng, J.; Liang, H.; Han, J.; Yao, Y.; Zhang, X.; Cheng, J.; Zhao, X.; Liu, Q.; et al. Estimation of the All-Wave All-Sky Land Surface Daily Net Radiation at Mid-Low Latitudes from MODIS Data Based on ERA5 Constraints. Remote Sens. 2022, 14, 33. https://doi.org/10.3390/rs14010033

