A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning

Shangguan, Wei; Xiong, Zili; Nourani, Vahid; Li, Qingliang; Lu, Xingjie; Li, Lu; Huang, Feini; Zhang, Ye; Sun, Wenye; Dai, Yongjiu

doi:10.3390/f14050913

Open AccessArticle

A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning

by

Wei Shangguan

^1,*

,

Zili Xiong

¹

,

Vahid Nourani

^2,3

,

Qingliang Li

⁴

,

Xingjie Lu

¹

,

Lu Li

¹,

Feini Huang

¹

,

Ye Zhang

¹,

Wenye Sun

¹ and

Yongjiu Dai

¹

Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-sen University, Guangzhou 510275, China

²

Center of Excellence in Hydroinformatics, Faculty of Civil Engineering, University of Tabriz, Tabriz 51368, Iran

³

Faculty of Civil and Environmental Engineering, Near East University, Via Mersin 10, Nicosia 99628, Turkey

⁴

College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China

^*

Author to whom correspondence should be addressed.

Forests 2023, 14(5), 913; https://doi.org/10.3390/f14050913

Submission received: 10 February 2023 / Revised: 21 April 2023 / Accepted: 27 April 2023 / Published: 28 April 2023

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Download

Browse Figures

Versions Notes

Abstract

:

Global carbon fluxes describe the carbon exchange between land and atmosphere. However, already available global carbon fluxes datasets have not been adjusted by the available site data and deep learning tools. In this work, a global carbon fluxes dataset (named as GCFD) of gross primary productivity (GPP), terrestrial ecosystem respiration (RECO), and net ecosystem exchange (NEE) has been developed via a deep learning based convolutional neural network (CNN) model. The dataset has a spatial resolution of 1 km at three time steps per month from January 1999 to June 2020. Flux measurements were used as a training target while remote sensing of vegetation conditions and meteorological data were used as predictors. The results showed that CNN could outperform other commonly used machine learning methods such as random forest (RF) and artificial neural network (ANN) by leading to satisfactory performance with R² values of the validation stage as 0.82, 0.72 and 0.62 for GPP, RECO, and NEE modelling, respectively. Thus, CNN trained using reanalysis meteorological data and remote sensing data was chosen to produce the global dataset. GCFD showed higher accuracy and more spatial details than some other global carbon flux datasets with reasonable spatial pattern and temporal variation. GCFD is also in accordance with vegetation conditions detected by remote sensing. Owing to the obtained results, GCFD can be a useful reference for various meteorological and ecological analyses and modelling, especially when high resolution carbon flux maps are required.

Keywords:

carbon flux; machine learning; gross primary production; terrestrial ecosystem respiration; net ecosystem exchange

1. Introduction

Carbon dioxide in the atmosphere is an important factor influencing climate [1,2]. Carbon dioxide exchanged between the atmosphere and land, commonly referred to as carbon flux, is a major contributor to global climate change and warming [3,4]. Gross primary production (GPP), terrestrial ecosystem respiration (RECO), and net ecosystem exchange (NEE) are the most common carbon flux variables used to measure the carbon exchange process between the land and the atmosphere [5,6]. Continuous measurements of these data are provided by eddy covariance flux observation towers around the world [7]. FLUXNET is an example of the global observation network [8]. However, despite the large number of sites, the measured values at these sites can only represent the regional carbon fluxes in the absence of global distribution information [9,10,11,12]. Thus, it is necessary to estimate spatial–temporal continuous distribution of global carbon flux.

In order to make effective estimates of carbon flux in various regions of the earth, reliable upscaling methods should be adopted and applied [13]. There are several kinds of technologies available to calculate or estimate global carbon fluxes [14]. Data-driven machine learning methods are crucial in these approaches. These methods typically make use of satellite remote sensing records, meteorological data, and other datasets related to the carbon flux. Several studies and projects used various machine learning models to predict carbon fluxes. The most widely used of such projects is FLUXCOM (http://www.fluxcom.org/, accessed on 31 January 2023) [15]. The model tree ensemble method employing remote sensing data and a meteorological dataset was the benchmark approach for FLUXNET upscaling [16]. The datasets obtained can be used to assess land surface process models and the state of the biosphere [17]. The recent initiative further developed models based on machine learning algorithms in carbon flux prediction. Tramontana et al. [18] evaluated the random forest model’s performance in various regions for various target variables. Their study used 11 machine learning methods in four categories, including tree method, kernel function method, neural network, and regression splines, and compared the outputs with results of the existing models [9,19]. Using the same models, Jun et al. [20] predicted the global gridded data of energy flux, including radiation, latent heat, and sensible heat. Other studies also used some machine learning methods for carbon flux estimation. Among these methods, random forest and some other similar tree models showed acceptable outcomes for predicting carbon flux. The prediction results using random forest showed that it outperformed MODIS products in several locations [21]. Xiao et al. [22] used a modified regression tree to predict the net ecosystem carbon exchange in the US. Cubist regression tree model was used to estimate global SIF (solar-induced chlorophyll fluorescence) and GPP based on MODIS remote sensing data and meteorological reanalysis data [23]. The tree ensemble method was applied to develop a dataset nationwide over China [24]. Zeng et al. [25] used the random forest model to upscale GPP, RECO, and NEE data of FLUXNET2015 to obtain global carbon flux products from 1999 to 2019. In addition to tree models, the Neural Network model has also been used to estimate global GPP with remotely sensed SIF and other radiation and meteorological data as predictors [26]. The Support Vector Regression model could also reproduce GPP and NEE at site level and spatial level in Asia [27]. The potential for deep learning applications in environmental modelling is growing [28]. For example, Yu et al. used four deep learning models, including deep neural network, convolutional neural network, back propagation neural network, and recurrent neural network to downscale GLASS GPP and NPP data to a higher resolution [29].

Nowadays, there are still some research gaps regarding models and data utilized for prediction of carbon fluxes. First, few studies have paid attention to the application of deep learning on upscaling one carbon flux in order to derive global products, despite the fact that they outperform conventional machine learning in many cases, though many studies have focused on estimating carbon flux by conventional machine learning methods. Therefore, in our research, the convolutional neural network (CNN), as a robust deep learning model, was applied for prediction of the carbon flux in order to make a comparison with other conventional machine learning methods. Second, available site data are not fully utilized in previous studies [9,19,25]. In our study, in addition to the FLUXNET dataset, two more site datasets (i.e., FLUXNET-CH4 [30] and Drought-2018 [31]) with updated data were added to the training target set, which can enhance the representativeness of the site data, and thus lead to a higher quality of the carbon flux estimation. Last but not least, current global carbon flux products are at relative coarse resolutions with limited precision. The widely used FLUXCOM has a resolution of 0.5 degree, while Zeng et al. [25] provides a product at 0.1 degree. In our study, we aim to produce a global carbon flux dataset at a much higher resolution, i.e., 30 arc second (~1 km), which can meet the demand of research that requires more spatial details of carbon fluxes. The purpose of this study was to develop a global carbon flux dataset (GCFD) with 1 km resolution using remote sensing and meteorological variables. The main research questions were as follows: (1) Can CNN as a deep learning model achieve better performance than conventional machine learning models such as ANN and RF? (2) Are the spatial patterns and temporal variations of carbon fluxes estimated by CNN reasonable at the resolution of 1 km?

The remaining structure of the paper is organized as follows. Section 2 describes the target variables and predictors, models, experiments, model evaluation, and data production. Section 3 gives the validation results, timeseries diagrams, spatial distributions, and variable relation analysis. Section 4 and Section 5 present the discussion and conclusions, respectively.

2. Materials and Methods

This section introduces data sources used, models for predicting carbon flux, experiment design, and evaluation indicators. The framework of this study is shown in Figure 1.

2.1. Target Carbon Fluxes and Predictors

The predicted target variables for the models were three carbon fluxes, namely, GPP, RECO, and NEE from flux tower sites. There are mainly two categories of predictors used in this research: remote sensing and meteorological variables. Remote sensing variables include fraction of absorbed photosynthetically active radiation (FAPAR), leaf area index (LAI), minimum LAI of each year (LAI_MIN), maximum LAI of each year (LAI_MAX), and the number of LAI exceeding the average of LAI_MAX and LAI_MIN in each year (LAI_COUNT). The meteorological variables include 2 m temperature (TA), surface solar radiation downward (SW_IN), latent heat flux (LE), sensible heat flux (H), soil temperature (TS) and soil water content (SWC). These predictors were selected for the current modelling according to previous studies [25]. The remote sensing variables were used as indicators of vegetation conditions. Zeng et al. proved that there is a good statistical relationship between carbon fluxes and FAPAR (or LAI) [25]. In addition, LAI_MIN, LAI_MAX, and LAI_COUNT can also represent different plant functional types to some extent. The meteorological variables were used as environmental condition indicators that affect plants’ activities.

2.1.1. Flux Tower Site Data

Three datasets, FLUXNET2015 [8], FLUXNET-CH4 [30], and Drought-2018 [31] were used as site datasets in this study, which include meteorological data and carbon flux variables. The FLUXNET2015 Tier 2, the FLUXNET-CH4, and Drought-2018 datasets contain data from 212, 80, and 52 sites, respectively. Since the FLUXNET-CH4 and Drought-2018 datasets have been produced more recently, only the new versions of data were kept for duplicated sites from different datasets. After removing duplicated sites, 280 sites around the world remained. The locations of these sites are shown in Figure 1. In this study, daily values of three carbon flux variables were extracted from these sites. The time coverages of these sites are different, varying from 1996 to 2018. Data after 1999 were used in the study to keep consistent with the time range of available remote sensing data. In order to keep consistent with the time interval of remote sensing data, site data were aggregated into a 10-day time step. Since GPP and RECO are always positive, we removed the negative values of GPP and RECO. We also removed the unrealistic continuous constant values of site measurements at some timestamps. Missing values were deleted for the machine learning modelling.

2.1.2. Remote Sensing Data

The remote sensing data including FAPAR and LAI are from Copernicus Global Land Service (CGLS) version 2 dataset [32] with 1 km resolution, covering from January 1999 to June 2020. The interval of the data is 10 days: the first 10 days of a month, the 11th to 20th days of the month, and after the 20th day of the month. The LAI_MAX, LAI_MIN, and LAI_COUNT variables were calculated according to LAI, and they were annual scale. For every year in our 10-day fitting, we applied the same values of these variables. Remote sensing data were extracted according to the location of each flux site. That is, if a site is located in a grid of remote sensing data, the corresponding values are extracted.

2.1.3. Reanalysis Data

The reanalysis data for the meteorological variables were retrieved from the ERA5-Land dataset (Land component of the fifth generation of European Reanalysis) with the spatial resolution of 0.1° × 0.1° [33]. Data from January 1999 to June 2020 were used in this study. In accordance with remote sensing data, the original data with a three-hour time step were aggregated to the 10-day average. We processed the ERA5 data into 1 km resolution using the nearest neighbor method.

2.2. Models for Carbon Flux Prediction

The CNN model [34] was used in this study as a robust deep learning approach to predict carbon fluxes. For comparison, two conventional machine learning models were also applied, including ANN [35] and RF [36]. The main components of CNN include convolution layer and pooling layer. The convolution layer projects the features of the input data to the next layer through convolution operation, while the pooling layer reduces the number of parameters and computation by down sampling the input data. After the above processing, the data are processed and outflow through the fully connected layers. The basic structure of ANN is a perceptron, which is a kind of artificial neuron. It first accepts input and calculates weighted sum, and then produces output using activation functions. A neural network consists of several layers of perceptron, including input layer, fully connected layer, and output layer. In regression problems, the number of neurons in the last output layer of ANN and CNN is the number of output dimensions. In our study, three neurons were considered in the output layer to predict the values of GPP, RECO, and NEE. RF is an ensemble of multiple decision trees. It uses the bagging method to randomly select a feature subset from all available features to train different decision trees at each time (iteration), and it finally takes the average of the predicted results of all the decision trees as the final predicted value.

2.3. Experiment Design

Experiments were designed for comparison among three machine learning algorithms, including CNN, ANN, and RF. The grid search method was used to determine the optimum parameters for each model. We specified the search range of each parameter based on the default setting of the software and some previous studies about machine learning. For the random forest model, we considered a search range of total tree numbers within 100 to 900 with an interval of 100, and we searched for minimum samples at the leaf node from 1 to 10. According to the above searching scheme, we set the total tree numbers and the minimum samples at a leaf node as 500 and 5, respectively. For ANN and CNN models, the searching range of the number of neurons of each dense layer was from 10 to 90 with an interval of 10. The number of dense layers was searched in the range of 1 to 5. The learning rate was searched among 0.1, 0.01, and 0.001. For the ANN model, the number of dense layers, the number of neurons, and the learning rate were set as 3, 90, and 0.01, respectively. For CNN modelling, a 1-dimensional convolutional layer followed by 3 dense layers with 70 neurons and the learning rate of 0.01 were used. The simulation setting of the parameters in this part was described in Table 1.

2.4. Model Evaluation and Data Production

The modelling performance was evaluated by the year-to-year and station-to-station validations. In the year-to-year validation, the data from 1999 to 2017 in each station were used as a training dataset, and to evaluate the accuracy of the models on a temporal scale, the generated carbon fluxes by three models were compared with the observed carbon flux towers data in 2018. In the station-to-station validation, randomly selected data from 90% of the stations from 1999 to 2018 were applied for training, and the data from the rest stations were used to evaluate accuracy of the three models on a spatial scale. This process was repeated 10 times in order to make use of every station to evaluate the models. Finally, we computed the averaged results of the above 10 evaluations.

In this study, the coefficient of determination (R²), root mean square error (RMSE), and bias were used to evaluate modelling performance as:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},

(1)

R M S E = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}{n}},

(2)

b i a s = \hat{y_{i}} - y_{i}

(3)

where

y_{i}

and

\hat{y_{i}}

are the observed and predicted values, respectively, and

\bar{y}

is the averaged value of observations.

Due to the evaluation of three models (see Section 3.1 and Section 3.2), the CNN model was chosen to produce the global dataset (GCFD). GCFD is provided with two resolutions, i.e., 30 arc second (~1 km) and 0.1 degree (~9 km) 3 times per month, covering January 1999 to June 2020, which is in accordance with the CGLS version 2 dataset used as predictors. The three carbon fluxes were set to zero for the LAI values less than 0.1.

3. Results

3.1. Station-to-Station and Year-to-Year Cross Validation Results

In order to compare different machine learning models, three ML models (i.e., RF, ANN and CNN) were used to carry out the validation.

Figure 2 shows the station-to-station cross-validation results of each model. Two interesting aspects could be observed according to the obtained results. First, there were significant differences among the performance of the models. CNN led to the best performance for all three carbon fluxes, while RF had the worst. Meanwhile, ANN was slightly worse than CNN. Second, according to the obtained R², all models generally achieved the best performance in predicting GPP, followed by RECO and NEE. Though NEE had slightly higher RMSE than RECO for ANN and CNN, GPP generally obtained the largest RMSE in both experiments, followed by RECO and NEE. This could be due to the wider range of GPP data compared to the RECO and NEE.

Furthermore, Figure 3 shows the scatter plots of the station-to-station cross-validation for the best model, i.e., CNN. CNN led to R² values of 0.82, 0.72, and 0.62. However, GPP had a larger RMSE (1.54 g C m⁻² d⁻¹) than RECO (1.27 g C m⁻² d⁻¹) and NEE (1.31 g C m⁻² d⁻¹). In the experiments, the scattered points were close to the 1:1 line and no significant underestimation or overestimation was observed.

As the time range of the used data for training was from 1999 to 2017, the data in 2018 were left out for the year-to-year validation purpose. The validation results are shown in Figure 4. The two interesting aspects could be highlighted in station-to-station validation, as also observed in the year-to-year validation. However, models generally had worse performance in the year-to-year validation than those in the station-to-station one, especially for the prediction of NEE by RF and ANN. This indicated that the predictability at the space scale was better than that at the time scale, especially for NEE. All models had similar RMSE around 1.9 and 2.3 g C m⁻² d⁻¹. Nevertheless, according to R², CNN could still be considered as the best model in general for all three fluxes. In summary, CNN outperformed the other two machine learning models according to the results of the two validation scenarios.

3.2. Prediction of Time Series at Single Sites

To further compare the prediction results, the trained models were used to make prediction at each site using the remote sensing records and reanalysis meteorological data for the location of the site. After prediction, the bias between predictions and observations was calculated (Figure 5). The median biases of all three carbon fluxes were close to zero in the experiments, which means that the models remained unbiased during the prediction.

In order to evaluate the modelling performance, three ML models (RF, ANN, and CNN) were used to predict the time series of carbon fluxes for four selected sites in different climate zones. Figure 6 shows the time series of GPP of these sites, including observed and predicted values by the models. CNN could lead to the best predictions at all sites, except that it had slightly lower R² (0.78) at NL-Loo than RF (0.83). All three models showed a similar trend in the predicted time series, but CNN tends to be closer to the observations. For the low values, all three models could make reliable estimates in most cases. However, for some peaks, RF and ANN showed significant overestimates, especially for ANN at FI-Hyy and CH-Dav (see red circles in Figure 6).

3.3. Multi-Site Averaged Time Series

In order to further explore the temporal variation of each carbon flux, data from all stations were analysed together. Here, predictions from all sites at each timestamp in GCFD were averaged for the period of 1999 to 2018. Figure 7a shows the annual average changes predicted by the proposed model and the observed values. Moreover, according to the Koppen Climate Classification [37], the annual variation of the carbon fluxes in different climate zones is shown in Figure 7b–h. The predictions of annual changes were quite consistent with the observation from carbon flux towers in all climate zones. It shows that the variations of GPP and RECO were similar, which coincides with recent studies [38]. Meanwhile, the variations of NEE were relatively small in all climate zones, except in tropical and subtropical climate. GPP showed a significant increase in the dry climate zone in the last two decades, which can be due, in the main, to the CO₂ fertilization effect, as reported by previous studies (Gonsamo et al., 2021). On the other hand, GPP has been decreasing in tropical zones since 2005. This possibly happened due to a higher vapour pressure deficit in the warmer climate over the tropics, which could lead to a suppression of photosynthesis [39]. The comparison among different global carbon fluxes is discussed in Section 4.2.

3.4. Spatial Patterns

Figure 8 shows the annual mean values of GPP, RECO, and NEE from 1999 to June 2020 in the GCFD. Generally, the spatial patterns of the three carbon fluxes were reasonable according to the physical knowledge about the process. As can be seen from Figure 8, GPP predicted by the CNN model showed relatively high values near the equator (mainly rainforest). There were also relatively high values in some places such as central America and the southern part of Asia. This pattern of GPP distribution is consistent with other studies [14,40]. According to the Koppen classification, most areas with high GPP have an equatorial fully humid or monsoonal climate. These kinds of climate provide adequate conditions of temperature and humidity for photosynthesis, so the GPP values are higher than other areas out of the tropical climates [41]. There is usually a positive correlation between GPP and temperature and precipitation parameters [42]. Recent studies have also revealed that temperature is the dominant factor affecting GPP at a global scale, while water availability is more important at the local scale, especially at low latitudes [43]. Therefore, in some humid areas of lower latitudes such as the southeast part of China, eastern Australia, the southern part of South America, and the southern US, there were also high GPP values, as shown by Figure 8. The spatial pattern of RECO was generally consistent with that of GPP. That is, higher GPP corresponds to RECO, which is consistent with studies that indicated that high RECO values usually appear with high temperature [44,45]. As recent studies have showed, RECO had the highest values in the tropical area, followed by southeast China, southeast US, and western Europe, and low values in cold and dry areas at high latitude [46,47]. Regarding NEE, the high values (carbon source) appeared in the high latitudes, while low values (carbon sink) appeared in the tropical rainforest regions. This is consistent with the previous studies showing that NEE is strongly related to air temperature [48]. Recent studies also pointed to low NEE values in Asia and the tropics [49]. In cold and arid areas, previous studies reported that carbon usually releases into the atmosphere [50].

4. Discussion

4.1. Main Variables Affecting Carbon Fluxes

Through correlation analysis, major factors affecting carbon fluxes could be determined among the existing predictors. Figure S1 shows the correlation coefficient matrix, which calculates all correlation coefficients among the 11 predictors and 3 target variables in this study. As can be seen from Figure S1, FAPAR, LAI, LE, and TA had the greatest influence on the three carbon flux variables, where the absolute values of correlation coefficients between these predictors and GPP (or RECO) were all above 0.5. The high correlation of remote sensing predictors and carbon fluxes is consistent with previous studies [51,52,53].

In order to further analyse the influence of predictors on the carbon flux variables, the feature importance of random forest was used to determine and rank the important variables. Figure 9 shows the importance of each predictor by RF in Experiment 2, in which the sum of the importance values of all predictors is one. The result shows that LAI, FAPAR, and LE had larger feature importance than other variables when predicting GPP and RECO, while the five remote sensing predictors were the most important in predicting NEE, which is consistent with the result of correlation coefficients. However, air temperature was not important in the RF models compared to the correlation analysis result. This may be due to that the RF importance can reflect a nonlinear relationship, while the correlation coefficient measures only linear relationship between parameters.

4.2. Comparison with Different Global Carbon Flux Products

Here, the GCFD is compared with the predictions provided by Zeng et al. [24] (hereafter referred to as Zeng et al.) and FLUXCOM [9,19,25]. Because FLUXCOM data have been provided at monthly scale, all three data sets were processed into a monthly scale for comparison. Using data from all sites and their corresponding grid data for comparison, the provided results in Figure S2 were obtained. It shows that GCFD led to the best performance, but significant underestimations and overestimations were observed for the other two datasets. FLUXCOM and Zeng et al. showed quite similar levels of performance according to R² and RMSE values, while GCFD got higher R² and lower RMSE values for three fluxes. For example, compared with FLUXCOM, GCFD improved the R² by 58%, 66%, and 168% for GPP, RECO, and NEE, respectively. The above validation may not be so fair because FLUXCOM and Zeng et al. [24] only used FLUXNET2015, while GCFD used two more site datasets, i.e., FLUXNET-CH4 and Drought-2018 Flux. So, further independent comparison should be done by comparing the two datasets with the results of station-to-station and year-to-year cross validations of CNN in Experiment 2. According to the station-to-station cross validation, CNN in Experiment 2 still showed higher R² and lower RMSE values than the other two datasets. For example, compared with FLUXCOM, CNN improved the R² by 47%, 40%, and 100% for GPP, RECO, and NEE, respectively. According to the year-to-year cross validation, CNN led to higher R² for GPP (improved by 30%), RECO (improved by 1%), and NEE (improved by 42%), and similar RMSE with the other two datasets (Figure S3). The above results imply that GCFD could lead to higher accuracy than FLUXCOM and Zeng et al. [24], though it may have drawbacks in predicting the unseen period by the CNN model. The extrapolation for the unseen period by the model may produce poorer results compared with the interpolation for the unseen locations. Thus, it should be noted that the data in GCFD from 2019 to 2020 may have lower quality than those from 1999 to 2018 because in situ observations from 1999 to 2018 were used to derive the GCFD.

The global distribution of GPP, RECO, and NEE as well as their distribution in an area with oceanic climate from the three products on 10 July 2010 are presented in Figure 10, Figures S4 and S5. As can be seen, the GCFD with a spatial resolution of 1 km has more detailed information than other products, and it captures the spatial contiguity of carbon fluxes well. Meanwhile, the spatial patterns of GPP and RECO are consistent with the distribution of the most two important predictors with high resolution (LAI and FAPAR). The spatial distribution of the three carbon fluxes in continental climate zones, dry climate zones, and subtropical climate zones are also presented to further illustrate the advantages of the GCFD (Figures S6–S14). In summary, GCFD can better reflect the detailed spatial distribution of global carbon fluxes in accordance with vegetation conditions than Zeng et al. and FLUXCOM.

As shown in Figure 7, GCFD could better represent the annual changes of global carbon fluxes than Zeng et al. and FLUXCOM. For all three carbon fluxes, FLUXCOM had little annual change in all climate zones, which is a major defect of this product. Though Zeng et al. could reflect annual changes to some extent, it failed to reflect the increasing trend of GPP and RECO in dry climate zones and the decreasing trend in tropical climate zones, but the GCFD did. For the global average of all sites, GCFD had a slight overestimation of GPP and RECO compared with the observations, while Zeng et al. had a slight underestimation. However, FLUXCOM showed significant underestimation. This may be due to the resolution discrepancy of these products; because grids with larger size tend to have lower values of GPP and NEE in reflecting the average state. This phenomenon can also be observed in the regional maps of carbon fluxes (Figure 10 and Figures S4–S14). This further emphasizes the necessity for deriving a high-resolution carbon flux dataset for appropriately reflecting the spatial heterogeneity.

4.3. Uneven Distribution of Sites

This study used data from 280 sites around the world. As shown in Figure 1, it is easy to see that in Europe and North America, the number and density of sites are significantly higher than in other parts of the world. Therefore, a high proportion of data from Europe and North America were used in the training step of the modelling. This could lead to better predictions in Europe and North America, but greater errors in other regions. Recent reviews of carbon flux products also pointed out that the density of sites is higher in temperate area than in tropical area [14]. Therefore, it is necessary to find a better way to solve the problem of uneven distribution of sites, especially making more flux data available for the areas with sparse sites.

4.4. Selection of Predictors and Models

To predict carbon flux, this study used two remote sensing variables and six meteorological parameters and compared the performance of three different machine learning algorithms. After exploring the relationship between variables, the results showed that remote sensing variables can strongly impact on the carbon flux variations. Therefore, selecting and incorporating more predictors, especially remote-sensing-based variables related to vegetation, is crucial for machine learning based modelling of carbon fluxes. Moreover, the deep-learning based CNN model was used in this study. According to the results, the overall performance of the deep learning model was better than that of conventional machine learning methods. Therefore, we suggest that other deep learning based models such as ConvLSTM (Convolutional Long Short-Term Memory) [54] should be explored, and the stability and suitability of the modelling for carbon flux prediction should be improved.

This study also found that LAI is the dominant factor for the prediction of carbon flux variables. Now, studies suggest that the relation between LAI and GPP is not so consistent in different land cover types, with a strong relationship in arid areas and a weak relationship in humid areas [55]. Therefore, further investigation is needed to see how GPP can respond to variables of greenness, such as LAI and NDVI, in different vegetation types. Recent studies also showed that the increasing trend of CO₂ concentration has a positive effect on GPP trend [56], and, therefore, its incorporation in the proposed machine learning modelling can be investigated in future studies.

In addition, the spatial and temporal scales of the remote sensing data, meteorological data, and carbon flux data used in this study were incompatible. In order to match the time scale of remote sensing data, the 10-day average of the other two types of data were calculated, which may result in bias if the original time series contains missing values. On the other hand, the spatial scale of the remote sensing data is 1 km, while the resolution is 0.1° × 0.1° for the meteorological data. So, spatial interpolating was conducted before carbon flux estimation. In this process, the nearest value was used when interpolating the ERA5-Land data into a 1 km scale. This may reduce some detailed information and produce uncertainty in the 1 km carbon flux dataset.

5. Conclusions

Based on in situ carbon flux measurements, meteorological records, and remote sensing data, this paper evaluated the performance of multiple machine learning models, including CNN, ANN, and RF, in generating the global carbon flux dataset (GCFD) with three carbon fluxes: GPP, RECO, and NEE. According to the station-to-station and year-to-year validations, CNN performed better than RF and ANN models, and performed better in spatial interpolation than in temporal extrapolation. Thus, CNN was chosen to produce GCFD. GCFD achieved a satisfactory performance with R² values of 0.82, 0.72, and 0.62 for GPP, RECO, and NEE, respectively, and with RMSE values of 1.54, 1.27, and 1.31 g C m⁻² d⁻¹, respectively. Compared with GPP and RECO, NEE has relative low accuracy, especially for its temporal change, which should be examined in the application of GCFD. GCFD showed reasonable spatial and temporal variations, and in accordance with the vegetation conditions, including LAI and FAPAR. GCFD outperformed two other machine learning based products, including Zeng et al. and FLUXCOM, in providing more accurate spatial distribution with higher resolution and more accurate annual changes. GCFD has made a great contribution to carbon flux modelling by providing a high spatial resolution dataset using a deep learning model. With a spatial resolution up to 1 km, GCFD can be a useful complement to existing model-based and satellite-based datasets for various meteorological and ecological analyses and modelling, especially for the practical applications that require high resolution carbon flux maps. In the future, GCFD can be updated with more site observations and more predictors affecting carbon fluxes, such as CO₂ concentration, for longer time coverage. The application of deep learning tools in predicting carbon fluxes could also be an important field of study.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/f14050913/s1, Figure S1: The correlation coefficient matrix between the variables; Figure S2: Scatter plot of GPP, RECO and NEE observed data and predicted data by GCFD (Figure a, b and c), Zeng et al (Figure d, e and f) and FLUXCOM (Figure g, h and i); Figure S3: Scatter plot of GPP, RECO and NEE observed data and predicted data by CNN model at monthly scale in station-to-station validation (Figure a, b and c) and year-to-year validation (Figure d, e and f); Figure S4: RECO maps of the global and oceanic climate area from different products on 10th July 2010; Figure S5: NEE maps of the global and oceanic climate area from different products on 10th July 2010; Figure S6: GPP maps of the global and a continental climate area from different products, FAPAR maps and LAI maps on 10th July 2010; Figure S7: RECO maps of the global and a continental climate area from different products on 10th July 2010; Figure S8: NEE maps of the global and a continental climate area from different products on 10th July 2010; Figure S9: GPP maps of the global and a dry climate area from different products, FAPAR maps and LAI maps on 10th July 2010; Figure S10: RECO maps of the global and a dry climate area from different products on 10th July 2010; Figure S11: NEE maps of the global and a dry climate area from different products on 10th July 2010; Figure S12: GPP maps of the global and a subtropical climate area from different products, FAPAR maps and LAI maps on 10th July 2010; Figure S13: RECO maps of the global and a subtropical climate area from different products on 10th July 2010; Figure S14: NEE maps of the global and a subtropical climate area from different products on 10th July 2010.

Author Contributions

W.S. (Wei Shangguan) conceived the research and secured funding for the research. W.S. (Wei Shangguan) and Z.X. designed the experiment. Z.X. and W.S. (Wei Shangguan) performed the analyses. Z.X. wrote the first draft of the manuscript. Z.X. conducted the research. W.S. (Wei Shangguan) reviewed and edited the manuscript before submission. W.S. (Wei Shangguan), Z.X., V.N. and Q.L. revised the manuscript. All other authors joined the discussion of the research. All authors have read and agreed to the published version of the manuscript.

Funding

The study was partially supported by the National Natural Science Foundation of China (U1811464, 42088101, 41975122, 4227515 and 42205149), Guangdong Basic and Applied Basic Research Foundation (2021B0301030007), and the Innovation Group Project of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (311020008).

Data Availability Statement

Data with the resolution of 1 km and 9 km can be accessed at https://dx.doi.org/DOI:10.11888/Terre.tpdc.300009 [57].

Acknowledgments

Carbon flux and meteorology measurements data used were provided by FLUXNET community and ICOS team. We thank Copernicus Global Land Service for the provision of FAPAR and LAI data. We thank ECMWF for the provision of reanalysis dataset. We acknowledge FLUXCOM and Jiye Zeng for providing the upscaled carbon flux data.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

Shakun, J.D.; Clark, P.U.; He, F.; Marcott, S.A.; Mix, A.C.; Liu, Z.; Otto-Bliesner, B.; Schmittner, A.; Bard, E. Global warming preceded by increasing carbon dioxide concentrations during the last deglaciation. Nature 2012, 484, 49–54. [Google Scholar] [CrossRef] [PubMed]
Friedlingstein, P.; Jones, M.W.; O’Sullivan, M.; Andrew, R.M.; Bakker, D.C.E.; Hauck, J.; Le Quéré, C.; Peters, G.P.; Peters, W.; Pongratz, J.; et al. Global Carbon Budget 2021. Earth Syst. Sci. Data 2022, 14, 1917–2005. [Google Scholar] [CrossRef]
Sitch, S.A.; Friedlingstein, P.; Gruber, N.; Jones, S.D.; Murray-Tortarolo, G.N.; Ahlström, A.; Doney, S.C.; Graven, H.; Heinze, C.; Huntingford, C.; et al. Recent trends and drivers of regional sources and sinks of carbon dioxide. Biogeosciences 2015, 12, 653–679. [Google Scholar] [CrossRef]
Cox, P.M.; Pearson, D.; Booth, B.B.; Friedlingstein, P.; Huntingford, C.; Jones, C.D.; Luke, C.M. Sensitivity of tropical carbon to climate change constrained by carbon dioxide variability. Nature 2013, 494, 341–344. [Google Scholar] [CrossRef]
Amthor, J.S.; Baldocchi, D.D. Terrestrial Higher Plant Respiration and Net Primary Production. In Terrestrial Global Productivity; Roy, J., Saugier, B., Mooney, H.A., Eds.; Academic Press: San Diego, CA, USA, 2001; pp. 33–59. [Google Scholar]
Beer, C.; Reichstein, M.; Tomelleri, E.; Ciais, P.; Jung, M.; Carvalhais, N.; Rödenbeck, C.; Arain, M.A.; Baldocchi, D.; Bonan, G.B. Terrestrial gross carbon dioxide uptake: Global distribution and covariation with climate. Science 2010, 329, 834–838. [Google Scholar] [CrossRef]
Aubinet, M.; Vesala, T.; Papale, D. Eddy Covariance: A Practical Guide to Measurement and Data Analysis; Springer: Dordrecht, The Netherlands, 2012. [Google Scholar]
Pastorello, G.; Trotta, C.; Canfora, E.; Chu, H.; Christianson, D.; Cheah, Y.-W.; Poindexter, C.; Chen, J.; Elbashandy, A.; Humphrey, M.; et al. The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Sci. Data 2020, 7, 225. [Google Scholar] [CrossRef]
Tramontana, G.; Jung, M.; Schwalm, C.R.; Ichii, K.; Camps-Valls, G.; Ráduly, B.; Reichstein, M.; Arain, M.A.; Cescatti, A.; Kiely, G.; et al. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms. Biogeosciences 2016, 13, 4291–4313. [Google Scholar] [CrossRef]
Bharany, S.; Sharma, S.; Frnda, J.; Shuaib, M.; Khalid, M.I.; Hussain, S.; Iqbal, J.; Ullah, S.S. Wildfire Monitoring Based on Energy Efficient Clustering Approach for FANETS. Drones 2022, 6, 193. [Google Scholar] [CrossRef]
Bharany, S.; Sharma, S.; Alsharabi, N.; Tag Eldin, E.; Ghamry, N.A. Energy-efficient clustering protocol for underwater wireless sensor networks using optimized glowworm swarm optimization. Front. Mar. Sci. 2023, 10, 1117787. [Google Scholar] [CrossRef]
Bharany, S.; Badotra, S.; Sharma, S.; Rani, S.; Alazab, M.; Jhaveri, R.H.; Reddy Gadekallu, T. Energy efficient fault tolerance techniques in green cloud computing: A systematic survey and taxonomy. Sustain. Energy Technol. Assess. 2022, 53, 102613. [Google Scholar] [CrossRef]
Xiao, J.; Chen, J.; Davis, K.J.; Reichstein, M. Advances in upscaling of eddy covariance measurements of carbon and water fluxes. J. Geophys. Res. 2012, 117, G00J01. [Google Scholar] [CrossRef]
Zhang, Y.; Ye, A. Uncertainty analysis of multiple terrestrial gross primary productivity products. Glob. Ecol. Biogeogr. 2022, 31, 2204–2218. [Google Scholar] [CrossRef]
FluxCom. Available online: http://www.fluxcom.org/ (accessed on 6 September 2022).
Jung, M.; Reichstein, M.; Bondeau, A. Towards global empirical upscaling of FLUXNET eddy covariance observations: Validation of a model tree ensemble approach using a biosphere model. Biogeosciences 2009, 6, 2001–2013. [Google Scholar] [CrossRef]
Jung, M.; Reichstein, M.; Margolis, H.; Cescatti, A.; Richardson, A.D.; Arain, M.A.; Arneth, A.; Bernhofer, C.; Bonal, D.; Chen, J.; et al. Global patterns of land-atmosphere fluxes of carbon dioxide, latent heat, and sensible heat derived from eddy covariance, satellite, and meteorological observations. J. Geophys. Res. 2011, 116, G00J07. [Google Scholar] [CrossRef]
Tramontana, G.; Ichii, K.; Camps-Valls, G.; Tomelleri, E.; Papale, D. Uncertainty analysis of gross primary production upscaling using Random Forests, remote sensing and eddy covariance data. Remote Sens. Environ. 2015, 168, 360–373. [Google Scholar] [CrossRef]
Jung, M.; Schwalm, C.; Migliavacca, M.; Walther, S.; Camps-Valls, G.; Koirala, S.; Anthoni, P.; Besnard, S.; Bodesheim, P.; Carvalhais, N.; et al. Scaling carbon fluxes from eddy covariance sites to globe: Synthesis and evaluation of the FLUXCOM approach. Biogeosciences 2020, 17, 1343–1365. [Google Scholar] [CrossRef]
Jung, M.; Koirala, S.; Weber, U.; Ichii, K.; Gans, F.; Camps Valls, G.; Papale, D.; Schwalm, C.R.; Tramontana, G.; Reichstein, M. The FLUXCOM ensemble of global land-atmosphere energy fluxes. Sci. Data 2019, 6, 74. [Google Scholar] [CrossRef]
Chen, Y.; Shen, W.; Gao, S.; Zhang, K.; Wang, J.; Huang, N. Estimating deciduous broadleaf forest gross primary productivity by remote sensing data using a random forest regression model. J. Appl. Remote Sens. 2019, 13, 038502. [Google Scholar] [CrossRef]
Xiao, J.; Zhuang, Q.; Baldocchi, D.D.; Law, B.E.; Richardson, A.D.; Chen, J.; Oren, R.; Starr, G.; Noormets, A.; Ma, S.; et al. Estimation of net ecosystem carbon exchange for the conterminous United States by combining MODIS and AmeriFlux data. Agric. For. Meteorol. 2008, 148, 1827–1847. [Google Scholar] [CrossRef]
Li, X.; Xiao, J. A Global, 0.05-Degree Product of Solar-Induced Chlorophyll Fluorescence Derived from OCO-2, MODIS, and Reanalysis Data. Remote Sens. 2019, 11, 517. [Google Scholar] [CrossRef]
Yao, Y.; Wang, X.; Li, Y.; Wang, T.; Shen, M.; Du, M.; He, H.; Li, Y.; Luo, W.; Ma, M.; et al. Spatiotemporal pattern of gross primary productivity and its covariation with climate in China over the last thirty years. Glob. Chang. Biol. 2018, 24, 184–196. [Google Scholar] [CrossRef] [PubMed]
Zeng, J.; Matsunaga, T.; Tan, Z.-H.; Saigusa, N.; Shirai, T.; Tang, Y.; Peng, S.; Fukuda, Y. Global terrestrial carbon fluxes of 1999–2019 estimated by upscaling eddy covariance data with a random forest. Sci. Data 2020, 7, 313. [Google Scholar] [CrossRef] [PubMed]
Alemohammad, S.H.; Fang, B.; Konings, A.G.; Aires, F.; Green, J.K.; Kolassa, J.; Miralles, D.; Prigent, C.; Gentine, P. Water, Energy, and Carbon with Artificial Neural Networks (WECANN): A statistically based estimate of global surface turbulent fluxes and gross primary productivity using solar-induced fluorescence. Biogeosciences 2017, 14, 4101–4124. [Google Scholar] [CrossRef] [PubMed]
Ichii, K.; Ueyama, M.; Kondo, M.; Saigusa, N.; Kim, J.; Alberto, M.C.; Ardö, J.; Euskirchen, E.S.; Kang, M.; Hirano, T.; et al. New data-driven estimation of terrestrial CO₂ fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression. J. Geophys. Res. Biogeosci. 2017, 122, 767–795. [Google Scholar] [CrossRef]
Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716. [Google Scholar] [CrossRef]
Yu, T.; Pang, Y.; Sun, R.; Niu, X. Spatial Downscaling of Vegetation Productivity in the Forest from Deep Learning. IEEE Access 2022, 10, 104449–104460. [Google Scholar] [CrossRef]
Delwiche, K.B.; Knox, S.H.; Malhotra, A.; Fluet-Chouinard, E.; McNicol, G.; Feron, S.; Ouyang, Z.; Papale, D.; Trotta, C.; Canfora, E.; et al. FLUXNET-CH4: A global, multi-ecosystem dataset and analysis of methane seasonality from freshwater wetlands. Earth Syst. Sci. Data 2021, 13, 3607–3689. [Google Scholar] [CrossRef]
Drought 2018 Team & ICOS Ecosystem Thematic Centre. Drought-2018 Ecosystem Eddy Covariance Flux Product for 52 Stations in FLUXNET-Archive Format. 2020. Available online: https://www.icos-cp.eu/data-products/YVR0-4898 (accessed on 6 September 2022). [CrossRef]
Fuster, B.; Sánchez-Zapero, J.; Camacho, F.; García-Santos, V.; Verger, A.; Lacaze, R.; Weiss, M.; Baret, F.; Smets, B. Quality Assessment of PROBA-V LAI, fAPAR and fCOVER Collection 300 m Products of Copernicus Global Land Service. Remote Sens. 2020, 12, 1017. [Google Scholar] [CrossRef]
Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data 2021, 13, 4349–4383. [Google Scholar] [CrossRef]
Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar]
Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 2018, 5, 180214. [Google Scholar] [CrossRef]
Lasslop, G.; Reichstein, M.; Papale, D.; Richardson, A.D.; Arneth, A.; Barr, A.; Stoy, P.; Wohlfahrt, G. Separation of net ecosystem exchange into assimilation and respiration using a light response curve approach: Critical issues and global evaluation. Glob. Chang. Biol. 2010, 16, 187–208. [Google Scholar] [CrossRef]
Gao, X.; Zhu, J.; Zeng, X.; Zhang, M.; Dai, Y.; Ji, D.; Zhang, H. Changes in Global Vegetation Distribution and Carbon Fluxes in Response to Global Warming: Simulated Results from IAP-DGVM in CAS-ESM2. Adv. Atmos. Sci. 2022, 39, 1285–1298. [Google Scholar] [CrossRef]
Norton, A.J.; Rayner, P.J.; Koffi, E.N.; Scholze, M.; Silver, J.D.; Wang, Y.P. Estimating global gross primary productivity using chlorophyll fluorescence and a data assimilation system with the BETHY-SCOPE model. Biogeosciences 2019, 16, 3069–3093. [Google Scholar] [CrossRef]
Bai, Y.; Liang, S.; Yuan, W. Estimating Global Gross Primary Production from Sun-Induced Chlorophyll Fluorescence Data and Auxiliary Information Using Machine Learning Methods. Remote Sens. 2021, 13, 963. [Google Scholar] [CrossRef]
Luyssaert, S.; Inglima, I.; Jung, M.; Richardson, A.D.; Reichstein, M.; Papale, D.; Piao, S.L.; Schulze, E.D.; Wingate, L.; Matteucci, G.; et al. CO2 balance of boreal, temperate, and tropical forests derived from a global database. Glob. Chang. Biol. 2007, 13, 2509–2537. [Google Scholar] [CrossRef]
Jung, M.; Reichstein, M.; Schwalm, C.R.; Huntingford, C.; Sitch, S.; Ahlström, A.; Arneth, A.; Camps-Valls, G.; Ciais, P.; Friedlingstein, P.; et al. Compensatory water effects link yearly global land CO₂ sink changes to temperature. Nature 2017, 541, 516–520. [Google Scholar] [CrossRef] [PubMed]
Johnston, A.S.A.; Meade, A.; Ardö, J.; Arriga, N.; Black, A.; Blanken, P.D.; Bonal, D.; Brümmer, C.; Cescatti, A.; Dušek, J.; et al. Temperature thresholds of ecosystem respiration at a global scale. Nat. Ecol. Evol. 2021, 5, 487–494. [Google Scholar] [CrossRef] [PubMed]
Tang, R.; He, B.; Chen, H.W.; Chen, D.; Chen, Y.; Fu, Y.H.; Yuan, W.; Li, B.; Li, Z.; Guo, L.; et al. Increasing terrestrial ecosystem carbon release in response to autumn cooling and warming. Nat. Clim. Chang. 2022, 12, 380–385. [Google Scholar] [CrossRef]
Ai, J.; Xiao, S.; Feng, H.; Wang, H.; Jia, G.; Hu, Y. A global terrestrial ecosystem respiration dataset (2001–2010) estimated with MODIS land surface temperature and vegetation indices. Big Earth Data 2020, 4, 142–152. [Google Scholar] [CrossRef]
Ai, J.; Jia, G.; Epstein, H.E.; Wang, H.; Zhang, A.; Hu, Y. MODIS-Based Estimates of Global Terrestrial Ecosystem Respiration. J. Geophys. Res. Biogeosci. 2018, 123, 326–352. [Google Scholar] [CrossRef]
Zhou, Q.; Fellows, A.; Flerchinger, G.N.; Flores, A.N. Examining Interactions Between and Among Predictors of Net Ecosystem Exchange: A Machine Learning Approach in a Semi-arid Landscape. Sci. Rep. 2019, 9, 2222. [Google Scholar] [CrossRef] [PubMed]
Bodesheim, P.; Jung, M.; Gans, F.; Mahecha, M.D.; Reichstein, M. Upscaled diurnal cycles of land–atmosphere fluxes: A new global half-hourly data product. Earth Syst. Sci. Data 2018, 10, 1327–1365. [Google Scholar] [CrossRef]
Liang, W.; Zhang, W.; Jin, Z.; Yan, J.; Lü, Y.; Wang, S.; Fu, B.; Li, S.; Ji, Q.; Gou, F.; et al. Estimation of Global Grassland Net Ecosystem Carbon Exchange Using a Model Tree Ensemble Approach. J. Geophys. Res. Biogeosci. 2020, 125, e2019JG005034. [Google Scholar] [CrossRef]
Ma, J.; Xiao, X.; Miao, R.; Li, Y.; Chen, B.; Zhang, Y.; Zhao, B. Trends and controls of terrestrial gross primary productivity of China during 2000–2016. Environ. Res. Lett. 2019, 14, 084032. [Google Scholar] [CrossRef]
Hu, L.; Fan, W.; Ren, H.; Liu, S.; Cui, Y.; Zhao, P. Spatiotemporal Dynamics in Vegetation GPP over the Great Khingan Mountains Using GLASS Products from 1982 to 2015. Remote Sens. 2018, 10, 488. [Google Scholar] [CrossRef]
Gomez-Casanovas, N.; DeLucia, N.J.; DeLucia, E.H.; Blanc-Betes, E.; Boughton, E.H.; Sparks, J.; Bernacchi, C.J. Seasonal Controls of CO₂ and CH₄ Dynamics in a Temporarily Flooded Subtropical Wetland. J. Geophys. Res. Biogeosci. 2020, 125, e2019JG005257. [Google Scholar] [CrossRef]
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
Hu, Z.; Piao, S.; Knapp, A.K.; Wang, X.; Peng, S.; Yuan, W.; Running, S.; Mao, J.; Shi, X.; Ciais, P.; et al. Decoupling of greenness and gross primary productivity as aridity decreases. Remote Sens. Environ. 2022, 279, 113120. [Google Scholar] [CrossRef]
Sun, Z.; Wang, X.; Yamamoto, H.; Tani, H.; Zhong, G.; Yin, S.; Guo, E. Spatial pattern of GPP variations in terrestrial ecosystems and its drivers: Climatic factors, CO₂ concentration and land-cover change, 1982–2015. Ecol. Inform. 2018, 46, 156–165. [Google Scholar] [CrossRef]
Xiong, Z.; Shangguan, W. A 1-km 10-Day Global Carbon Fluxes Dataset Using In-Situ Measurement (1999–2020); National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2022. [Google Scholar] [CrossRef]

Figure 1. The framework of this study. The red dots are sites in the training set and the blue dots are site in the test set.

Figure 2. The coefficient of determination (a,

R^{2}

) and root mean square error (b, RMSE) for station-to-station cross-validation of three models. RF: random forest; ANN: artificial neural network; CNN: convolutional neural network.

Figure 2. The coefficient of determination (a,

R^{2}

) and root mean square error (b, RMSE) for station-to-station cross-validation of three models. RF: random forest; ANN: artificial neural network; CNN: convolutional neural network.

Figure 3. Scatter plots of predicted and observed values of GPP, RECO, and NEE of station-to-station cross-validation for CNN model; (a–c) are GPP, RECO, and NEE results, respectively.

Figure 4. The

R^{2}

(a) and RMSE (b) results of year-to-year validation by leaving out data in 2018.

Figure 4. The

R^{2}

(a) and RMSE (b) results of year-to-year validation by leaving out data in 2018.

Figure 5. The prediction bias of (a) GPP, (b) RECO, and (c) NEE from all sites using RF, ANN, and CNN.

Figure 6. Timeseries of GPP (gC m⁻² d⁻¹) observed by FLUXNET and predicted by three models at 4 sites, including (a) ZA-Kru in subtropic, (b) NL-Loo in oceanic climate, (c) FI-Hyy in continental climate, and (d) CH-Dav in polar climate. Red circles indicate significant overestimation.

Figure 7. Annual timeseries of GPP, RECO and NEE for all sites in the global and in seven climate zones. The number of sites in each area is given in brackets.

Figure 8. Global distribution of average GPP (a), RECO (b), and NEE (c) (gC m⁻² d⁻¹) from 1999 to June 2020 in the GCFD.

Figure 9. Bar chart of feature importance of random forest model: GPP (a), RECO (b), and NEE (c).

Figure 10. GPP maps of the global and an oceanic climate area from different products, FAPAR maps, and LAI maps on 10 July 2010. The resolution is 1 km for GCFD, 0.5 degree for FLUXCOM, and 0.1 degree for Zeng et al.

Table 1. The simulation setting.

	Search Range	Experiment
RF No. of trees	100–900	500
RF minimum samples at a leaf node	1–10	5
ANN No. of dense layers	1–5	3
ANN No. of neurons	10–90	90
CNN No. of dense layers	1–5	3
CNN No. of neurons	10–90	70
ANN and CNN learning rate	0.1, 0.01, 0.001	0.01

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shangguan, W.; Xiong, Z.; Nourani, V.; Li, Q.; Lu, X.; Li, L.; Huang, F.; Zhang, Y.; Sun, W.; Dai, Y. A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning. Forests 2023, 14, 913. https://doi.org/10.3390/f14050913

AMA Style

Shangguan W, Xiong Z, Nourani V, Li Q, Lu X, Li L, Huang F, Zhang Y, Sun W, Dai Y. A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning. Forests. 2023; 14(5):913. https://doi.org/10.3390/f14050913

Chicago/Turabian Style

Shangguan, Wei, Zili Xiong, Vahid Nourani, Qingliang Li, Xingjie Lu, Lu Li, Feini Huang, Ye Zhang, Wenye Sun, and Yongjiu Dai. 2023. "A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning" Forests 14, no. 5: 913. https://doi.org/10.3390/f14050913

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning

Abstract

1. Introduction

2. Materials and Methods

2.1. Target Carbon Fluxes and Predictors

2.1.1. Flux Tower Site Data

2.1.2. Remote Sensing Data

2.1.3. Reanalysis Data

2.2. Models for Carbon Flux Prediction

2.3. Experiment Design

2.4. Model Evaluation and Data Production

3. Results

3.1. Station-to-Station and Year-to-Year Cross Validation Results

3.2. Prediction of Time Series at Single Sites

3.3. Multi-Site Averaged Time Series

3.4. Spatial Patterns

4. Discussion

4.1. Main Variables Affecting Carbon Fluxes

4.2. Comparison with Different Global Carbon Flux Products

4.3. Uneven Distribution of Sites

4.4. Selection of Predictors and Models

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI