Estimation of Aerosol Optical Depth at 30 m Resolution Using Landsat Imagery and Machine Learning

Liang, Tianchen; Liang, Shunlin; Zou, Linqing; Sun, Lin; Li, Bing; Lin, Hao; He, Tao; Tian, Feng

doi:10.3390/rs14051053

by Tianchen Liang ¹, Shunlin Liang ², Linqing Zou ¹, Lin Sun ³, Bing Li ¹, Hao Lin ¹, Tao He ¹ and Feng Tian ¹,*

¹ School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China

² Department of Geographical Sciences, University of Maryland, College Park, MD 20742, USA

³ College of Geomatics, Shandong University of Science and Technology, Qingdao 266590, China

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(5), 1053; https://doi.org/10.3390/rs14051053

Submission received: 21 January 2022 / Revised: 17 February 2022 / Accepted: 18 February 2022 / Published: 22 February 2022

Abstract

Current remote sensing-based aerosol optical depth (AOD) products have coarse spatial resolutions, which are useful for studies at continental and global scales, but unsatisfactory for local scale applications, such as urban air pollution monitoring. In this study, we investigated the possibility of using Landsat imagery to develop high-resolution AOD estimations at 30 m based on machine learning algorithms. We assessed the performance of six machine learning algorithms, including Extreme Gradient Boosting, Random Forest, Cascade Random Forest, Gradient Boosted Decision Trees, Extremely Randomized Trees, and Multiple Linear Regression. To obtain accurate AOD estimations, we used prior knowledge from multiple sources as inputs to the machine learning models, including the Global Land Surface Satellite (GLASS) albedo, the 1-km AOD product from MODIS data using the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm, and meteorological and surface elevation data. A total of 13,624 AOD measurements from Aerosol Robotic Network (AERONET) sites were used for model training and validation. We found that all six algorithms exhibited good performance, with R² values ranging from 0.73 to 0.78 and AOD root-mean-square errors (RMSE) ranging from 0.089 to 0.098. The extremely randomized trees algorithm, however, demonstrated marginally superior performance as compared to the other algorithms; hence, it was used to produce AOD estimates at a 30 m resolution for one Landsat scene covering Beijing in 2013–2019. Through a comparison with overlapping AERONET observations, a high level of accuracy was achieved, with an R² = 0.889 and an RMSE = 0.156. Our method can be potentially used to generate a global high-resolution AOD dataset based on Landsat imagery.

Keywords:

aerosol optical depth; machine learning; Landsat; high resolution

1. Introduction

Atmospheric aerosols directly affect the radiation energy budget of the earth by scattering and absorbing solar radiation [1,2,3]. Moreover, aerosols indirectly affect the climate through the processes of cloud generation and dissipation, precipitation [4], photosynthesis, and ecosystem evapotranspiration [5,6,7], and also contribute to the terrestrial carbon cycle through the diffuse radiation fertilization effect and hydrometeorological feedback [8]. Therefore, it is important to accurately estimate the spatial and temporal variations of aerosols across the globe. Aerosols are quantified using the Aerosol Optical Depth (AOD) parameter, which is the combined measurement of various aerosols distributed within an air column. Ground-based sun photometers can continuously measure AOD with a high level of accuracy and are installed at Aerosol Robotic Network (AERONET) field sites across the globe [9]. With wide spatial and repetitive coverage, satellite remote sensing provides an effective method to upscale AOD estimation to large areas.

Several satellite-based AOD instruments have been developed, which can be classified based on the type of satellite sensors used (Table 1). They include (1) multispectral sensors, such as Moderate Resolution Imaging Spectroradiometer (MODIS) AOD instruments with 10 km, 3 km, and 1 km resolutions [10,11,12]; the Medium Resolution Imaging Spectrometer (MERIS) AOD sensor with a 10 km resolution [13]; the Visible Infrared Imaging Radiometer Suite (VIIRS) with a 6 km resolution [14]; and the Himawari-8 Geostationary AOD instrument with a 0.05° resolution [15]; (2) multi-angle sensors, such as the Multi-angle Imaging Spectro-Radiometer (MISR) AOD sensor with 17.6 km and 4.4 km resolutions [16,17]; (3) polarization sensors, such as the Polarization and Directionality of Earth Reflectance (POLDER) sensor with an 18.5 km resolution [18]; and (4) Lidar sensors, such as Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) instruments with a 5 km resolution [19]. All these AOD products have relatively coarse spatial resolutions, i.e., coarser than 1 km, which limits their applications at the local scale, e.g., for detailed air-quality monitoring at the city scale.

High-resolution AOD products are needed to identify subtle differences in local contamination caused by human activities. This is particularly necessary as a result of the rapid urbanization process and its associated air pollution. Previous studies investigated the feasibility of several remote sensing data sources to estimate high-resolution AOD over cities. For example, Sun and Tian estimated the AOD from Landsat-8 at a 500 m resolution using the relationships between the AOD, surface reflectance, and the observed top-of-atmosphere (TOA) reflectance based on radiative transfer simulations [20,21,22]. Muhammad Bilal used AOD data at a 30 m resolution retrieved from Landsat 8 Operational Land Imager (OLI) images using the Simplified Aerosol Retrieval Algorithm (SARA) with ancillary aerosol information from MODIS [23]. Luo also applied SARA to retrieve the AOD with Landsat surface reflectance values from both bright and dark objects [24]. Zhong combined the modified dark object method [25,26] with prior spectral information to obtain high spatial resolution AOD at a 30 m resolution [27]. The aforementioned studies were all focused on local areas. In contrast, the Land Surface Reflectance Code (LaSRC) [28] was developed by NASA for the atmospheric correction of Landsat-8 imagery on a global scale, based on a 6SV radiative transfer model [29]. The LaSRC algorithm uses two surface reflectance ratios computed by a global ratio dataset, derived from MODIS and MISR data at a 5 km resolution, to retrieve the AOD at 550 nm for each 30 m Landsat-8 pixel.

Many factors need to be considered for AOD retrieval from remote sensing observations, including aerosol type, land changes, and human activity, all of which can be heterogeneous across space and time. Simplification of these factors, for example, by using a Lambertian surface and fixed aerosol types in the model [28,30,31], would therefore introduce large uncertainties in the AOD estimations [32,33,34,35]. One solution to reduce these uncertainties is to use prior knowledge. With the increasing amount of prior knowledge (model inputs) available, machine learning algorithms have become the first choice for building models for AOD estimation, and have been used to estimate AOD, as well as PM2.5 and PM10 concentrations [36,37,38,39]. Several studies [38,40,41,42] adopted neural networks to construct the relationships between satellite observations and ground-observed AOD, while others used Random Forest (RF) modeling [37].

In this study, we aimed to estimate high-resolution AOD at a 30 m resolution using Landsat imagery and machine learning algorithms. First, we evaluated the performance of six machine learning algorithms to estimate the AOD under our study framework. Second, we analyzed the accuracy improvements by incorporating different prior knowledge. Finally, we produced AOD estimations based on the best machine learning model for one Landsat scene covering Beijing and compared it with the output of the LaSRC algorithm.

2. Data Sources and Processing

2.1. Satellite Data

Landsat-7 and Landsat-8 were launched in April 1999 and February 2013, respectively. The former carries the Enhanced Thematic Mapper Plus (ETM+) and the latter the Operational Land Imager (OLI) sensor. Both provide high spatial resolution across several multispectral bands that have been widely used in aerosol monitoring. The Google Earth Engine (GEE) is a cloud-based geospatial analysis platform that enables effective global satellite data processing. In this study, Top of Atmosphere (TOA) data from Landsat-8 OLI bands 2–7 from 2013 to 2019 and Landsat-7 ETM+ bands 1–7 from 2001 to 2012 were extracted from GEE on a global scale. A detailed comparison of those bands is provided in Table 2. Landsat-8 OLI data were mainly used for model construction and prediction, while Landsat-7 ETM+ data were used for independent validation. The OLI and ETM+ have corresponding bands, but their sensor spectral response functions are different. Thus, the corresponding ETM+ bands were converted to the equivalent OLI bands using a spectral regression model [43]. We also used angular information, including the solar zenith and azimuth angles, to account for variations in aerosol scattering or absorption at different angles.

2.2. AOD Ground Measurements

AERONET is a global ground-based aerosol observation network that provides long-term aerosol monitoring. Specifically, it provides spectral AOD, water vapor, spectral solar irradiance, and various aerosol retrieval products. In this study, we collected the latest global V3 cloud-screened and quality-controlled level 2.0 AOD observations for better accuracy [44] from 453 monitoring stations (Figure 1) from 1 January 2001 to 31 December 2019. Where AERONET did not measure the AOD at 550 nm, the Ångström exponent method [45] was used for band conversion. At each site, a minimum of two AERONET AOD measurements within ±30 min of the Landsat-7 or Landsat-8 overpass time were averaged to obtain a temporally matched value that was used as the ground truth. Meanwhile, the AERONET inversions of water vapor and single scattering albedo (SSA) were extracted for subsequent analysis.
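The band conversion and temporal matching described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the choice of 440 nm and 675 nm as the two measured wavelengths is an assumption, since the text only states that the Ångström exponent method was used.

```python
import math


def angstrom_exponent(aod_1, aod_2, wl_1, wl_2):
    """Angstrom exponent from AOD measured at two wavelengths (same units)."""
    return -math.log(aod_1 / aod_2) / math.log(wl_1 / wl_2)


def aod_at_550(aod_1, aod_2, wl_1=440.0, wl_2=675.0, target=550.0):
    """Interpolate AOD to the target wavelength with the power law
    AOD(wl) = AOD(wl_1) * (wl / wl_1) ** (-alpha).
    The default wavelengths (nm) are assumptions made for illustration."""
    alpha = angstrom_exponent(aod_1, aod_2, wl_1, wl_2)
    return aod_1 * (target / wl_1) ** (-alpha)


def mean_within_window(observations, overpass_min, half_window=30.0, min_count=2):
    """Average AOD values observed within +/- half_window minutes of the overpass.

    observations: list of (time_in_minutes, aod) pairs.
    Returns None when fewer than min_count observations fall in the window.
    """
    values = [aod for t, aod in observations if abs(t - overpass_min) <= half_window]
    if len(values) < min_count:
        return None
    return sum(values) / len(values)
```

For example, `mean_within_window([(595, 0.2), (610, 0.4), (700, 0.9)], overpass_min=600)` averages only the first two observations, because the third lies outside the ±30 min window.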

2.3. GLASS Albedo Product

Since the Landsat TOA data contain information on both surface reflectivity and aerosol attenuation under clear sky conditions, prior knowledge of surface reflectivity helped us to decouple the contribution of the atmosphere from the TOA signal. Different bands are affected by aerosols differently. If surface knowledge can provide surface information at different wavelengths, the contribution of aerosols can be comprehensively measured. Ideally, spectral surface reflectance products should be incorporated into the overall dataset, but determining this reflectance may create greater uncertainty than the simple use of the AOD estimation. Instead, the broadband visible albedo was used, as it represents the contribution of the surface over the whole visible range. The 1 km black-sky-visible albedo from 2001 to 2019 from the Global Land Surface Satellite (GLASS), which was employed in [46,47,48] and has a higher weight than the blue-sky-albedo, was selected to represent the surface reflective property. Because of their temporal and spatial continuity characteristics, GLASS products have been widely used in various applications [49,50].

2.4. MAIAC AOD Product

The Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm was developed for MODIS. It retrieves aerosol properties at a resolution of 1 km over land [51]. The newly released Terra and Aqua 1 km MAIAC AOD products, which use the revised MAIAC algorithm, work over both dark and bright surfaces with continuous improvements in internal cloud masks, snow detection, and the determination of aerosol models [12]. The high accuracy of MAIAC AOD makes it possible to characterize aerosol heterogeneity at a 1 km resolution [52,53,54]. We extracted the global AOD product (MCD19A2) from 2001 to 2019 from GEE for use as the prior knowledge. Because of missing data, two methods, spatial-temporal hybrid fusion considering aerosol variation mitigation (ST-AVM) [55] and statistical linear regression [36], were used to fill the missing pixels.

2.5. Auxiliary Data

Auxiliary data include meteorological and surface topographic information. The latest ERA5 atmospheric products were employed, which provided total column ozone (O₃) and water vapor (WVC) at a 25 km spatial resolution and a 1 h temporal resolution. The O₃ and WVC data were interpolated to match the Landsat overpass times. The digital elevation model (DEM) data were obtained from the NASADEM digital elevation 30 m product, consisting of reprocessed Shuttle Radar Topography Mission data, the accuracy of which was improved by incorporating auxiliary data from five other DEM datasets [56]. Table S2 summarizes the data sources used in this study.
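The text does not state which interpolation scheme was used to bring the hourly ERA5 fields to the overpass time. A minimal sketch, assuming simple linear interpolation between the two bracketing hourly values (the function name is hypothetical), would be:

```python
def interpolate_hourly(value_before, value_after, minutes_past_hour):
    """Linearly interpolate between two consecutive hourly values.

    value_before: field value at the hour preceding the overpass.
    value_after: field value at the following hour.
    minutes_past_hour: overpass time offset from the preceding hour (0 to 60).
    Linear interpolation is an assumption; the source only says the data
    were interpolated to the Landsat overpass times.
    """
    if not 0 <= minutes_past_hour <= 60:
        raise ValueError("minutes_past_hour must lie between 0 and 60")
    weight = minutes_past_hour / 60.0
    return (1.0 - weight) * value_before + weight * value_after
```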

2.6. Data Pre-Processing

All data with different spatial and temporal resolutions were resampled into the same grid size (0.003° × 0.003° ≈ 30 × 30 m) and the same temporal interval to match the Landsat pixels. The spatial matching of images at different resolutions was performed using coordinate latitude and longitude: a coarse resolution image was matched to a fine resolution scale by calculating the distance between the longitude and latitude of the adjacent pixels. Specifically, Landsat-8 pixels contained within the great-circle distances (Equation (1)) of adjacent pixels of coarse resolution data were calculated, and the coarse resolution pixel values were assigned to all Landsat-8 pixels. The great-circle distances between two points were calculated with the Haversine approach, using latitude and longitude (Equation (1)):

$$ DIS_{i,j} = 2r \arcsin\left(\sqrt{\sin^{2}\left(\frac{Lat_{i,j} - Lat_{0}}{2}\right) + \cos(Lat_{i,j})\cos(Lat_{0})\sin^{2}\left(\frac{Lon_{i,j} - Lon_{0}}{2}\right)}\right) \qquad (1) $$

where $Lat_{i,j}$, $Lon_{i,j}$ and $Lat_{0}$, $Lon_{0}$ denote the latitudes and longitudes of one point and the corner or center of a rectangle in space, respectively, and r represents the Earth’s radius (r = 6371 km). For the distance calculations, heavily parallelized computing was used to speed up the spatial matching process. All independent variables were then matched with each AERONET site’s AOD measurement. After removing invalid values, there were a total of 13,624 data pairs from 2013 to 2019 (OLI) and 5021 from 2001 to 2012 (ETM+).
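Equation (1) translates directly into code. The sketch below implements the Haversine distance with r = 6371 km and then uses it for a simplified nearest-pixel lookup. The lookup function and its data layout are assumptions made for illustration; the paper's actual matching assigns coarse pixel values to all Landsat-8 pixels that fall within the coarse pixel footprint.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat, lon, lat0, lon0, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between (lat, lon) and (lat0, lon0), in degrees."""
    phi, phi0 = math.radians(lat), math.radians(lat0)
    half_dlat = (phi - phi0) / 2.0
    half_dlon = math.radians(lon - lon0) / 2.0
    a = math.sin(half_dlat) ** 2 + math.cos(phi) * math.cos(phi0) * math.sin(half_dlon) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def nearest_coarse_value(lat, lon, coarse_pixels):
    """Return the value of the coarse pixel whose center is closest to (lat, lon).

    coarse_pixels: iterable of (center_lat, center_lon, value) tuples.
    This nearest-center rule is a simplification of the matching in the text.
    """
    closest = min(coarse_pixels, key=lambda p: haversine_km(lat, lon, p[0], p[1]))
    return closest[2]
```

One degree of latitude comes out at about 111.2 km with this radius, which is a convenient sanity check for the implementation.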

3. Methodology

The basic premise was to determine an optimal machine learning model for the relationships between site AOD measurements and all input variables, and then to estimate AOD at a 30 m spatial resolution from Landsat data and other ancillary information. Mathematically, AOD = f (AOD_MAIAC, Albedo_GLASS, Reflectance_6-bands, Solar-angle₂, Elevation, O₃, WVC). The methodology and experimental setup for this study are outlined in Figure 2.

3.1. Model Development

Many machine learning models have been successfully applied in the field of atmospheric remote sensing. To develop a robust model for this study, it was necessary to evaluate multiple machine learning models. We analyzed six models, including Random Forest (RF), Cascade Random Forest (CasRF), and Extremely Randomized Trees (ERF), which are bagging-based ensemble methods, as well as Extreme Gradient Boosting (XGBoost) and Gradient Boosted Decision Trees (GBDT), which are boosting-based ensemble methods. Finally, classical Multiple Linear Regression (MLR) was added to the analysis. See Appendix A for a description of these different machine learning algorithms. Each machine learning model needs an optimal parameter set to produce the best AOD estimation performance. We used the 10-fold cross-validation (CV) method to find the optimal parameters, cycling through different model parameters and using the coefficient of determination (R²) and the root-mean-square error (RMSE) to find the set with the largest R² and the smallest RMSE. Table S3 lists the best parameter combinations used for each machine learning algorithm.
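The parameter search can be illustrated with a small, dependency-free sketch of k-fold cross-validation. Everything here is hypothetical scaffolding: the toy nearest-neighbour regressor only stands in for the tree ensembles used in the paper, and the search ranks parameter sets by mean RMSE alone, whereas the text uses both R² and RMSE. In practice a library implementation of the models and of the grid search would normally be used.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cv_rmse(make_model, X, y, params, n_folds=10, seed=0):
    """Mean validation RMSE over the folds for one parameter set.

    make_model(train_x, train_y, **params) must return a predict(x) callable.
    """
    scores = []
    for held_out in k_fold_indices(len(X), n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = make_model([X[i] for i in train], [y[i] for i in train], **params)
        scores.append(rmse([y[i] for i in held_out], [predict(X[i]) for i in held_out]))
    return sum(scores) / len(scores)


def grid_search(make_model, X, y, param_grid, n_folds=10):
    """Return the parameter set with the smallest cross-validated RMSE."""
    return min(param_grid, key=lambda p: cv_rmse(make_model, X, y, p, n_folds))


def knn_regressor(train_x, train_y, k=1):
    """Toy 1-D k-nearest-neighbour regressor used only as a stand-in model."""
    def predict(x):
        nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
        return sum(value for _, value in nearest) / len(nearest)
    return predict
```

Each sample is used for validation exactly once, so the averaged score reflects out-of-sample error rather than fitting error.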

3.2. Evaluation Approaches

The evaluation included two parts. The first part was model construction (fitting and validation). The model validation was based on the out-of-sample method for all Landsat-8 samples. The 10-fold CV approach was used to divide the data samples into ten random subsets, where nine of the subsets were used for training and one for validation. For each model parameter set, this step was repeated ten times and the error rates were averaged to obtain the result. The second part was based on the independent validation samples that were converted from Landsat-7 ETM+ data from the years 2001–2012. Independent tests were performed on the Landsat-7 samples using the model developed for Landsat-8, which ensured that the data samples for model training and validation were completely independent. Finally, the best performance model developed for Landsat-8 (2013–2018) was used to estimate the AODs for 2019, which were then validated against the corresponding ground measurements. The estimated AODs were evaluated at single-pixel (30 m) precision around each AERONET site. Values of R², RMSE, the mean absolute error (MAE), and the MODIS expected error (EE, ±(0.05 + 0.15 × AOD_AERONET)) were employed to evaluate the accuracy of the estimated AOD.
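The four accuracy measures named above can be computed as in the following sketch. The function name is hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption: the paper may instead report the squared Pearson correlation.

```python
def evaluation_metrics(aod_true, aod_pred):
    """R2, RMSE, MAE and the fraction of estimates inside the MODIS EE envelope.

    aod_true: AERONET AOD values (reference).
    aod_pred: estimated AOD values, in the same order.
    The EE envelope is +/-(0.05 + 0.15 * AOD_AERONET), as given in the text.
    """
    n = len(aod_true)
    mean_true = sum(aod_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(aod_true, aod_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in aod_true)
    within = sum(1 for t, p in zip(aod_true, aod_pred) if abs(p - t) <= 0.05 + 0.15 * t)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # assumed definition of R2
        "rmse": (ss_res / n) ** 0.5,
        "mae": sum(abs(t - p) for t, p in zip(aod_true, aod_pred)) / n,
        "within_ee": within / n,
    }
```

Because the envelope widens with the reference AOD, an absolute error of 0.1 counts as a miss at AOD = 0.2 (envelope 0.08) but as a hit at AOD = 0.8 (envelope 0.17).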

4. Results and Discussion

4.1. Models Fitting and Validation

Table S4 shows the statistical accuracy of the fitting results for the six machine learning models. The R² ranged from 0.667 to 0.773. The bagging ensembles exhibited good performance, with RMSE values between 0.086 and 0.093. Among the boosting algorithms, GBDT’s fitting ability was marginally poorer (RMSE = 0.097, R² = 0.724) than that of XGBoost, which had a coefficient of determination of 0.752. MLR was the worst, indicating that there was no clear linear relationship between the variables and that the data were difficult to fit with a linear model. In summary, the bagging and boosting ensembles performed well at the training stage.

We compared the performance of the different machine learning models introduced in Appendix A. Table 3 shows the 10-fold CV results for the different models. The results from MLR had the lowest R², with a value of 0.731, while the coefficient of determination for ERF was the best (R² = 0.780). For XGBoost, RF, and CasRF, the R² values were approximately 0.776, which were close to the ERF value. The R² of GBDT was slightly lower than that of the other ensemble models, but nevertheless better than that of MLR. Except for the MLR model, the RMSE values of the machine learning algorithms were very close, at approximately 0.09. In summary, the bagging and boosting ensembles produced good validation performance.

Table 4 shows the independent accuracy test results of the above models using the Landsat-7 independent validation samples. All of these models were able to estimate the average AOD with good accuracy. The AOD derived from the ERF demonstrated the best estimation performance, with an R² value of 0.770, indicating good agreement with the measurements. The overall RMSE value was approximately 0.116. By comparison, the retrievals of the MLR model were also in agreement with the AERONET AOD measurements, with 71.27% of them falling within the EE envelope, and with R² and RMSE values of 0.760 and 0.117, respectively. In summary, the above results show that ERF had marginal advantages in terms of model fitting and validation; thus, it was chosen for the experiments.

4.2. Estimating Landsat-8 AOD

This section further evaluates model robustness. The ERF model fitted with data from 2013 to 2018 was used to estimate the AOD in 2019. The AOD estimates for 2019 generated by the ERF model were close to measurements, with an R² value of 0.791, an RMSE of 0.067, and an MAE of 0.042, and approximately 87.82% of the estimates fell within the EE envelope (Figure 3).

4.3. Importance of Using Prior Knowledge

To quantify the improvement to model accuracy using different prior knowledge data sets, seven AOD estimation models were separately constructed using the ERF algorithm. The first model (f1) was constructed using only the OLI TOA reflectance of six bands and the solar zenith angle data; the next five models (f2–f6) were constructed by adding the elevation, meteorological, albedo, and MAIAC AOD data in turn. The final model (f7) contained only a priori information without the Landsat TOA and angular information, and it was used to measure the importance of the satellite signal to the model. The seven models were used to estimate the AOD at the AERONET sites separately for each of the 19 years of data, and R², RMSE, MAE, and EE were used to evaluate the estimation accuracy. Table 5 shows the estimation results for these models. When fitting the model using only TOA and solar angles, model f1 exhibited poor accuracy, with a low coefficient of determination (R² = 0.313). Adding the DEM produced a slight increase in accuracy, with a coefficient of determination of 0.338. The O₃ and WVC concentrations are always changing depending on time and location, and are the most important gases to consider, especially in the visible channels [57]. The addition of O₃ improved the correlation and the RMSE of the AOD estimation (R² = 0.368, RMSE = 0.116). WVC produced significant accuracy improvements, with a correlation reaching 0.456. The surface albedo can help the model decouple the contribution of the AOD from the apparent reflectance information; thus, the correlation improved to 0.545 and the RMSE reduced to 0.091, with approximately 76.03% of the points falling within the EE error line (model f5). The addition of MAIAC AOD significantly improved the accuracy of the model, with a 45% improvement in the correlation and a 26% reduction in the RMSE, as compared to f5. Figure S1 shows the accuracy validation scatter plot for MAIAC AOD and AERONET AOD at the Landsat-8 overpass time (RMSE = 0.08, MAE = 0.05, within EE = 82.26%). This error to some extent reflects the AOD bias due to differences in sensor transit times, since aerosols can change dramatically with time. Because the model takes the Landsat-8 observations and the corresponding prior knowledge of the surface and atmospheric conditions as inputs, the AOD bias could be reduced (model f6). As compared with MAIAC AOD, its R² was improved by up to ~0.07, and the RMSE was reduced by 0.01. When the model was constructed using only prior information (model f7), the accuracy of the AOD estimation decreased, i.e., the RMSE increased by ~9.0% and the R² decreased by ~5.4%. This, to some extent, shows that the addition of satellite observations can better correct for AOD bias due to differences in the MODIS and Landsat transit times.
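The seven input configurations (f1 to f7) can be written down as cumulative feature lists. The sketch below is illustrative only: the column names are hypothetical, and the order in which the priors are added follows the order in which they are discussed in the text.

```python
# Hypothetical column names; the paper does not name its input variables this way.
TOA_AND_ANGLE = ["toa_b2", "toa_b3", "toa_b4", "toa_b5", "toa_b6", "toa_b7", "solar_zenith"]
PRIORS_IN_ORDER = ["elevation", "ozone", "water_vapor", "glass_albedo", "maiac_aod"]


def build_feature_sets():
    """Return the feature list used by each of the models f1 to f7.

    f1 uses only the satellite signal, f2 to f6 add one prior at a time,
    and f7 keeps the priors but drops the satellite signal.
    """
    feature_sets = {"f1": list(TOA_AND_ANGLE)}
    for n_priors in range(1, len(PRIORS_IN_ORDER) + 1):
        feature_sets[f"f{n_priors + 1}"] = TOA_AND_ANGLE + PRIORS_IN_ORDER[:n_priors]
    feature_sets["f7"] = list(PRIORS_IN_ORDER)
    return feature_sets
```

Comparing f6 with f5 isolates the contribution of MAIAC AOD, while comparing f6 with f7 isolates the contribution of the Landsat signal itself.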

The biases of the model-estimated AOD and MAIAC AOD under different surface conditions, represented by Normalized Differential Vegetation Index (NDVI) values and atmospheric aerosol conditions, were also explored to analyze where and how the prior information could improve the model’s accuracy. The MAIAC/estimated–AERONET AOD matchups were then filtered by each specific surface and atmospheric aerosol condition and grouped into narrow bins, each containing different numbers of retrievals, in order to analyze the distribution of errors.

The global surface was first divided into nine NDVI bins (Figure 4), then MAIAC and the predicted AODs were evaluated against the AERONET AODs in each bin. Overall, a high level of accuracy was observed for the MAIAC AOD (Figure 4b) when NDVI ≤ 0 (mostly water surfaces), with an AOD bias approximately equal to 0.02. However, when 0 ≤ NDVI ≤ 0.3, which indicated sparse vegetation, the accuracy was degraded, with larger biases (−0.05~0.01) occurring. With an increase in NDVI (0.3 < NDVI < 1.0), the performance continued to improve, as was characterized by decreasing negative biases. In areas covered by dense vegetation (NDVI close to 1) in particular, the mean bias was approximately −0.01. The ERF AOD accuracy was similar to that of MAIAC AOD over water and densely vegetated surfaces; however, for sparse vegetation (0 ≤ NDVI ≤ 0.3), ERF was able to effectively reduce the biases of MAIAC AOD (−0.02~0).
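The binned error analysis amounts to grouping the matchups by a covariate and averaging the AOD bias in each group. A minimal sketch follows; the function name is hypothetical and the bin edges are supplied by the caller, since the exact edges of the nine NDVI bins are not listed in the text.

```python
def mean_bias_by_bin(covariate, aod_est, aod_ref, edges):
    """Mean bias (estimate minus reference) in each half-open bin [edges[i], edges[i+1]).

    covariate: values used for binning, e.g. NDVI at each matchup.
    aod_est: estimated (or MAIAC) AOD values.
    aod_ref: AERONET AOD values.
    Returns one mean bias per bin, or None for bins without samples.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, est, ref in zip(covariate, aod_est, aod_ref):
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                sums[i] += est - ref
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]
```

The same helper applies unchanged to other covariates such as SSA or WVC by passing different values and edges.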

SSA is the ratio of scattering to total extinction (scattering plus absorption) of solar radiation by aerosols and is an important parameter in the optical properties of aerosols. In addition, SSA is crucial to the AOD retrieval, and errors in SSA determination propagate into further errors in AOD retrieval. WVC is also important. Huttunen showed that WVC and AOD typically demonstrate a positive correlation [38]. The WVC is likely related to aerosol swelling as a result of hygroscopic growth, which increases the scattering of the aerosol. Therefore, this paper analyzes the AOD deviation based on SSA and WVC. In order to obtain the SSA value at 550 nm, a linear interpolation of the SSAs at 440 nm and 675 nm was used.
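The linear interpolation of SSA to 550 nm from the 440 nm and 675 nm values is a one-line calculation, sketched here with a hypothetical function name:

```python
def ssa_at_550(ssa_440, ssa_675, wl_low=440.0, wl_high=675.0, target=550.0):
    """Linearly interpolate SSA in wavelength between 440 nm and 675 nm."""
    weight = (target - wl_low) / (wl_high - wl_low)
    return ssa_440 + weight * (ssa_675 - ssa_440)
```

Since 550 nm lies a little less than halfway between the two wavelengths, the result is weighted slightly towards the 440 nm value.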

The atmospheric aerosol conditions were divided into 11 SSA (550 nm) bins with different bin sizes (Figure 5). For strong aerosol absorption conditions (SSA < 0.88), the estimated AOD exhibited a higher correlation and smaller estimation uncertainties as compared with the MAIAC AOD (Table 6), and approximately 87.93% of the data samples fell within the EE envelope. The correlation of MAIAC AOD under strong absorption was poor (R² = 0.53), whereas the estimated AOD was more accurate (R² = 0.67) and gradually reduced the negative bias (Figure 5). When the aerosols gradually became less absorptive (0.88 ≤ SSA < 0.94), the MODIS AOD exhibited relatively small estimation uncertainties, with an R² = 0.76. The estimation accuracy was still marginally higher than that of the MODIS AOD (MAE = 0.04, RMSE = 0.07). For weakly absorptive or spheroid aerosol conditions (SSA ≥ 0.94), the estimation correlation was consistent with that of the MODIS AOD (R² = 0.87). As a result of the time difference between MODIS and Landsat-8, the AOD deviation may have been caused by the different SSAs at different times, and the stronger the absorption, the larger the deviation, which could be corrected by prior knowledge using the machine learning model developed in this paper. This indicates that, although the machine learning models do not explicitly obtain direct information about the possible systematic covariability of SSA, they seem to be able to detect it indirectly through Landsat and other information (e.g., WVC) inputs, at least to some extent. In general, the machine learning estimates can improve the estimation accuracy in the case of strongly absorbing aerosols or aerosols with less absorption.

In the real atmosphere, WVC has implications for aerosol composition and size. An increase in WVC also increases the uptake of water into aerosol particles, thus affecting the aerosol SSA. Furthermore, the uncertainty of MAIAC AOD retrievals and the estimated AOD related to the true aerosol type was evaluated according to the AERONET WVC retrieved values (Figure 6). Figure 6b shows that MAIAC AOD was gradually underestimated with the increase in WVC content. This figure illustrates the continuous increase in AOD deviation due to the time difference (MODIS and Landsat-8) as WVC increases. The machine learning method can reduce the tendency of underestimation. When the water vapor was less than 3, the mean value of aerosol deviation was almost zero. In this paper, the prior WVC was used to create a machine learning model that was able to automatically detect these changes and correct for them. In summary, machine learning algorithms can, to some extent, automatically establish the WVC relationship at different moments, thus improving the accuracy of the AOD. By examining the error distributions as functions of NDVI and aerosol types, it can be seen that the developed ERF model is robust and works effectively under all these conditions.

4.4. Mapping the AOD over Beijing

We used multiple machine learning models to explore the relationship between the AOD values measured from AERONET sites and satellite observations, and compared their ability to estimate AOD. The results show that the ERF model has superior performance. Therefore, we used it to generate time-series AOD products over the Beijing region. This is one of the largest and most dynamic regions in northern China, and it contains many bright surfaces, urban areas comprising artificial buildings and sparse vegetation.

It is worth noting that the ERF model was reconstructed to ensure the authenticity of the comparison. In the training sample, all the information from the Beijing-area stations was removed, and the model was retrained using ERF to estimate the Landsat-8 AOD in the Beijing area. The superior performance of the ERF model built on Landsat-8 for the independent validation sample (Table 4) was also transferred to mapping the AOD of Landsat-7. Figure S2 shows the accuracy scatter plot of the Landsat-7 AOD estimated using the ERF model in the Beijing area; the lower root-mean-square error (0.143) shows the transferability of the model.

This section presents the spatial coverages of the estimated Landsat-8 AOD and the Landsat-7 AOD, and compares the aerosol loading with the MODIS products (MCD19A2) closest to the Landsat overpass time. For accuracy verification, the ERF-estimated AOD values were compared with four AERONET sites over the Beijing area (Beijing, Beijing-CAMS, Beijing-RADI and XiangHe sites), and the results were cross-validated with the AODs retrieved by the LaSRC algorithm on the scale of the AERONET sites.

Figure 7 shows different AOD scenarios observed on days 345 in 2000, 247 in 2014, 61 in 2016, and 290 in 2018. The F-mask was applied to the blue band of Landsat-7/8 and the estimated AODs [58] to remove cloud, shadow, and water-body pixels for comparison. The results show less cloud coverage and represent low pollution (Figure 7a,b), moderate pollution (c), and heavy pollution (d), respectively, where the results in Figure 7a were generated with Landsat-7 data and the rest were generated with Landsat-8 data. The results show that high AOD values often occur in the southern and eastern parts of the city, which are within the urban area of Beijing and are characterized by high traffic flows, a dense population, and intense anthropogenic activity. In the northwestern part of the city, which comprises vegetated mountainous regions, the AOD values are always lower than in the urban areas. The resulting AOD spatial distributions were consistent with MCD19A2, and the AOD filling algorithm could obtain a more complete MAIAC AOD and apply it to the estimation of the Landsat satellite AOD.

In Figure 7a,b, which are characterized by low pollution, MODIS was deficient due to the influences of clouds and water, which affected its spatial continuity. Moreover, the resolution was too coarse and local details were not reflected. The shape of the parcel in the red box in Figure 7 could be well captured by the ERF AOD estimation method. On the other hand, MODIS displayed no local details, and only produced massive pixels. After filling the MODIS pixels, the coverage was greatly improved. In the case of moderate pollution in Figure 7c, the machine learning method exhibited better spatial continuity than MODIS in urban areas with many details, and thus it was able to capture the local pollution situation. The red box is the local amplification of the main city area of Beijing, and it contains rich local details which MODIS cannot reflect. MODIS produced numerous missing pixels which could be addressed using gap filling. However, as a result of the influence of cloud and snow, the AOD estimated in this paper also masked these areas, resulting in missing pixels. In the case of the heavy pollution in Figure 7d, local pollution events can be seen in the lower right corner of the image, accompanied by the influence of clouds, for which MODIS produced a large number of missing pixels. This demonstrates the advantage of a high resolution. As shown in the red box in the figure, a 30 m AOD can capture the details of local pollution events while MODIS produces no numerical value for them. In summary, the proposed method can retrieve AOD over diverse surfaces with less pixel loss and a similar spatial distribution to MODIS. Moreover, it has a high spatial resolution, which can show more details.

To verify the accuracy of the AOD estimation in Beijing, we not only used AERONET AOD but also conducted a cross-comparison using 30 m LaSRC AOD data. The LaSRC AOD data had an R² of 0.807 and an RMSE of 0.162 (Figure 8). In comparison, the R² of the ERF AODs was about 10% higher, reaching 0.889, and the RMSE was reduced to 0.156. The LaSRC retrieval method demonstrated an overall tendency to underestimate the AOD, while the ERF AOD estimates were marginally overestimated. We also compared the LaSRC and ERF AODs as a time series, with a total of 75 matching data points at the AERONET-site scale (Figure 9). The LaSRC retrieval exhibited a high correlation (0.90) with the AOD estimated by the ERF model. Both exhibited similar AOD trends, but LaSRC produced more underestimates than the ERF. In summary, the ERF exhibited less uncertainty than the LaSRC AOD product, which shows the applicability of the machine learning model over Beijing.
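The validation statistics quoted above (R², RMSE, and the fraction of estimates within the expected error) can be illustrated with a minimal sketch. It is not the authors' code. It assumes that R² is the squared Pearson correlation and that the expected-error envelope is the commonly used MODIS land form ±(0.05 + 0.15 × AOD); the function name `aod_metrics` and the sample values are hypothetical.

```python
import math


def aod_metrics(estimated, observed, ee_abs=0.05, ee_rel=0.15):
    """Compare estimated AOD with reference (e.g., AERONET) AOD.

    ee_abs and ee_rel define an ASSUMED expected-error envelope
    +/-(ee_abs + ee_rel * observed); they are illustrative defaults.
    """
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mae = sum(abs(d) for d in errors) / n
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    # Squared Pearson correlation (assumed definition of R^2 here).
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r2 = cov * cov / (var_e * var_o)
    # Count estimates that fall inside the expected-error envelope.
    inside = sum(
        1 for d, o in zip(errors, observed) if abs(d) <= ee_abs + ee_rel * o
    )
    return {
        "r2": r2,
        "rmse": rmse,
        "mae": mae,
        "bias": mean_e - mean_o,
        "within_ee": 100.0 * inside / n,
    }


# Hypothetical matched pairs, not data from the study.
observed = [0.10, 0.20, 0.40, 0.80]
estimated = [0.12, 0.18, 0.45, 1.00]
stats = aod_metrics(estimated, observed)
print(stats)
```

A positive `bias` corresponds to overall overestimation and a negative one to underestimation, which is the distinction drawn above between the ERF and LaSRC results.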

5. Conclusions

In this paper, machine learning was used to estimate the AOD at a 30 m resolution from Landsat-8 data. The machine learning model estimated the AOD by learning the relationship between AODs measured at the AERONET sites and satellite apparent reflectance information, the GLASS broadband albedo dataset, MAIAC AOD, and other auxiliary data. We then compared the ability of six machine learning methods (covering bagging, boosting, and linear regression) to fit the samples and estimate the AODs. The model that exhibited the best performance was used to estimate the AODs for the year 2019. Thereafter, we compared the AOD estimates with collocated AOD measurements from AERONET. In terms of model fitting and independent sample verification, the ERF model demonstrated the best accuracy. The ERF model was able to reduce the errors of the MAIAC AOD in sparse vegetation areas. The MODIS AOD was increasingly underestimated as WVC increased, but the ERF model was able to alleviate this trend. One of the main reasons for the superior performance of the ERF model was that it does not need to make explicit assumptions about the optical properties of atmospheric aerosols, because it seems to be able to indirectly account for the covariation of WVC and SSA. Finally, we mapped the AOD distributions for Beijing. The results show that the model can produce high-quality AOD maps and that its accuracy was better than that of the LaSRC AOD at a 30 m scale. The experimental results show that the proposed model was effective and generalizable. The ERF model can improve the resolution of MAIAC AOD products more than 30-fold while simultaneously improving their accuracy. In the future, we would like to introduce more high-resolution variables into the model to further improve the product's quality and spatial distribution. This machine learning model has great potential for global Landsat-8 atmospheric corrections because it can quickly retrieve the 30 m resolution AOD.
Furthermore, we can attempt to add physical model simulation data to the samples and introduce various deep learning methods for comparison. Spatial smoothing can also be considered in the future to reduce the uncertainty of the 30 m AOD.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14051053/s1, Figure S1: Scatter density map of MCD19A2 AOD (1 km) and Landsat-8 overpass time AERONET AOD observations; Table S1: Land cover types for the Legend in Figure 1; Figure S2: Density scatterplots of the estimated AOD of Landsat-7 for the ERF models developed by Landsat-8 over Beijing area between 2001 and 2012; Table S2: Summary of the data sources used in this study; Table S3: Statistics of selected algorithms; Table S4: Statistics of model fitting.

Author Contributions

Conceptualization, T.L. and S.L.; methodology, T.L. and L.Z.; software, T.L. and B.L.; validation, T.L. and H.L.; data curation, L.Z. and T.H.; writing—original draft preparation, T.L., L.Z. and L.S.; writing—review and editing, S.L., F.T. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key Research and Development Program of China (2020YFA0608704) and the National Natural Science Foundation of China (Grant Nos. 42001299 and 42090012).

Acknowledgments

The authors thank GEE for their free remote sensing cloud processing platform and USGS for their free provision of the Landsat data. Thanks are due to AERONET for their data maintenance. Thanks to MODIS, GLASS and ERA5 for the free release of their products. We express our sincere gratitude to the anonymous reviewers and the editor for their constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Description of Machine Learning Models

Random Forest (RF) is a highly flexible, tree-based machine learning algorithm that combines the outputs of multiple decision trees to make decisions. It is based on the bagging method, can handle thousands of input variables without variable deletion, and reduces the variance and overfitting problems of individual decision trees.

CasRF is based on the RF method. It connects multiple RF models in series and takes the residuals of the previous model as the input parameter to the next model. Through repeated training (cascade learning), the residuals can be corrected and the fitting and predicting ability of the model can be improved.

The Extremely Randomized Trees (ERF) method is very similar to RF. It consists of hundreds or thousands of decision trees that can be used to address regression and classification problems. Unlike RF, ERF does not resample observations when building a tree. It further strengthens the randomization of attribute selection and node splitting and can effectively reduce model variance.
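The two properties described above (no resampling of observations, and randomized node splitting) can be sketched in a few lines of pure Python. This is an illustrative toy under stated assumptions, not the model configuration used in the study: the function names, the number of trees, and the synthetic two-feature data are all hypothetical, and trees are grown until each leaf is pure.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, rng, n_candidates=None):
    """Grow one extremely randomized regression tree on the full sample."""
    if len(y) < 2 or min(y) == max(y):
        return _mean(y)  # leaf: predict the mean target
    n_features = len(X[0])
    k = n_candidates or n_features
    best = None
    for j in rng.sample(range(n_features), k):  # random attribute subset
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature, cannot split
        cut = rng.uniform(lo, hi)  # random cut-point, not an optimized one
        left = [i for i, v in enumerate(column) if v < cut]
        right = [i for i, v in enumerate(column) if v >= cut]
        if not left or not right:
            continue
        score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
        if best is None or score < best[0]:
            best = (score, j, cut, left, right)
    if best is None:
        return _mean(y)
    _, j, cut, left, right = best
    return (
        j,
        cut,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, n_candidates),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, n_candidates),
    )


def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are floats
        j, cut, left, right = tree
        tree = left if row[j] < cut else right
    return tree


def fit_extra_trees(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    # No bootstrap: every tree sees the whole training sample.
    return [build_tree(X, y, rng) for _ in range(n_trees)]


def predict(forest, row):
    return _mean([predict_tree(t, row) for t in forest])


# Synthetic data (hypothetical): target is a simple function of two features.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [2 * a + b for a, b in X]
forest = fit_extra_trees(X, y)
print(predict(forest, [0.5, 0.5]))
```

Because each tree draws its own random cut-points, the trees differ even though they all see the same data; averaging them is what reduces variance.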

Gradient Boosting Machine (GBM) is, like RF, a technique for performing supervised machine learning tasks. GBM is an ensemble learner, which means that it creates a final model based on a collection of individual models. The predictive power of these individual models is weak, and they are prone to overfitting, but combining many such weak models in an ensemble leads to an overall improved performance. In GBM, the most common type of weak model used is the decision tree, which is another parallel to RF. The implementations of this technique have different names, but the most common are XGBoost and GBDT. GBDT is a machine learning algorithm that iteratively constructs an ensemble of weak decision tree learners through boosting; it is easy to specify different loss functions, and the resulting models are relatively easy to interpret. XGBoost is a further improvement on GBDT, which uses more accurate approximations to find the best tree model. It employs several methods (such as computing second-order gradients and advanced regularization, which improves model generalization) that make it powerful and fast, particularly with structured data. Multiple Linear Regression (MLR) is a typical linear regression algorithm in which a dependent variable is determined by multiple independent variables; it was also included in the comparison.
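The boosting idea described above, in which each weak tree is fitted to the errors left by the current ensemble, can be sketched as follows. This is a minimal illustration under stated assumptions: squared-error loss (so the negative gradient is simply the residual), one-dimensional input, depth-one trees (stumps), and a fixed learning rate. It does not include the second-order gradients or regularization used by XGBoost, and all names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single threshold split that best fits the residuals."""
    levels = sorted(set(x))
    best = None
    for a, b in zip(levels, levels[1:]):
        t = (a + b) / 2  # candidate threshold between neighbouring values
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def boost(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [
            p + learning_rate * (lm if xi <= t else rm) for xi, p in zip(x, pred)
        ]
    return {"base": base, "stumps": stumps, "lr": learning_rate}


def predict(model, xi):
    out = model["base"]
    for t, lm, rm in model["stumps"]:
        out += model["lr"] * (lm if xi <= t else rm)
    return out


def mse(pred, y):
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)


# Synthetic, hypothetical data: a simple nonlinear target.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
model = boost(x, y)
mse_base = mse([model["base"]] * len(y), y)
mse_boost = mse([predict(model, xi) for xi in x], y)
print(mse_base, mse_boost)
```

With squared loss and a learning rate between 0 and 1, each round cannot increase the training error, which is why the boosted error ends up below that of the constant baseline.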

References

Charlson, R.J.; Schwartz, S.E.; Hales, J.M.; Cess, R.D.; Coakley, J.A., Jr.; Hansen, J.E.; Hofmann, D.J. Climate Forcing by Anthropogenic Aerosols. Science 1992, 255, 423–430.
Sokolik, I.N.; Toon, O.B. Direct radiative forcing by anthropogenic airborne mineral aerosols. Nature 1996, 381, 681–683.
Pöschl, U. Atmospheric Aerosols: Composition, Transformation, Climate and Health Effects. Angew. Chem. Int. Ed. 2005, 44, 7520–7540.
Andreae, M.; Rosenfeld, D. Aerosol–cloud–precipitation interactions. Part 1. The nature and sources of cloud-active aerosols. Earth-Sci. Rev. 2008, 89, 13–41.
Shao, L.; Li, G.; Zhao, Q.; Li, Y.; Sun, Y.; Wang, W.; Cai, C.; Chen, W.; Liu, R.; Luo, W.; et al. The fertilization effect of global dimming on crop yields is not attributed to an improved light interception. Glob. Chang. Biol. 2020, 26, 1697–1713.
Wang, X.; Wu, J.; Chen, M.; Xu, X.; Wang, Z.; Wang, B.; Wang, C.; Piao, S.; Lin, W.; Miao, G.; et al. Field evidences for the positive effects of aerosols on tree growth. Glob. Chang. Biol. 2018, 24, 4983–4992.
Zhou, H.; Yue, X.; Lei, Y.; Tian, C.; Ma, Y.; Cao, Y. Aerosol radiative and climatic effects on ecosystem productivity and evapotranspiration. Curr. Opin. Environ. Sci. Health 2021, 19, 100218.
Xie, X.; Wang, T.; Yue, X.; Li, S.; Zhuang, B.; Wang, M. Effects of atmospheric aerosols on terrestrial carbon fluxes and CO₂ concentrations in China. Atmos. Res. 2020, 237, 104859.
Holben, B.N.; Eck, T.F.; Slutsker, I.; Tanré, D.; Buis, J.P.; Setzer, A.; Vermote, E.; Reagan, J.A.; Kaufman, Y.J.; Nakajima, T.; et al. AERONET—A Federated Instrument Network and Data Archive for Aerosol Characterization. Remote Sens. Environ. 1998, 66, 1–16.
Levy, R.; Mattoo, S.; Munchak, L.; Remer, L.; Sayer, A.; Patadia, F.; Hsu, N. The Collection 6 MODIS aerosol products over land and ocean. Atmos. Meas. Tech. 2013, 6, 2989.
Gupta, P.; Remer, L.A.; Levy, R.C.; Mattoo, S. Validation of MODIS 3 km land aerosol optical depth from NASA’s EOS Terra and Aqua missions. Atmos. Meas. Tech. 2018, 11, 3145–3159.
Lyapustin, A.; Wang, Y.; Korkin, S.; Huang, D. MODIS Collection 6 MAIAC algorithm. Atmos. Meas. Tech. 2018, 11, 5741–5765.
Mei, L.; Rozanov, V.; Vountas, M.; Burrows, J.P.; Levy, R.C.; Lotz, W. Retrieval of aerosol optical properties using MERIS observations: Algorithm and some first results. Remote Sens. Environ. 2016, 197, 125–140.
Jackson, J.M.; Liu, H.; Laszlo, I.; Kondragunta, S.; Remer, L.A.; Huang, J.; Huang, H.-C. Suomi-NPP VIIRS aerosol algorithms and data products. J. Geophys. Res. Atmos. 2013, 118, 12673–12689.
Yoshida, M.; Kikuchi, M.; Nagao, T.M.; Murakami, H.; Nomaki, T.; Higurashi, A. Common retrieval of aerosol properties for imaging satellite sensors. J. Meteorol. Soc. Jpn. 2018, 96, 193–209.
Diner, D.; Beckert, J.; Reilly, T.; Bruegge, C.; Conel, J.; Kahn, R.; Martonchik, J.; Ackerman, T.; Davies, R.; Gerstl, S.; et al. Multi-angle Imaging SpectroRadiometer (MISR) instrument description and experiment overview. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1072–1087.
Garay, M.J.; Kalashnikova, O.V.; Bull, M.A. Development and assessment of a higher-spatial-resolution (4.4 km) MISR aerosol optical depth product using AERONET-DRAGON data. Atmos. Chem. Phys. 2017, 17, 5095–5106.
Leroy, M.; Deuzé, J.; Bréon, F.; Hautecoeur, O.; Herman, M.; Buriez, J.; Tanré, D.; Bouffies, S.; Chazette, P.; Roujean, J. Retrieval of atmospheric properties and surface bidirectional reflectances over land from POLDER/ADEOS. J. Geophys. Res. Atmos. 1997, 102, 17023–17037.
Winker, D.M.; Vaughan, M.A.; Omar, A.; Hu, Y.; Powell, K.A.; Liu, Z.; Hunt, W.H.; Young, S. Overview of the CALIPSO Mission and CALIOP Data Processing Algorithms. J. Atmos. Ocean. Technol. 2009, 26, 2310–2323.
Sun, L.; Wei, J.; Bilal, M.; Tian, X.; Jia, C.; Guo, Y.; Mi, X. Aerosol Optical Depth Retrieval over Bright Areas Using Landsat 8 OLI Images. Remote Sens. 2015, 8, 23.
Tian, X.; Liu, Q.; Song, Z.; Dou, B.; Li, X. Aerosol Optical Depth Retrieval From Landsat 8 OLI Images Over Urban Areas Supported by MODIS BRDF/Albedo Data. IEEE Geosci. Remote Sens. Lett. 2018, 15, 976–980.
Chen, X.; Ding, J.; Wang, J.; Ge, X.; Raxidin, M.; Liang, J.; Chen, X.; Zhang, Z.; Cao, X.; Ding, Y. Retrieval of Fine-Resolution Aerosol Optical Depth (AOD) in Semiarid Urban Areas Using Landsat Data: A Case Study in Urumqi, NW China. Remote Sens. 2020, 12, 467.
Bilal, M.; Qiu, Z. Aerosol Retrievals Over Bright Urban Surfaces Using Landsat 8 Images. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 7560–7563.
Luo, N.; Wong, M.S.; Zhao, W.; Yan, X.; Xiao, F. Improved aerosol retrieval algorithm using Landsat images and its application for PM10 monitoring over urban areas. Atmos. Res. 2015, 153, 264–275.
Liang, S.; Fang, H.; Chen, M. Atmospheric correction of Landsat ETM+ land surface imagery. I. Methods. IEEE Trans. Geosci. Remote Sens. 2001, 39, 2490–2498.
Kaufman, Y.J.; Tanré, D.; Remer, L.A.; Vermote, E.F.; Chu, A.; Holben, B.N. Operational remote sensing of tropospheric aerosol over land from EOS moderate resolution imaging spectroradiometer. J. Geophys. Res. Space Phys. 1997, 102, 17051–17067.
Zhong, B.; Wu, S.; Yang, A.; Liu, Q. An improved aerosol optical depth retrieval algorithm for moderate to high spatial resolution optical remotely sensed imagery. Remote Sens. 2017, 9, 555.
Vermote, E.; Justice, C.; Claverie, M.; Franch, B. Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sens. Environ. 2016, 185, 46–56.
Vermote, E.; Tanré, D.; Deuzé, J.; Herman, M.; Morcrette, J.-J. Second simulation of a satellite signal in the solar spectrum-vector (6SV). 6S User Guide Version. 2006, 3, 1–55.
Levy, R.C.; Remer, L.A.; Mattoo, S.; Vermote, E.F.; Kaufman, Y.J. Second-generation operational algorithm: Retrieval of aerosol properties over land from inversion of Moderate Resolution Imaging Spectroradiometer spectral reflectance. J. Geophys. Res. Space Phys. 2007, 112.
Hsu, N.C.; Jeong, M.-J.; Bettenhausen, C.; Sayer, A.M.; Hansell, R.; Seftor, C.S.; Huang, J.; Tsay, S.-C. Enhanced Deep Blue aerosol retrieval algorithm: The second generation. J. Geophys. Res. Atmos. 2013, 118, 9296–9315.
Wei, J.; Li, Z.; Peng, Y.; Sun, L. MODIS Collection 6.1 aerosol optical depth products over land and ocean: Validation and comparison. Atmos. Environ. 2019, 201, 428–440.
Wei, J.; Li, Z.; Sun, L.; Peng, Y.; Liu, L.; He, L.; Qin, W.; Cribb, M. MODIS Collection 6.1 3 km resolution aerosol optical depth product: Global evaluation and uncertainty analysis. Atmos. Environ. 2020, 240, 117768.
Bilal, M.; Nichol, J.E. Evaluation of MODIS aerosol retrieval algorithms over the Beijing-Tianjin-Hebei region during low to very high pollution events. J. Geophys. Res. Atmos. 2015, 120, 7941–7957.
Wang, Q.; Sun, L.; Wei, J.; Yang, Y.; Li, R.; Liu, Q.; Chen, L. Validation and Accuracy Analysis of Global MODIS Aerosol Products over Land. Atmosphere 2017, 8, 155.
Wei, J.; Huang, W.; Li, Z.; Xue, W.; Peng, Y.; Sun, L.; Cribb, M. Estimating 1-km-resolution PM2.5 concentrations across China using the space-time random forest approach. Remote Sens. Environ. 2019, 231, 111221.
Liang, T.; Sun, L.; Li, H. MODIS aerosol optical depth retrieval based on random forest approach. Remote Sens. Lett. 2021, 12, 179–189.
Huttunen, J.; Kokkola, H.; Mielonen, T.; Mononen, M.E.J.; Lipponen, A.; Reunanen, J.; Lindfors, A.V.; Mikkonen, S.; Lehtinen, K.E.J.; Kouremeti, N.; et al. Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table. Atmos. Chem. Phys. 2016, 16, 8181–8191.
Wei, J.; Li, Z.; Xue, W.; Sun, L.; Fan, T.; Liu, L.; Su, T.; Cribb, M. The ChinaHighPM10 dataset: Generation, validation, and spatiotemporal variations from 2015 to 2019 across China. Environ. Int. 2021, 146, 106290.
Radosavljevic, V.; Vucetic, S.; Obradovic, Z. Aerosol optical depth retrieval by neural networks ensemble with adaptive cost function. In Proceedings of the 10th International Conference on Engineering Applications of Neural Networks, Thessaloniki, Greece, 29–31 August 2007; pp. 266–275.
Ristovski, K.; Vucetic, S.; Obradovic, Z. Uncertainty Analysis of Neural-Network-Based Aerosol Retrieval. IEEE Trans. Geosci. Remote Sens. 2012, 50, 409–414.
Mauceri, S.; Kindel, B.; Massie, S.; Pilewskie, P. Neural network for aerosol retrieval from hyperspectral imagery. Atmos. Meas. Tech. 2019, 12, 6017–6036.
Roy, D.; Kovalskyy, V.; Zhang, H.; Vermote, E.; Yan, L.; Kumar, S.S.; Egorov, A. Characterization of Landsat-7 to Landsat-8 reflective wavelength and normalized difference vegetation index continuity. Remote Sens. Environ. 2016, 185, 57–70.
Giles, D.M.; Sinyuk, A.; Sorokin, M.G.; Schafer, J.S.; Smirnov, A.; Slutsker, I.; Eck, T.F.; Holben, B.N.; Lewis, J.R.; Campbell, J.; et al. Advancements in the Aerosol Robotic Network (AERONET) Version 3 database–automated near-real-time quality control algorithm with improved cloud screening for Sun photometer aerosol optical depth (AOD) measurements. Atmos. Meas. Tech. 2019, 12, 169–209.
Ångström, A. The parameters of atmospheric turbidity. Tellus 1964, 16, 64–75.
Qu, Y.; Liu, Q.; Liang, S.; Wang, L.; Liu, N.; Liu, S. Direct-Estimation Algorithm for Mapping Daily Land-Surface Broadband Albedo From MODIS Data. IEEE Trans. Geosci. Remote Sens. 2014, 52, 907–919.
Liu, N.F.; Liu, Q.; Wang, L.Z.; Liang, S.L.; Wen, J.G.; Qu, Y.; Liu, S.H. A statistics-based temporal filter algorithm to map spatiotemporally continuous shortwave albedo from MODIS data. Hydrol. Earth Syst. Sci. 2013, 17, 2121–2129.
Liu, Q.; Wang, L.; Qu, Y.; Liu, N.; Liu, S.; Tang, H.; Liang, S. Preliminary evaluation of the long-term GLASS albedo product. Int. J. Digit. Earth 2013, 6, 69–95.
Liang, S.; Zhao, X.; Liu, S.; Yuan, W.; Cheng, X.; Xiao, Z.; Zhang, X.; Liu, Q.; Cheng, J.; Tang, H.; et al. A long-term Global LAnd Surface Satellite (GLASS) data-set for environmental studies. Int. J. Digit. Earth 2013, 6, 5–33.
Liang, S.; Cheng, J.; Jia, K.; Jiang, B.; Liu, Q.; Xiao, Z.; Yao, Y.; Yuan, W.; Zhang, X.; Zhao, X.; et al. The Global Land Surface Satellite (GLASS) Product Suite. Bull. Am. Meteorol. Soc. 2021, 102, E323–E337.
Lyapustin, A.; Wang, Y.; Laszlo, I.; Kahn, R.; Korkin, S.; Remer, L.; Levy, R.; Reid, J.S. Multiangle implementation of atmospheric correction (MAIAC): 2. Aerosol algorithm. J. Geophys. Res. Earth Surf. 2011, 116.
Superczynski, S.D.; Kondragunta, S.; Lyapustin, A.I. Evaluation of the Multi-Angle Implementation of Atmospheric Correction (MAIAC) Aerosol Algorithm through Intercomparison with VIIRS Aerosol Products and AERONET. J. Geophys. Res. Atmos. 2017, 122, 3005–3022.
Mhawish, A.; Banerjee, T.; Sorek-Hamer, M.; Lyapustin, A.; Broday, D.M.; Chatfield, R. Comparison and evaluation of MODIS Multi-angle Implementation of Atmospheric Correction (MAIAC) aerosol product over South Asia. Remote Sens. Environ. 2019, 224, 12–28.
Liu, N.; Zou, B.; Feng, H.; Tang, Y.; Liang, Y. Evaluation and comparison of MAIAC, DT and DB aerosol products over China. Atmos. Chem. Phys. Discuss. 2019, 1–34.
Wang, Y.; Yuan, Q.; Li, T.; Shen, H.; Zheng, L.; Zhang, L. Large-scale MODIS AOD products recovery: Spatial-temporal hybrid fusion considering aerosol variation mitigation. ISPRS J. Photogramm. Remote Sens. 2019, 157, 1–12.
NASA JPL. NASADEM Merged DEM Global 1 Arc Second V001. Distributed by OpenTopography. 2021. Available online: https://portal.opentopography.org/datasetMetadata?otCollectionID=OT.032021.4326.2 (accessed on 25 November 2021).
Wei, J.; Li, Z.; Peng, Y.; Sun, L.; Yan, X. A Regionally Robust High-Spatial-Resolution Aerosol Retrieval Algorithm for MODIS Images Over Eastern China. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4748–4757.
Zhu, Z.; Wang, S.; Woodcock, C.E. Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 2015, 159, 269–277.

Figure 1. Locations of the global Aerosol Robotic Network (purple dots) monitoring stations that were used for our model development. Land use cover types in 2016 (background colored shading) are marked based on the MODIS land use cover product at a 500 m spatial resolution. Descriptions of the land use classes in the legend are given in Table S1.

Figure 2. Flowchart of the methodology applied in this study.

Figure 3. Density scatterplots of the estimated results for the different machine learning models (N = 1133). Color scale indicates the density of observation. The red dashed line and the black and green solid lines represent the MODIS expected error, the 1:1 line and the linear regression line, respectively.

Figure 4. Box plots of AOD bias (in grey) for ERF-estimated AOD (a) and MODIS AODs (b), against AERONET AODs, as a function of NDVI. In each box, the green triangle and the middle, lower, and upper horizontal lines represent the AOD bias mean, median, and 25th and 75th percentiles, respectively.

Figure 5. The same as Figure 4 but as a function of AERONET single scattering albedo (SSA, 550 nm) values. Box plots of AOD bias (in grey) for ERF-estimated AOD (a) and MODIS AODs (b).

Figure 6. Same as Figure 4, but as a function of AERONET water vapor values. Box plots of AOD bias (in grey) for ERF-estimated AOD (a) and MODIS AODs (b).

Figure 7. Spatial distribution of the Landsat-7 (a) and Landsat-8 (b–d) blue band atmosphere datasets, the new (30 m) AOD datasets from this study, the MAIAC datasets (1 km), and enlarged views of Beijing at different times.

Figure 8. Validation of the 30 m AOD from the ERF model (blue) and the LaSRC algorithm (red) against the AERONET AOD measurements over Beijing. The blue and red solid lines represent the linear regression lines; the blue and red shaded areas represent the confidence interval of regressions. The orange dashed line and the black solid lines represent the MODIS expected error and the 1:1 line, respectively.

Figure 9. Comparison of the 30 m AOD derived from LaSRC (orange) with the ERF-estimated AOD (blue) in 2013–2019 at the AERONET-site scale (Beijing, Beijing-CAMS, Beijing-RADI, and XiangHe sites). The grey dots represent LaSRC minus ERF AODs. The x-axis represents the time of the points and the y-axis represents the AOD.

Table 1. Current remote sensing-based aerosol optical depth (AOD) instruments grouped by the type of satellite sensors used.

Type of Sensor	Sensor	Product Name	Algorithm	Spatial Resolution	Temporal Resolution	Reference
Multispectral	MODIS	MOD04-3k	DT	3 km	Daily	[10]
Multispectral	MODIS	MOD04-10k	DB/DT	10 km	Daily	[11]
Multispectral	MODIS	MCD19	MAIAC	1 km	Daily	[12]
Multispectral	MERIS	XBAER	Unnamed DT-like algorithm	10 km	Daily	[13]
Multispectral	VIIRS	EDR	MODIS-like atmospheric correction algorithm	6 km	Daily	[14]
Multispectral	Himawari-8	AHI	Common algorithm	5 km	10 min	[15]
Multi-angle	MISR	V22_17.5km	Version 22 retrieval algorithm	17.5 km	Daily	[16]
Multi-angle	MISR	V22_4.4km	Version 23 retrieval algorithm	4.4 km	Daily	[17]
Polarization	POLDER	PARASOL	Polarization retrieval algorithm	18.5 km	Daily	[18]
Lidar	CALIOP	CALIPSO	Active lidar sensor algorithm	2° × 5°	Daily	[19]

Table 2. Detailed parameter comparison of the OLI and ETM+ bands.

Landsat–8 OLI			Landsat–7 ETM+
Band	Spectral Range (μm)	Resolution (m)	Band	Spectral Range (μm)	Resolution (m)
B1 Coastal	0.43–0.45	30	–	–	–
B2 Blue	0.45–0.51	30	B1 Blue	0.45–0.52	30
B3 Green	0.53–0.59	30	B2 Green	0.52–0.60	30
B4 Red	0.64–0.67	30	B3 Red	0.63–0.69	30
B5 NIR	0.85–0.88	30	B4 NIR	0.77–0.90	30
B6 SWIR 1	1.57–1.65	30	B5 SWIR 1	1.55–1.75	30
B7 SWIR 2	2.11–2.29	30	B7 SWIR 2	2.09–2.35	30

Table 3. Sample-based cross-validation results for the different machine learning models.

Methods	Name	R²	RMSE	MAE	Within EE (%)
Bagging	Random Forest (RF)	0.776	0.090	0.048	85.21
Bagging	Extremely randomized trees (ERF)	0.780	0.090	0.047	85.23
Bagging	Cascade Random Forest (CasRF)	0.777	0.089	0.048	85.09
Boosting	Gradient Boosted Decision Trees (GBDT)	0.771	0.089	0.048	85.11
Boosting	Extreme Gradient Boosting (XGBoost)	0.774	0.090	0.048	85.21
Linear	Multiple Linear Regression (MLR)	0.731	0.098	0.053	81.79

Table 4. Independent validation results for the different machine learning models.

Methods	Name	R²	RMSE	MAE	Within EE (%)
Bagging	Random Forest (RF)	0.764	0.117	0.066	74.72
Bagging	Extremely randomized trees (ERF)	0.770	0.116	0.066	74.88
Bagging	Cascade Random Forest (CasRF)	0.763	0.118	0.066	74.61
Boosting	Gradient Boosted Decision Trees (GBDT)	0.769	0.116	0.065	74.74
Boosting	Extreme Gradient Boosting (XGBoost)	0.762	0.117	0.066	75.56
Linear	Multiple Linear Regression (MLR)	0.760	0.117	0.066	71.27

Table 5. Statistics concerning the estimation results of seven combinations of inputs.

ERF Model	R²	RMSE	MAE	Within EE (%)
f1 (TOA_2~7, Angle)	0.313	0.120	0.073	67.52
f2 (TOA_2~7, Angle, EL)	0.338	0.117	0.071	67.71
f3 (TOA_2~7, Angle, EL, O₃)	0.368	0.116	0.070	69.29
f4 (TOA_2~7, Angle, EL, O₃, WVC)	0.456	0.108	0.065	70.96
f5 (TOA_2~7, Angle, EL, O₃, WVC, GLASS_Albedo)	0.545	0.091	0.045	76.03
f6 (TOA_2~7, Angle, EL, O₃, WVC, GLASS_Albedo, MAIAC_AOD)	0.791	0.067	0.042	87.82
f7 (EL, O₃, WVC, GLASS_Albedo, MAIAC_AOD)	0.748	0.073	0.045	83.94

Table 6. The accuracy of the observed (AERONET) AOD, MAIAC AOD and estimated AOD by the ERF model for statistically different SSA bins.

SSA Bins	AOD	R²	RMSE	MAE	Within EE (%)
<0.88	MODIS AOD	0.532	0.073	0.046	82.33
<0.88	ERF AOD	0.666	0.061	0.035	87.93
0.88 ≤ SSA < 0.94	MODIS AOD	0.761	0.086	0.055	82.66
0.88 ≤ SSA < 0.94	ERF AOD	0.848	0.071	0.045	87.15
SSA ≥ 0.94	MODIS AOD	0.633	0.059	0.042	85.60
SSA ≥ 0.94	ERF AOD	0.675	0.056	0.038	87.73

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
