Physical-Based Spatial-Spectral Deep Fusion Network for Chlorophyll-a Estimation Using MODIS and Sentinel-2 MSI Data

He, Yuting; Wu, Penghai; Ma, Xiaoshuang; Wang, Jie; Wu, Yanlan

doi:10.3390/rs14225828


by Yuting He ¹, Penghai Wu ¹,²,³,*, Xiaoshuang Ma ¹,², Jie Wang ¹,³ and Yanlan Wu ²,⁴

¹ School of Resources and Environmental Engineering, Anhui University, Hefei 230601, China
² Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei 230601, China
³ Anhui Province Key Laboratory of Wetland Ecosystem Protection and Restoration, Anhui University, Hefei 230601, China
⁴ Engineering Center for Geographic Information of Anhui Province, Anhui University, Hefei 230601, China
* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(22), 5828; https://doi.org/10.3390/rs14225828

Submission received: 11 October 2022 / Revised: 12 November 2022 / Accepted: 15 November 2022 / Published: 17 November 2022

(This article belongs to the Special Issue Deep Learning in Optical Satellite Images)


Abstract:

Satellite-derived Chlorophyll-a (Chl-a) is an important environmental evaluation indicator for monitoring water environments. However, the available satellite images either have a coarse spatial or low spectral resolution, which restricts the applicability of Chl-a retrieval in coastal water (e.g., less than 1 km from the shoreline) for large- and medium-sized lakes/oceans. Considering Lake Chaohu as the study area, this paper proposes a physical-based spatial-spectral deep fusion network (PSSDFN) for Chl-a retrieval using Moderate Resolution Imaging Spectroradiometer (MODIS) and Sentinel-2 Multispectral Instrument (MSI) reflectance data. The PSSDFN combines residual connectivity and attention mechanisms to extract effective features, and introduces physical constraints, including spectral response functions and the physical degradation model, to reconcile spatial and spectral information. The fused and MSI data were used as input variables for collaborative retrieval, while only the MSI data were used as input variables for MSI retrieval. Combined with the Chl-a field data, a comparison between MSI and collaborative retrieval was conducted using four machine learning models. The results showed that collaborative retrieval can greatly improve the accuracy compared with MSI retrieval. This research illustrates that the PSSDFN can improve the estimated accuracy of Chl-a for coastal water (less than 1 km from the shoreline) in large- and medium-sized lakes/oceans.

Keywords:

remote sensing; Chl-a retrieval; spatial-spectral deep fusion; physical constraints; machine learning


1. Introduction

The concentration of Chlorophyll-a (Chl-a) is typically used to reveal the eutrophication status of aquatic ecosystems [1]. Traditional water quality monitoring is limited by time, cost, and areas [2,3]. Fortunately, satellite remote sensing technology can enhance monitoring capabilities to improve Chl-a monitoring programs [4,5,6].

For inland waters in close proximity to people, there are no specifically designed satellite sensors [7]. For many inland lakes, Chl-a has been estimated using ocean color satellite sensors with rich spectral bands, including the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) [8], Moderate Resolution Imaging Spectroradiometer (MODIS) [9], Medium Resolution Imaging Spectrometer (MERIS) [10], and Sentinel-3 Ocean and Land Colour Instrument (OLCI) [11]. Although these rich spectral bands can provide global Chl-a information, the coarse spatial resolution restricts their application to coastal water [12]. In recent decades, land satellite sensors, such as the Sentinel-2 A/B Multispectral Instrument (MSI) [13] and Landsat series [14], have been used for estimating Chl-a due to their higher spatial resolution and better radiometric properties. However, these sensors tend to have a lower spectral resolution with wide bandwidths [15], which diminishes spectral features and reduces the ability to discriminate water substances.

The inability of a single sensor to provide both high spatial and high spectral resolution makes it difficult to accurately estimate Chl-a in coastal water (e.g., less than 1 km from the shoreline) for large- and medium-sized inland lakes or oceans. As an alternative solution, recent studies have introduced spatial downscaling algorithms [16], which downscale low spatial (LS) resolution Chl-a data using high spatial (HS) resolution reflectance data. Guo et al. downscaled MODIS Chl-a products from a 1 km spatial resolution to 30 m using Landsat data [17], which improved the understanding of the dynamic patterns of Chl-a. However, such methods underutilize the spectral information of HS resolution data.

Another solution is to develop a spatial-spectral fusion (SSF) algorithm, which fully exploits the texture and spectral information of data with different spatial resolutions. Early (traditional) SSF methods rely on mathematical transformations to manually analyze the activity level and design fusion rules in the spatial or transform domain. They are driven by theories including component substitution, multi-resolution analysis, sparse representation, and variational models [18]. However, poor feature-extraction capacity and coarse fusion strategies are the two main limitations of most traditional fusion methods [19]. Deep learning (DL) is a new milestone for the study of SSF. Automatic feature learning and end-to-end architectures have driven tremendous progress in SSF, with performance far exceeding that of traditional methods [20]. Deep architectures adopted in DL-based SSF include the autoencoder (AE), convolutional neural network (CNN), and generative adversarial network (GAN). DL-based SSF methods can effectively integrate spatial-spectral information from diverse sources to provide more useful input for downstream tasks and thereby improve the performance of these applications [21,22]. Typical applications include photography visualization, remote sensing object tracking, remote sensing classification, and remote sensing monitoring, which demonstrate the importance of SSF. However, purely data-driven DL lacks physical significance, which greatly limits quantitative applications [23]. To generate high spatial-spectral resolution images for Chl-a retrieval, a novel SSF method is required.

As far as we know, this is the first study of physics-based SSF algorithms for Chl-a retrieval. Although some scholars proposed physics-based SSF algorithms, these were not focused on retrieving Chl-a [24]. In this paper, using MODIS and MSI data from Lake Chaohu, we developed a physical-based spatial-spectral deep fusion network (PSSDFN) for Chl-a retrieval. The main aims were as follows:

(1): Introduce physical constraints, including spectral response functions (SRFs) and the physical degradation model, and construct a new loss function to reconcile spatial and spectral information for avoiding spectral distortion during fusion.
(2): Design a deep fusion network to fuse MODIS and MSI data for Chl-a retrieval. This combines residual connectivity and attention mechanisms to extract more effective features, and creates a high spatial-spectral data source suitable for estimating Chl-a.
(3): Compare Chl-a concentrations retrieved from both the fused and MSI reflectance data (i.e., collaborative retrieval) and only the MSI reflectance data (i.e., MSI retrieval) using four machine learning (ML) models.

2. Study Area and Materials

2.1. Study Area and In Situ Observations

Lake Chaohu (area of ~780 km²) is a eutrophic lake located in central Anhui Province, downstream of the Yangtze River (31°25′–31°43′N, 117°16′–117°51′E; Figure 1a). It has a basin area of 13,486 km² and borders Hefei City to the northwest and Chaohu City to the east. In recent decades, eutrophication and outbreaks of cyanobacterial blooms have posed a serious threat to drinking-water safety [25].

Field water samples (1.5 L, depth: 0.5 m) were collected from Lake Chaohu in August 2018 (red points, Figure 1a), December 2019 (yellow points, Figure 1a), June 2020, and November 2020 (blue points, Figure 1a). Data collection was timed to almost coincide with the MSI and MODIS overpasses at Lake Chaohu. Because MODIS data were involved, the sampling points were located more than 2 km from the shoreline so that the subsequent fused “MSI-like” image would not be affected by the adjacency effect or the lake bottom, especially at boundary pixels. The samples were processed within 24 h and under light-proof conditions.

A total of 120 valid Chl-a samples were analyzed by spectrophotometric methods [26]. The absorbance of the extracts was determined at 630 (A₆₃₀), 647 (A₆₄₇), 664 (A₆₆₄), and 750 nm (A₇₅₀). The Chl-a concentration was calculated according to HJ 897–2017 [27], as expressed in Equation (1). Figure 1b summarizes the measured Chl-a concentrations.

Chl-a = [C₁ × (A₆₆₄ − A₇₅₀) + C₂ × (A₆₄₇ − A₇₅₀) + C₃ × (A₆₃₀ − A₇₅₀)] × V_e / (V_f × L_c)

(1)

where C₁–C₃ are constants, V_e is the constant volume of the specimen, V_f is the sampling volume, and L_c is the path length of the cuvette in centimeters. Here, C₁ = 11.85, C₂ = −1.54, and C₃ = −0.08.
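As a minimal sketch of how Equation (1) can be evaluated, the function below plugs absorbance readings into the formula with the constants given above. It assumes the extract volume is divided by the product of sampling volume and path length, which is the usual form of the trichromatic equation. The function name, the units, and the example readings are illustrative assumptions, not values from this study.

```python
# Trichromatic coefficients quoted in the text (C1, C2, C3).
C1, C2, C3 = 11.85, -1.54, -0.08


def chl_a_concentration(a630, a647, a664, a750, v_extract, v_sample, path_cm):
    """Evaluate Equation (1) with turbidity-corrected absorbances.

    Units are an assumption of this sketch: v_extract in mL, v_sample in L,
    and path_cm in cm, which gives a result in ug/L.
    """
    pigment_term = (
        C1 * (a664 - a750)
        + C2 * (a647 - a750)
        + C3 * (a630 - a750)
    )
    return pigment_term * v_extract / (v_sample * path_cm)


# Hypothetical absorbance readings for a 1.5 L sample extracted into 10 mL.
example = chl_a_concentration(
    a630=0.05, a647=0.08, a664=0.25, a750=0.01,
    v_extract=10.0, v_sample=1.5, path_cm=1.0,
)
print(round(example, 2))
```

Subtracting A₇₅₀ from each reading removes the turbidity baseline before the coefficients are applied, so a blank extract yields a concentration of zero.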

2.2. Satellite Data Source and Pre-Processing

This study used satellite data from MSI and MODIS, which were obtained from the official websites of ESA (https://scihub.copernicus.eu/dhus/#/home (accessed on 16 January 2021)) and NASA (https://ladsweb.modaps.eosdis.nasa.gov/search/ (accessed on 16 January 2021)). MSI has 13 spectral bands with 10, 20, and 60 m ground resolutions. Two MODIS products were used: MOD09 with 16 spectral bands, and MOD02 1 km with 36 spectral bands. We downloaded 14 useful MODIS–MSI image pairs covering the study area from 2018 to 2020. Ten MODIS–MSI image pairs were used as the training datasets, and the rest were used as test datasets (Table 1). Due to the limitations of the revisit cycle and cloud cover, the acquisition times of two MODIS–MSI image pairs (i.e., test datasets in bold) did not perfectly match those of the in situ observations. Therefore, some images from adjacent times were also collected and used to discuss the retrieval uncertainty caused by this time mismatch. Wavelengths from the visible to near-infrared range have proven validity in reflecting water quality parameters [28]. Figure 2 shows the distribution of the MODIS and MSI bands in the 0.4–0.9 μm wavelength range, which is sensitive to Chl-a. The MODIS bands are narrower than the MSI bands, and a few MSI bands, such as MSI band 5 (B5) and MSI B7, have no counterpart in the MODIS data. In addition, because the pixels covering the region were saturated in MOD09 B13–B16, these bands were not included in this study.

For MSI data, the official Sen2Cor tool (with a libRadtran-based radiative transfer look-up table (LUT)) was used to perform atmospheric correction from Level-1C to Level-2A [29]. The MODIS data used were MOD09 and MOD02 1 km, and the bow-tie effect was removed from them using the MCTK plug-in for ENVI. The MOD09 data are atmospherically corrected surface reflectance data (based on the 6S radiative transfer model) [30], so no additional atmospheric correction was needed; for MOD02 1 km, the FLAASH atmospheric correction tool in ENVI (based on the MODTRAN radiative transfer model) was used [31]. Moreover, all images were converted to UTM projection coordinates and resampled to a 20 m spatial resolution.

3. Methods

Figure 3 shows a flowchart of the proposed method, including SSF and Chl-a estimation. For the former, the training datasets were constructed based on the band selection and SRFs. Then, an SSF network, embedded with the physical degradation model and spatial-spectral loss function, was built. For the latter, four ML algorithms were adopted to estimate Chl-a using the MSI reference data or a combination of the fused and MSI reference data. The Chl-a field data were used to test whether the fused reference data could improve the retrieval accuracy.

3.1. PSSDFN

3.1.1. SRF-Guided Grouping

An SRF describes the ratio of the received irradiance to the incident irradiance at each wavelength of a sensor [32]. SRFs provide the spectral correlation between multispectral (MSI-like) and hyperspectral (MODIS-like) bands and, used as an auxiliary operation, can make DL-based fusion more effective than purely data-driven approaches [33]. As shown in Figure 2, nine MODIS bands (i.e., B1–B4 and B8–B12) and nine MSI bands (i.e., B1–B8 and B8A) were selected for SSF. The classical fusion of multispectral and panchromatic images involves only one fine-resolution band, whereas the SSF of MODIS and MSI involves multiple fine-resolution bands. To improve the training efficiency, we grouped the MODIS bands of the sample dataset according to the SRFs of the MSI and MODIS sensors. According to the proximity principle, the MODIS bands were divided into five groups: B8 and B9; B3, B10, and B11; B4 and B12; B1; and B2. The MOD09 band(s) in each group match one fine-resolution MSI band, creating a set of MODIS–MSI band pairs; e.g., the MOD09 B3, B10, and B11 group matches MSI B2.

3.1.2. Training Dataset Construction

Mapping relationships between HS resolution band images (e.g., MSI B2) and corresponding LS resolution band images (e.g., MOD09 B3) can be learned through a CNN [34]. We assume that when two MOD09 bands are highly correlated, the relationship learned for one MODIS–MSI band image pair can be applied to the other. Therefore, an HS resolution band image can be predicted using the corresponding MODIS band image and one or more highly correlated MODIS–MSI band image pairs. In this study, the prediction from two highly correlated MODIS–MSI band image pairs of different groups was better than that from one band image pair. Taking the MOD09 image from 28 December 2019 as an example, Table 2 lists the correlation coefficients between the nine bands from five groups.

The correlations between MOD09 B2 and the other bands were extremely low, so these bands were not appropriate for fusion with MOD09 B2. However, we found that MOD02 B17 and B18 have high correlations (0.957 and 0.951, respectively) with MOD09 B2. Table 3 lists two strongly correlated MODIS–MSI band image pairs for each group of MOD09 band(s). For this, bands with the same group shared the same two MODIS–MSI band image pairs (e.g., B4 and B12 shared the same two MODIS–MSI band image pairs: MB11–SB2 and MB9–SB1). All images were uniformly cropped into 80 × 80 sized pairs, with 40 patches overlapping.
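The cropping step can be sketched as a sliding window. The example below assumes that the overlap means neighbouring 80 × 80 patches share 40 pixels, i.e., a stride of 40 pixels; this reading, the function names, and the scene size are assumptions made for illustration. Plain Python lists are used to keep the sketch dependency-free.

```python
def patch_origins(height, width, patch=80, stride=40):
    """Return (row, col) top-left corners of all full patches in an image."""
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def crop(image, row, col, patch=80):
    """Cut one patch out of an image stored as a list of rows."""
    return [line[col:col + patch] for line in image[row:row + patch]]


# Hypothetical 200 x 160 scene in which each pixel stores its own index.
scene = [[r * 160 + c for c in range(160)] for r in range(200)]
origins = patch_origins(200, 160)
patches = [crop(scene, r, c) for r, c in origins]
print(len(patches), len(patches[0]), len(patches[0][0]))
```

Because the MODIS and MSI images were resampled to the same 20 m grid, applying the same origins to both images would yield spatially aligned patch pairs.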

3.1.3. PSSDFN Structure

Figure 4 depicts the structure and main components of the proposed PSSDFN. First, two bands with an HS resolution (MSI B_i and B_j) and two bands with an LS resolution (MODIS B_i’ and B_j’) were concatenated. Then, the concatenated images were convolved down to one dimension, forming branches 1 and 2. Finally, we added the results of the subtraction between branches 1 and 2 to MODIS B_t to obtain branch 3. In summary, branches 1, 2, and 3 constitute the input of the PSSDFN.

Overall, the Conv Module, DeConv Module, and Degeneration Network form the three main network components (Figure 4b–d, respectively). The Conv Module consists of four convolutional layers of different sizes, with residual connections added to fully exploit the deep features. The attention mechanism was introduced between the Conv Module and the DeConv Module to enhance the ability of adaptive weighting. Due to the lack of ideal labels, the MSI images were used as labels, which causes the fusion results to lose much of the MODIS spectral information. To compensate, a degradation network was introduced, which considers the physical degradation process of images, so that the final fusion result retains both MSI spatial information and MODIS spectral information. The degradation model is expressed by the following equation:

L_{S} = H_{R} B D + n

(2)

where L_{S} represents a low spatial but high spectral resolution image, H_{R} represents a high spatial-spectral resolution image, n represents noise, B represents a blurring factor, and D represents the down-sampling operation.
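To make Equation (2) concrete, the sketch below degrades a high-resolution tile by block averaging, which acts as a box blur (B) followed by decimation (D). The box kernel, the noise-free setting (n = 0), and the function name are assumptions of this illustration; the degradation network in the PSSDFN is learned and is not reproduced here.

```python
def degrade(high_res, factor):
    """Block-average a 2-D image: a box blur (B) followed by decimation (D).

    The box kernel and the noise-free setting (n = 0) are assumptions of
    this sketch; the text does not specify the blurring factor.
    """
    rows = len(high_res) // factor
    cols = len(high_res[0]) // factor
    low_res = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Collect the factor x factor block that maps to one coarse pixel.
            block = [
                high_res[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        low_res.append(out_row)
    return low_res


# Hypothetical 4 x 4 reflectance tile reduced by a factor of 2.
tile = [
    [0.10, 0.12, 0.30, 0.32],
    [0.14, 0.16, 0.34, 0.36],
    [0.50, 0.52, 0.70, 0.72],
    [0.54, 0.56, 0.74, 0.76],
]
coarse = degrade(tile, 2)
print(coarse)
```

Under these assumptions, moving from the 20 m grid to a 1 km MODIS-scale grid would correspond to a factor of 50.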

3.1.4. Loss Function

The loss function of the proposed PSSDFN can be defined as follows:

ℒ_{T} = ℒ_{spatial} + λ ℒ_{spectral}

(3)

where ℒ_{T} denotes the total loss. ℒ_{spatial} represents the spatial-detail loss between the spatial information of the original high spatial resolution image and the predicted high spatial-spectral resolution image, which is defined as follows:

ℒ_{spatial} = \sqrt{\frac{1}{S} \sum_{i=1}^{S} (r_{i} - \hat{r}_{i})^{2}}

(4)

where S is the batch size, and r_{i} and \hat{r}_{i} denote the reflectance value in the original MSI image and the predicted high spatial-spectral resolution image, respectively.

ℒ_{spectral} denotes the spectral-fidelity loss between the spectral information of the degraded result of the predicted image and the original MODIS image, which is defined as follows:

ℒ_{spectral} = \sqrt{\frac{1}{S} \sum_{i=1}^{S} (r_{i}^{'} - \hat{r}_{i}^{'})^{2}}

(5)

where r_{i}^{'} and \hat{r}_{i}^{'} denote the original MODIS image and the degraded result of the predicted image (\hat{r}_{i}), respectively.

λ is a regularization parameter that balances the loss between spectral and spatial information. From Equation (3), the larger the λ value, the richer the spectral information of the fused results and the poorer the spatial detail, and vice versa. Considering the importance of spectral information for Chl-a retrieval, we increased λ until a spatial critical point was reached. To improve the training efficiency of the PSSDFN, bands in the same group share the same λ.
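The combined loss in Equations (3) to (5) can be sketched with two root-mean-square terms. The example uses plain lists of reflectance values in place of image batches, and the sample values and λ settings are hypothetical; in an actual training loop the same expression would be written with a DL framework's tensor operations.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error over paired values, as in Equations (4) and (5)."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def total_loss(msi, fused, modis, fused_degraded, lam):
    """Equation (3): spatial loss plus lambda times spectral loss."""
    spatial = rmse(msi, fused)              # fused result vs. MSI label
    spectral = rmse(modis, fused_degraded)  # degraded result vs. MODIS input
    return spatial + lam * spectral


# Hypothetical reflectance values for a batch of four samples.
msi = [0.10, 0.20, 0.30, 0.40]
fused = [0.12, 0.18, 0.33, 0.40]
modis = [0.11, 0.21, 0.29, 0.41]
fused_degraded = [0.11, 0.20, 0.31, 0.41]

for lam in (0.5, 1.0, 2.0):
    print(lam, total_loss(msi, fused, modis, fused_degraded, lam))
```

Raising λ increases the weight of the spectral term, so the same spectral mismatch contributes more to the total loss.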

3.1.5. Quantitative Metrics

The fusion effect may not be comprehensively reflected using a single evaluation indicator, such as a spatial or spectral indicator. Therefore, as a comprehensive evaluation metric, the total spatial-spectral correlation coefficient (CC_{tol}) was proposed. This consists of spatial (CC_{spa}) and spectral (CC_{spe}) correlation coefficients, and can be expressed as:

CC_{tol} = CC_{spa} + CC_{spe}

(6)

where these metrics are all calculated using the Pearson correlation coefficient. The high-pass-filtered fused results and the high-pass-filtered MSI images are used to calculate CC_{spa}, while the fused images resampled to the MODIS scale and the original MODIS images are used to calculate CC_{spe}. The filter was a Laplacian filter, as illustrated in the following equation [35]:

mask = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}

(7)
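A minimal sketch of the metric in Equations (6) and (7) follows. It applies the Laplacian mask to interior pixels only (dropping the border is an assumption about edge handling) and takes the coarse-scale images as flat lists of already-resampled values. All function names and sample values are illustrative.

```python
import math

# Laplacian mask from Equation (7).
LAPLACIAN = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def high_pass(image):
    """Apply the 3 x 3 Laplacian mask to interior pixels (borders dropped)."""
    out = []
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            out.append(sum(
                LAPLACIAN[di][dj] * image[i - 1 + di][j - 1 + dj]
                for di in range(3) for dj in range(3)
            ))
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cc_total(fused, msi, fused_coarse, modis):
    """Equation (6): spatial CC on high-pass images plus spectral CC."""
    cc_spa = pearson(high_pass(fused), high_pass(msi))
    cc_spe = pearson(fused_coarse, modis)
    return cc_spa + cc_spe


# Hypothetical 4 x 4 MSI tile and a fused tile that is a linear rescaling of it.
msi = [[1, 2, 4, 3], [2, 5, 1, 2], [3, 1, 6, 4], [2, 3, 2, 1]]
fused = [[2 * v + 1 for v in row] for row in msi]
# Hypothetical coarse-scale values (fused image resampled to the MODIS grid).
modis = [0.10, 0.20, 0.40, 0.30]
fused_coarse = [0.12, 0.19, 0.41, 0.33]

score = cc_total(fused, msi, fused_coarse, modis)
print(score)
```

Since each term is a Pearson coefficient, CC_{tol} is bounded by 2, and the mask coefficients sum to zero, so a constant offset between the fused and MSI images does not affect CC_{spa}.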

3.2. Chl-a Estimation

3.2.1. Band Selection

As shown in Figure 2, the spatial resolution of nine bands of MOD09 images can be enhanced (named fused bands) via fusion with the corresponding MSI images. Considering that the fused and original MSI data partially overlap regarding their spectral range, band selection was conducted based on the spectral resolution and correlation between the reflectance values and Chl-a field concentrations. Moreover, it is evident that some spectral bands (SB5–SB8) of the MSI images were not involved in the fusion process. The combination of these bands and the selected bands may improve the Chl-a retrieval accuracy. Therefore, fused band 2 (FB2), FB3, FB4, FB8, FB10, FB12, Sentinel-2 MSI band 1 (SB1), SB4, SB5, SB6, SB7, and SB8 were selected to collaboratively retrieve the Chl-a concentration, while the original MSI images (i.e., SB1–SB8 and SB8A) were selected as the control group.

3.2.2. Model Structures

ML algorithms can be used to build well-performing estimation models [36]. Here, we used four representative ML algorithms: the adaptive boosting tree (AdaBoost) [37], support vector regression (SVR) [38], the extreme gradient boosting tree (XGBoost) [39], and the gradient boosting decision tree (GBDT) [40].

Different band combinations are usually used as the input variables for the ML models. For MSI retrieval, 12 spectral variables were input into each ML model (Figure 5a): the nine MSI bands (SB1–SB8, SB8A), blue–green ratio (SB1/SB3) [41], red–green ratio (SB4/SB3) [42], and near-infrared–red ratio (SB8A/SB4) [43]. For collaborative retrieval, 15 spectral variables were input (Figure 5a): six fused bands (FB8, FB3, FB10, FB4, FB12, and FB2), six MSI bands (SB1 and SB4–SB8), and three band ratios (SB1/FB12, SB4/FB12, and FB2/SB4). These input variables performed well in the four ML models.
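The 15 collaborative-retrieval inputs listed above can be assembled for one pixel as in the following sketch. The band names follow the text, while the function name and the reflectance values are hypothetical.

```python
def collaborative_features(bands):
    """Build the 15 collaborative-retrieval inputs from a band -> reflectance map."""
    fused = ["FB8", "FB3", "FB10", "FB4", "FB12", "FB2"]
    msi = ["SB1", "SB4", "SB5", "SB6", "SB7", "SB8"]
    ratios = [("SB1", "FB12"), ("SB4", "FB12"), ("FB2", "SB4")]

    # Twelve single-band variables, in the order listed in the text.
    features = {name: bands[name] for name in fused + msi}
    # Three band-ratio variables.
    for num, den in ratios:
        features[f"{num}/{den}"] = bands[num] / bands[den]
    return features


# Hypothetical reflectance values for one pixel.
pixel = {
    "FB8": 0.03, "FB3": 0.04, "FB10": 0.05, "FB4": 0.08, "FB12": 0.08,
    "FB2": 0.06, "SB1": 0.04, "SB4": 0.06, "SB5": 0.07, "SB6": 0.05,
    "SB7": 0.05, "SB8": 0.04,
}
features = collaborative_features(pixel)
print(len(features), list(features)[-3:])
```

The MSI-retrieval inputs could be built the same way by swapping in the nine MSI bands and the SB1/SB3, SB4/SB3, and SB8A/SB4 ratios.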

Figure 5b shows the bands involved in MSI retrieval and collaborative retrieval. Both MSI retrieval and collaborative retrieval retain three important red-edge bands of MSI, i.e., SB5, SB6, and SB7. Compared to MSI retrieval, however, collaborative retrieval has the following features: (1) more variables, (2) more unique bands, which are not included in MSI retrieval (e.g., FB8), and (3) finer bands (e.g., FB3 and FB10 corresponding to SB2).

A grid search was adopted to tune the hyperparameters of the ML models. To avoid the retrieval uncertainty caused by the time mismatch between the satellite observation time and the sampling time, 63 valid Chl-a field datasets obtained on 25 June and 2 November 2020 (time-consistent, shown in Figure 1 and Table 1) were used. Here, 44 (70%) sample datasets were randomly selected as training datasets, while the remaining 19 (30%) were used as validation datasets.
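The random 70/30 split described above can be sketched as follows. The seed and the use of integer sample identifiers are assumptions for reproducibility of the illustration; the text does not state how the random selection was seeded.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle samples reproducibly and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for the 63 time-consistent field samples.
sample_ids = list(range(63))
train, valid = split_samples(sample_ids)
print(len(train), len(valid))  # prints: 44 19
```

With 63 samples, truncating 0.7 × 63 gives 44 training samples and leaves 19 for validation, matching the counts in the text.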

4. Results

4.1. PSSDFN Results

4.1.1. Fused Results

To illustrate the effectiveness of the proposed PSSDFN, the fused results were evaluated using both visualization and quantitative metrics. Figure 6 shows the visualized images for the four different dates in false color. It is apparent that the fused results without the degradation model in Figure 6c are more similar to Figure 6b than Figure 6a, since the labels were the MSI data. However, the fused results in Figure 6d seem to retain the spectral information from Figure 6a and spatial information from Figure 6b, which is crucial for Chl-a retrieval.

The quantitative CC_{tol} values are shown in Figure 7. Note that the CC_{tol} values were not calculated for the entire area but only for the cloudless pixels (yellow boxed area in Figure 6), thus avoiding the interference of cloud cover. The CC_{tol} values of the PSSDFN (with the degradation model) were better than those without the degradation model, which indicates that the PSSDFN has a high performance regarding the integration of spatial-spectral information.

4.1.2. Spectral Comparisons of Fused and MSI Images

To further illustrate the advantages of the fused images obtained via the PSSDFN, we compared the spectral information of the fused and original MSI images. First, we resampled the fused and MSI images to the MODIS scale (i.e., 1 km), and then calculated their spectral correlation with the MODIS images. Figure 8 shows the CC_{spe} values of the nine fused bands and corresponding original MSI bands at four different dates. Compared to the MSI images, the fused images have a better performance regarding the CC_{spe} for each band. Likewise, the CC_{spe} values were calculated using a threshold method.

4.2. Chl-a Estimation Results

4.2.1. Accuracy Comparison of MSI and Collaborative Retrieval

In this section, a comparison between MSI and collaborative retrieval was conducted using four ML models. As described in Section 3.2.2, time-consistent datasets obtained on 25 June and 2 November 2020 were used for the ML models. The validated Chl-a dataset was used to evaluate the two retrieval cases using the established models. Figure 9 shows the relationships between the estimated and measured Chl-a of four ML models for MSI and collaborative retrieval. The root-mean-square error (RMSE), coefficient of determination (R²), and mean absolute percent error (MAPE) were used to perform statistical analyses. Despite some inconsistencies in MAPE (the Adaboost model and the SVR model), overall, the R² and RMSE of all four models showed that collaborative retrieval was better than MSI retrieval, which indicates that the fused images can improve the retrieval accuracy of Chl-a. Moreover, we can see that the GBDT model obtained the highest score for both collaborative (R² = 0.85, RMSE = 28.10 µg/L) and MSI retrieval (R² = 0.77, RMSE = 34.77 µg/L).
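The three statistics used in this comparison can be computed as in the sketch below. The R² here is defined as one minus the ratio of residual to total sum of squares; the text does not state which R² definition was used, so this choice is an assumption, and the measured and estimated values are hypothetical.

```python
import math


def rmse(measured, estimated):
    """Root-mean-square error, in the units of the measurements."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)


def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def mape(measured, estimated):
    """Mean absolute percent error, in percent (measured values must be non-zero)."""
    ratios = [abs((m - e) / m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured and estimated Chl-a concentrations (ug/L).
measured = [20.0, 50.0, 100.0, 200.0]
estimated = [25.0, 45.0, 110.0, 190.0]
print(rmse(measured, estimated), r_squared(measured, estimated), mape(measured, estimated))
```

MAPE weights errors relative to the measured value, so it can rank models differently from RMSE when low-concentration samples are poorly estimated, which is one possible reason for the MAPE inconsistencies noted above.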

4.2.2. Impact of the Degradation Model on Chl-a Retrieval

To demonstrate the influence of the degradation model in the PSSDFN on Chl-a retrieval, MSI data and fused data obtained without the degradation model were used for collaborative retrieval. Figure 10 shows the relationships between the estimated and measured Chl-a without the degradation model. Compared with the collaborative retrieval graphs in Figure 9, the R² of the four ML models in Figure 10 are lower, while the RMSE are higher. These quantitative evaluations indicated that the accuracy of Chl-a retrieval can be improved when the input variables are derived from the fusion network with the degradation model.

4.2.3. Spatiotemporal Distribution of Chl-a

To show the spatiotemporal distribution of Chl-a across Lake Chaohu, the Chl-a concentrations retrieved from the GBDT model on 25 June and 2 November 2020 are shown in Figure 11. The Chl-a concentrations in Figure 11a during summer were generally higher than those in Figure 11b during autumn. A high air temperature and abundant sunshine may cause the intensification of eutrophication. Spatially, there were also differences in the Chl-a concentrations across Lake Chaohu. Specifically, the Chl-a concentrations of western Lake Chaohu during summer (Figure 11a) and autumn (Figure 11b) were generally higher than those of the other parts of the lake. This is related to the input of nitrogen, phosphorus, and other nutrients carried by rivers, such as the Nanfei River. In general, the differences in the Chl-a concentration were significantly related to the average daily temperature, sunshine duration, input of nitrogen, and precipitation; however, these relationships were difficult to quantify.

5. Discussion

The PSSDFN was proposed for Chl-a estimation for Lake Chaohu using Terra/MODIS and Sentinel-2/MSI data. The performance of the PSSDFN was subsequently evaluated through a comparison between the collaborative Chl-a retrieval, MSI Chl-a retrieval, and in situ observations.

5.1. Advantages and Limitations of the PSSDFN

Compared with the other SSF methods, the three main advantages of the PSSDFN were as follows: (1) A more targeted fusion strategy. The PSSDFN was designed to fuse MODIS and MSI data for Chl-a retrieval in coastal water (e.g., less than 1 km from the shoreline) in large- and medium-sized inland lakes/oceans. After the appropriate MODIS and MSI bands were subjected to the proposed fusion process, the fused data were used to retrieve Chl-a. Although some previous studies have adopted a “fusion then retrieval” strategy [44,45,46], the retrieved parameter is rarely considered during the preceding fusion process. (2) The PSSDFN combines residual connectivity and attention mechanisms for extracting the useful multilevel features of the input data and obtains fused data with a high spatial-spectral resolution. The fused data are suitable for Chl-a retrieval. (3) The PSSDFN introduces physical constraints, such as SRFs and the degradation model. The SRFs were used to guide the grouping of the nine MOD09 bands, and bands in the same group share the same parameters to improve the efficiency of the network training. Compared with Figure 10, the collaborative retrieval graphs in Figure 9 illustrated that the degradation model could improve the retrieval accuracy. Coupling the laws of physics and ML to develop an inversion framework may not only improve the model accuracy but also enhance the understanding of variables [33,47]. However, it should be noted that the current coupling is simple. In the future, exploring new ideas and methods of deep coupling will be key for developing quantitative remote sensing [48].

However, the PSSDFN has several potential limitations, which were as follows: (1) The PSSDFN required images of two LS bands that have a high correlation, and then the relationship of one LS–HS band image pair can be applied to the other. Therefore, the PSSDFN would not be able to work effectively if the images of the two LS bands have no or a low correlation. (2) The performance of DL-based methods is affected by the training dataset. In this study, 10 MODIS–MSI image pairs at different dates were used for the training stage, while one MODIS–MSI image pair was used for the test stage. The performance of the PSSDFN will degrade when high-quality image pairs are difficult to obtain. (3) Sufficient training samples are necessary for a CNN to model a relationship of the LS–HS band image pair. Considering the differences among the different bands of input data, a group training method was adopted. In this respect, DL-based methods generally have a higher time cost than traditional methods.

5.2. Evaluation of the PSSDFN

Theoretically, it is not sufficient to evaluate the PSSDFN using only the specified regions (yellow boxes in Figure 6) from the fused images. However, even though the selected images are the best available, clouds (both thick and thin) cover much of the study area. To ensure that the evaluation was conducted using cloud-free fused pixels, the specified area with yellow boxes in Figure 6 was adopted. Taking the image of 28 December 2019 as an example, we also evaluated the PSSDFN using a larger area with more pixels, as shown in Figure 12. The cloud-contaminated pixels in the fused image were first masked using an empirical threshold, and then the performance of the PSSDFN was tested on the remaining pixels. Figure 13 shows the comparison of evaluation metrics (CC_{tol} and CC_{spe}) using pixels in the yellow boxes (the same as in Figure 6) and larger areas (all pixels outside of the masks) on 28 December 2019. It can be seen that the PSSDFN still has good results for the larger areas, close to those for the yellow box. It should be noted that we only selected an image with a small amount of cloud for this evaluation of large areas, because generating an effective mask involves cloud detection technology, and a cloudy image may make the mask inadequate, which in turn would generate inaccurate evaluation results.

5.3. Influence Factors on Chl-a Retrieval

Regarding the influence factors on Chl-a retrieval, three issues require further investigation. The first issue is the mismatch on data collection time. Two MODIS–MSI image pairs collected in 2020 (June and November) were used as the test datasets, while the training datasets were collected in 2018 and 2019. The experimental results indicate that the PSSDFN has some generalization ability. Therefore, whether the accuracy of Chl-a retrieval can be improved by adding training datasets from 2020 requires further investigation. Moreover, a time difference exists between the remotely sensed and in situ observations (i.e., data observed on 2 August 2018 and 27 December 2019). A previous study suggested that a matched dataset with a ±7-day time window should be considered [7]. However, the extent of the uncertainty caused by this mismatch remains unknown. Another experiment was conducted with both a time-mismatched dataset and a time-consistent dataset. A total of 120 valid Chl-a datasets collected from all four dates (as shown in Figure 1) were used, and the models and training methods were consistent with Section 3.2.2. Figure 14 shows the statistical metrics for MSI and collaborative retrieval under such datasets. Obviously, collaborative retrieval still outperformed the corresponding MSI retrieval for the four ML models, which is consistent with the pattern depicted in Figure 9. In addition, the GBDT model showed the best accuracy in both Figure 9 (R² = 0.85, RMSE = 28.10 µg/L) and Figure 14 (R² = 0.86, RMSE = 49.51 µg/L). Similarly, the Chl-a concentrations across Lake Chaohu at the four dates were retrieved using the highest-scoring GBDT model (Figure 15). The distribution of Chl-a concentration in Figure 15c,d is roughly similar to that in Figure 11a,b. 
However, the high Chl-a values on 25 June 2020 differed somewhat, possibly because the training data in Figure 9 lacked high Chl-a values, which resulted in lower predictions of high Chl-a values in Figure 11. That is to say, despite some time differences, the retrieval accuracy of the model is acceptable as long as the Chl-a concentration does not vary significantly.
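The ±7-day matching criterion mentioned above can be illustrated with a minimal sketch. It assumes that Table 1 pairs the in situ dates 2 August 2018 and 27 December 2019 with MSI acquisitions on 31 July 2018 and 28 December 2019; the function name `within_window` and its `max_days` parameter are hypothetical and not taken from the study.

```python
from datetime import date

def within_window(in_situ: date, overpass: date, max_days: int = 7) -> bool:
    """Return True if the satellite overpass lies within +/- max_days of the field sample."""
    return abs((overpass - in_situ).days) <= max_days

# (in situ date, MSI acquisition date), read from Table 1 under the assumption stated above.
pairs = [
    (date(2018, 8, 2), date(2018, 7, 31)),
    (date(2019, 12, 27), date(2019, 12, 28)),
]

# Absolute time offset in days for each pair, and whether it passes the +/-7-day check.
offsets = [abs((sat - field).days) for field, sat in pairs]
matched = [within_window(field, sat) for field, sat in pairs]
```

Under this reading, both mismatched pairs fall inside the ±7-day window, which is in line with the observation that the retrieval accuracy remained acceptable.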

The second issue concerns the data and pre-processing, especially the atmospheric correction method. MODIS B13–B16 (wavelengths from 662 to 887 nm) saturate over Lake Chaohu, making these bands unusable. Fusing the MSI data with ocean products that have suitable band settings (e.g., OLCI) would be more beneficial for Chl-a retrieval. Moreover, for the MOD09, MOD02, and MSI data, the adopted atmospheric correction methods are the 6S radiative transfer, MODTRAN 4+ radiative transfer, and Radiative Transfer LUT (libRadtran), respectively. These methods may be better suited to the retrieval of continental parameters (e.g., land surface reflectance) and may introduce some uncertainty into Chl-a retrieval. Robust atmospheric correction over turbid lakes remains a challenge [11], and water-based atmospheric correction may further improve the retrieval accuracy [49].

The third issue concerns the retrieval methods. A number of ML models have been developed to derive Chl-a. As the retrieval methods were not the focus of this study, only four ML models were used. All four models showed that the PSSDFN could improve the accuracy of Chl-a retrieval, and the GBDT model obtained the highest scores. Other advanced methods could also be used in combination with the PSSDFN.
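The scores used to compare the models (R² and RMSE) can be sketched as follows. This is a minimal illustration, assuming that R² denotes the coefficient of determination; the paper's own definition may differ (e.g., a squared correlation coefficient). The sample values are invented for illustration and are not data from the study.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical Chl-a values in µg/L, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

score_rmse = rmse(observed, predicted)
score_r2 = r_squared(observed, predicted)
```

Because RMSE carries the units of Chl-a, it grows with the range of concentrations in the dataset, which is worth keeping in mind when comparing RMSE values across datasets with different Chl-a ranges.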

6. Conclusions

In this study, the PSSDFN was successfully used to obtain fused reflectance data with a high spatial-spectral resolution from MODIS and MSI data. For all four ML algorithms, the Chl-a retrieved from Lake Chaohu using the fused and MSI reflectance data (collaborative retrieval) was more accurate than that retrieved using only the MSI reflectance data (MSI retrieval). Although the PSSDFN was proposed for Chl-a retrieval in this study, it could also support the production of other remotely sensed water quality parameters with high accuracy. Such a capability will provide a reference for ecological monitoring, management, and water body restoration.

Author Contributions

Conceptualization, P.W.; methodology, Y.H., Y.W. and P.W.; formal analysis, Y.H., X.M. and P.W.; investigation, X.M. and Y.H.; resources, J.W. and Y.W.; data curation, Y.H.; writing—original draft preparation, Y.H. and P.W.; writing—review and editing, P.W., J.W. and Y.W.; visualization, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Major Project of Anhui Province (grant 201903a07020014) and by the National Natural Science Foundation of China (No. 42271381 & No. 32171573).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. O’Reilly, J.E.; Maritorena, S.; Mitchell, B.G.; Siegel, D.A.; Carder, K.L.; Garver, S.A.; Kahru, M.; McClain, C. Ocean color chlorophyll algorithms for SeaWiFS. J. Geophys. Res.-Oceans 1998, 103, 24937–24953.
2. Papenfus, M.; Schaeffer, B.; Pollard, A.I.; Loftin, K. Exploring the potential value of satellite remote sensing to monitor chlorophyll-a for US lakes and reservoirs. Environ. Monit. Assess. 2020, 192, 808.
3. Xiang, X.; Lu, W.; Cui, X.; Li, Z.; Tao, J. Simulation of Remote-Sensed Chlorophyll Concentration with a Coupling Model Based on Numerical Method and CA-SVM in Bohai Bay, China. J. Coast. Res. 2018, 84, 1–9.
4. Arias-Rodriguez, L.F.; Duan, Z.; Díaz-Torres, J.d.J.; Basilio Hazas, M.; Huang, J.; Kumar, B.U.; Tuo, Y.; Disse, M.J.S. Integration of Remote Sensing and Mexican Water Quality Monitoring System Using an Extreme Learning Machine. Sensors 2021, 21, 4118.
5. Kratzer, S.; Plowey, M. Integrating mooring and ship-based data for improved validation of OLCI chlorophyll-a products in the Baltic Sea. Int. J. Appl. Earth Obs. Geoinf. 2021, 94, 102212.
6. Wang, M.H.; Jiang, L.D.; Mikelsons, K.; Liu, X.M. Satellite-derived global chlorophyll-a anomaly products. Int. J. Appl. Earth Obs. Geoinf. 2021, 97, 102288.
7. Li, S.; Song, K.; Wang, S.; Liu, G.; Wen, Z.; Shang, Y.; Lyu, L.; Chen, F.; Xu, S.; Tao, H.; et al. Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Sci. Total Environ. 2021, 778, 146271.
8. Dall’Olmo, G.; Gitelson, A.A.; Rundquist, D.C.; Leavitt, B.; Barrow, T.; Holz, J.C. Assessing the potential of SeaWiFS and MODIS for estimating chlorophyll concentration in turbid productive waters using red and near-infrared bands. Remote Sens. Environ. 2005, 96, 176–187.
9. Qi, L.; Hu, C.; Duan, H.; Barnes, B.B.; Ma, R. An EOF-Based Algorithm to Estimate Chlorophyll a Concentrations in Taihu Lake from MODIS Land-Band Measurements: Implications for Near Real-Time Applications and Forecasting Models. Remote Sens. 2014, 6, 10694–10715.
10. Gitelson, A.A.; Dall’Olmo, G.; Moses, W.; Rundquist, D.C.; Barrow, T.; Fisher, T.R.; Gurlin, D.; Holz, J. A simple semi-analytical model for remote estimation of chlorophyll-a in turbid waters: Validation. Remote Sens. Environ. 2008, 112, 3582–3593.
11. Kravitz, J.; Matthews, M.; Bernard, S.; Griffith, D. Application of Sentinel 3 OLCI for chl-a retrieval over small inland water targets: Successes and challenges. Remote Sens. Environ. 2020, 237, 111562.
12. Cao, Z.; Ma, R.; Duan, H.; Pahlevan, N.; Melack, J.; Shen, M.; Xue, K. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 2020, 248, 111974.
13. Toming, K.; Kutser, T.; Laas, A.; Sepp, M.; Paavel, B.; Nõges, T. First experiences in mapping lake water quality parameters with Sentinel-2 MSI imagery. Remote Sens. 2016, 8, 640.
14. Cao, Z.; Ma, R.; Melack, J.M.; Duan, H.; Liu, M.; Kutser, T.; Xue, K.; Shen, M.; Qi, T.; Yuan, H. Landsat observations of chlorophyll-a variations in Lake Taihu from 1984 to 2019. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102642.
15. Cao, Z.; Ma, R.; Duan, H.; Xue, K. Effects of broad bandwidth on the remote sensing of inland waters: Implications for high spatial resolution satellite data applications. J. Photogramm. Remote Sens. 2019, 153, 110–122.
16. Mohebzadeh, H.; Yeom, J.; Lee, T. Spatial Downscaling of MODIS Chlorophyll-a with Genetic Programming in South Korea. Remote Sens. 2020, 12, 1412.
17. Guo, S.; Sun, B.; Zhang, H.K.; Liu, J.; Chen, J.; Wang, J.; Jiang, X.; Yang, Y. MODIS ocean color product downscaling via spatio-temporal fusion and regression: The case of chlorophyll-a in coastal waters. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 340–361.
18. Shen, H.; Jiang, M.; Li, J.; Yuan, Q.; Wei, Y.; Zhang, L. Spatial–spectral fusion by combining deep learning and variational model. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6169–6181.
19. Liu, P.; Li, J.; Wang, L.; He, G. Remote Sensing Data Fusion with Generative Adversarial Networks: State-of-the-Art Methods and Future Research Directions. IEEE Trans. Geosci. 2022, 10, 295–328.
20. Zhang, H.; Xu, H.; Tian, X.; Jiang, J.; Ma, J. Image fusion meets deep learning: A survey and perspective. Inf. Fusion 2021, 76, 323–336.
21. Guo, H.; Shi, Q.; Du, B.; Zhang, L.; Wang, D.; Ding, H. Scene-Driven Multitask Parallel Attention Network for Building Extraction in High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4287–4306.
22. Yuan, Q.; Wei, Y.; Meng, X.; Shen, H.; Zhang, L. A Multiscale and Multidepth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 978–989.
23. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
24. Xiao, J.; Li, J.; Yuan, Q.; Jiang, M.; Zhang, L. Physics-based GAN with Iterative Refinement Unit for Hyperspectral and Multispectral Image Fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6827–6841.
25. Duan, H.; Ma, R.; Xu, X.; Kong, F.; Zhang, S.; Kong, W.; Hao, J.; Shang, L. Two-Decade Reconstruction of Algal Blooms in China’s Lake Taihu. Environ. Sci. Technol. 2009, 43, 3522–3528.
26. Pyo, J.; Pachepsky, Y.; Baek, S.-S.; Kwon, Y.; Kim, M.; Lee, H.; Park, S.; Cha, Y.; Ha, R.; Nam, G.; et al. Optimizing Semi-Analytical Algorithms for Estimating Chlorophyll-a and Phycocyanin Concentrations in Inland Waters in Korea. Remote Sens. 2017, 9, 542.
27. Liu, X.; Zhang, G.; Sun, G.; Wu, Y.; Chen, Y. Assessment of Lake water quality and eutrophication risk in an agricultural irrigation area: A case study of the Chagan Lake in Northeast China. Water 2019, 11, 2380.
28. Brivio, P.A.; Giardino, C.; Zilioli, E. Determination of chlorophyll concentration changes in Lake Garda using an image-based radiative transfer code for Landsat TM images. Int. J. Remote Sens. 2001, 22, 487–502.
29. Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for sentinel-2. In Proceedings of the Image and Signal Processing for Remote Sensing XXIII, Warsaw, Poland, 11–13 September 2017; Volume 10427, pp. 37–48.
30. Vermote, E.F.; Tanre, D.; Deuze, J.L.; Herman, M.; Morcette, J.J. Second simulation of the satellite signal in the solar spectrum, 6s: An overview. IEEE Trans. Geosci. Remote Sens. 1997, 3, 675–686.
31. Berk, A.; Anderson, G.; Bernstein, L.; Chetwynd, J.J.; Richtsmeier, S.; Pukall, B. MODTRAN4 radiative transfer modeling for atmospheric correction. Proc. SPIE 1999, 3756, 348–353.
32. Li, J.; Fu, Q.; Jiang, T. Remote Sensing Image Fusion Based on Spectral Response Function and Global Variance Matching. Acta Photonica Sin. 2020, 49, 1010001.
33. He, J.; Li, J.; Yuan, Q.; Shen, H.; Zhang, L. Spectral Response Function-Guided Deep Optimization-Driven Network for Spectral Super-Resolution. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4213–4227.
34. Shao, Z.; Cai, J. Remote Sensing Image Fusion with Deep Convolutional Neural Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1656–1669.
35. Zhou, J.; Civco, D.L.; Silander, J.A. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. Int. J. Remote Sens. 1998, 19, 743–757.
36. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
37. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
38. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
39. Sagi, O.; Rokach, L. Approximating XGBoost with an interpretable decision tree. Inform. Sci. 2021, 572, 522–542.
40. Cai, W.; Zhang, Y.; Zhou, J. Maximizing expected model change for active learning in regression. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 51–60.
41. Ha, N.T.T.; Koike, K.; Nhuan, M.T.; Canh, B.D.; Thao, N.T.P.; Parsons, M. Landsat 8/OLI two bands ratio algorithm for chlorophyll-a concentration mapping in hypertrophic waters: An application to West Lake in Hanoi (Vietnam). IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4919–4929.
42. Watanabe, F.; Alcantara, E.; Rodrigues, T.; Rotta, L.; Bernardo, N.; Imai, N. Remote sensing of the chlorophyll-a based on OLI/Landsat-8 and MSI/Sentinel-2A (Barra Bonita reservoir, Brazil). An. Acad. Bras. Cienc. 2018, 90, 1987–2000.
43. Duan, H.; Zhang, Y.; Zhang, B.; Song, K.; Wang, Z. Assessment of chlorophyll-a concentration and trophic state for Lake Chagan using Landsat TM and field spectral data. Environ. Monit. Assess. 2007, 129, 295–308.
44. Doña, C.; Chang, N.-B.; Caselles, V.; Sánchez, J.M.; Camacho, A.; Delegido, J.; Vannah, B.W. Integrated satellite data fusion and mining for monitoring lake water quality status of the Albufera de Valencia in Spain. J. Environ. Manag. 2015, 151, 416–426.
45. Shi, J.; Shen, Q.; Yao, Y.; Li, J.; Chen, F.; Wang, R.; Xu, W.; Gao, Z.; Wang, L.; Zhou, Y. Estimation of Chlorophyll-a Concentrations in Small Water Bodies: Comparison of Fused Gaofen-6 and Sentinel-2 Sensors. Remote Sens. 2022, 14, 229.
46. Yang, H.; Du, Y.; Zhao, H.; Chen, F. Water Quality Chl-a Inversion Based on Spatio-Temporal Fusion and Convolutional Neural Network. Remote Sens. 2022, 14, 1267.
47. Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716.
48. Yang, Q.; Jin, C.; Li, T.; Yuan, Q.; Shen, H.; Zhang, L. Research progress and challenges of data-driven quantitative remote sensing. Nat. Remote Sens. Bull. 2021, 26, 268–285.
49. Vanhellemont, Q.; Ruddick, K. Acolite for Sentinel-2: Aquatic applications of MSI imagery. In Proceedings of the 2016 ESA Living Planet Symposium, Prague, Czech Republic, 9–13 May 2016; pp. 9–13.

Figure 1. Study area and field survey stations. (a) Red, yellow, and blue dots represent the different layout methods. (b) In situ Chl-a at four sampling dates, with the average value ± standard deviation for each date. X-axis represents the valid sampling points (the points excluding cloud cover and measurement errors) and their number for each date.

Figure 2. Distribution of MODIS and MSI bands (B) with wavelengths between 0.4 and 0.9 μm. MOD09 B13–B16 (shown in gray because they suffer saturation) were not included in this study.

Figure 3. Flowchart depicting the study methods.

Figure 4. Structure and main components of the physical-based spatial-spectral deep fusion network (PSSDFN). (a) The entire PSSDFN flowchart. (b–d) The specific details regarding the Conv Module, DeConv Module, and Degradation Network, respectively.

Figure 5. (a) Schematic block diagram depicting the brief steps of four machine learning methods to obtain Chl-a retrieval from input variables, including MSI and collaborative retrieval. (b) The bands involved in the MSI and collaborative retrieval.

Figure 6. Visualized images in false color for four dates. (a–d) The MODIS images, MSI images, fused images without the degradation model, and fused images with the degradation model, respectively. The cloudless pixels inside the yellow boxes were selected for quantitative evaluation.

Figure 7. Comparisons of the spatial-spectral correlation coefficient ($CC_{tol}$) of the fused images with the degradation model (red line) and fused images without the degradation model (blue line) for four dates. (a) 2 August 2018; (b) 28 December 2019; (c) 25 June 2020; (d) 2 November 2020.

Figure 8. Spectral comparisons of fused and MSI images for four different dates. (a) 2 August 2018; (b) 28 December 2019; (c) 25 June 2020; (d) 2 November 2020.

Figure 9. Statistical metrics of Chl-a retrieval using four machine learning models for MSI and collaborative retrieval. (a,a’) Adaboost model; (b,b’) SVR model; (c,c’) XGboost model; (d,d’) GBDT model.

Figure 10. Statistical metrics of collaborative retrieval without the degradation process using four machine learning models. (a) Adaboost model; (b) SVR model; (c) XGboost model; (d) GBDT model.

Figure 11. Spatiotemporal distribution of Chl-a across Lake Chaohu at two dates. (a) 25 June 2020; (b) 2 November 2020. The gray masks represent outliers due to cloud cover.

Figure 12. Visualized images in RGB on 28 December 2019. (a–e) The MODIS image, MSI image, fused image without the degradation model, fused image with the degradation model, and fused image with cloud masks, respectively. The cloudless pixels inside the yellow boxes were selected for comparative evaluation (the same as in Figure 6).

Figure 13. Comparison of evaluation metrics for yellow boxes (red line) and larger areas (purple line, all pixels outside of the masks) on 28 December 2019.

Figure 14. Statistical metrics of Chl-a retrieval using four machine learning models for MSI and collaborative retrieval based on Chl-a data from four dates. (a,a’) Adaboost model; (b,b’) SVR model; (c,c’) XGboost model; (d,d’) GBDT model.

Figure 15. Spatiotemporal distribution of Chl-a across Lake Chaohu at four dates. (a) 2 August 2018; (b) 28 December 2019; (c) 25 June 2020; (d) 2 November 2020. The gray masks represent outliers due to cloud cover.

Table 1. MODIS and MSI data used in this study (“√”: available useful images for download; bold font indicates the test dataset; normal font indicates the training dataset).

Date	MODIS	MSI	Date	MODIS	MSI
6 June 2018	√	√	17 April 2019	√	√
9 September 2018	√	√	8 November 2019	√	√
4 October 2018	√	√	8 December 2019	√	√
18 December 2018	√	√	2 August 2018	√	31 July 2018
17 January 2019	√	√	27 December 2019	28 December 2019	28 December 2019
22 January 2019	√	√	25 June 2020	√	√
7 April 2019	√	√	2 November 2020	√	√

Table 2. Correlation coefficients between the nine MOD09 bands from five groups. Except for B2, the two highest correlations from two different groups for each band are marked in bold. Bands with the same group are excluded with a strikethrough. For example, B11 is divided into Group 3, and B3 and B10 (with the same group, Group 3) should be excluded first. We then found that the two highest correlations from the two different groups were 0.982 and 0.937 (i.e., B12 in Group 4 and B9 in Group 5).

Group	Group 1	Group 2	Group 3			Group 4		Group 5
Band	B1	B2	B3	B10	B11	B4	B12	B8	B9
B1	1.000	0.465	0.658	0.724	0.717	0.612	0.647	0.420	0.623
B2	0.465	1.000	−0.219	−0.172	−0.202	−0.312	−0.276	−0.324	−0.209
B3	0.658	−0.219	1.000	~~0.980~~	~~0.964~~	0.915	0.935	0.923	0.993
B10	0.724	−0.172	~~0.980~~	1.000	~~0.985~~	0.931	0.950	0.859	0.962
B11	0.717	−0.202	~~0.964~~	~~0.985~~	1.000	0.967	0.982	0.825	0.937
B4	0.612	−0.312	0.915	0.931	0.967	1.000	~~0.996~~	0.802	0.884
B12	0.647	−0.276	0.935	0.950	0.982	~~0.996~~	1.000	0.812	0.905
B8	0.420	−0.324	0.923	0.859	0.825	0.802	0.812	1.000	~~0.951~~
B9	0.623	−0.209	0.993	0.962	0.937	0.884	0.905	~~0.951~~	1.000
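The selection rule described in the Table 2 caption can be sketched as follows: for a given MOD09 band, bands in the same group are excluded, and the two highest correlations are then taken from two different groups. This is a minimal illustration, not the authors' code; the function name `top_two_from_other_groups` is hypothetical, the values are the B11 row of Table 2, and B2 is handled differently in the paper ("Except for B2"), so it is not covered here.

```python
# Group membership of the nine MOD09 bands, as given in Table 2.
GROUP = {"B1": 1, "B2": 2, "B3": 3, "B10": 3, "B11": 3,
         "B4": 4, "B12": 4, "B8": 5, "B9": 5}

# Correlation coefficients of B11 with every band (B11 row of Table 2).
B11_ROW = {"B1": 0.717, "B2": -0.202, "B3": 0.964, "B10": 0.985, "B11": 1.000,
           "B4": 0.967, "B12": 0.982, "B8": 0.825, "B9": 0.937}

def top_two_from_other_groups(band, row, group):
    """Pick the two most correlated bands that lie in two different groups,
    ignoring every band that shares a group with `band` (including itself)."""
    candidates = sorted(
        ((b, r) for b, r in row.items() if group[b] != group[band]),
        key=lambda item: item[1],
        reverse=True,
    )
    picked, used_groups = [], set()
    for b, r in candidates:
        if group[b] in used_groups:
            continue  # a band from this group has already been selected
        picked.append((b, r))
        used_groups.add(group[b])
        if len(picked) == 2:
            break
    return picked

selection = top_two_from_other_groups("B11", B11_ROW, GROUP)
```

For B11, B4 (0.967) is skipped because B12 from the same group has already been selected, so the result is B12 (Group 4) and B9 (Group 5), as in the caption's worked example.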

Table 3. Two strongly correlated MODIS–MSI band image pairs for each group. Bands with the same group shared the same two MODIS–MSI band image pairs (e.g., B4 and B12 shared the same two MODIS–MSI band image pairs: MB11–SB2 and MB9–SB1). (MB: MODIS band, SB: Sentinel-2 MSI band; numbers denote certain bands).

Group: MOD09 Band(s)	MODIS–MSI Image Pairs
Group 1: B1	MB10–SB2, MB12–SB3
Group 2: B2	MB17–SB8, MB18–SB9
Group 3: B3, B10, B11	MB9–SB1, MB12–SB3
Group 4: B4, B12	MB11–SB2, MB9–SB1
Group 5: B8, B9	MB3–SB2, MB12–SB3

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
