# Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea


## Abstract


## 1. Introduction

_{sat}). Currently, many different methods have been developed to solve this problem by filling the missing values, ranging from simple linear interpolations and bicubic splines to the use of complicated statistical methods, such as optimal interpolation (OI) and Kriging. In the past, OI methods have been widely used to merge the daily multi-sensor data, especially sea surface temperature (SST) [3,4,5].

^{2} over the Ireland–Biscay–Iberia Regional Ocean Observing System area, based on an improved Kriging scheme. However, at present, one of the most widely used methods in Chl-a reconstruction is DINEOF (Data Interpolating Empirical Orthogonal Functions), presented by Beckers and Rixen [8] and Alvera-Azcárate et al. [9]. DINEOF is a self-consistent, parameter-free technique for reconstructing gaps in data; it provides results that are statistically similar to those of OI reconstruction methods, but is about 30 times faster [9,10].

## 2. Materials

#### 2.1. Study Area

#### 2.2. Chl-a

_{sat}) data used in this study are a daily merged product obtained from the GlobColour project (http://globcolour.info). The data presented are a weighted average of multiple sensors (MODIS and VIIRSN (Visible Infrared Imaging Radiometer Suite)) with a resolution of ≈4 km for a period from 1 January 2015 to 31 December 2018. These data have been developed, validated, and distributed by ACRI-ST, France. For the domain of study, the size of the images is 240 × 192 pixels. Due to cloud cover, 82% of the sea pixels are missing in this dataset. The temporal variability of total missing data presents an irregular distribution with a minimum daily spatial average of about 30–50% and a maximum daily spatial average of more than 90% (Figure 1a). Spatially, as shown in Figure 1b, regions with the highest percentages of missing data are the coastal waters of Mainland China and ocean waters to the east of Taiwan, with an average data loss higher than 85%. Regions with the lowest percentage in missing data are the coastal waters off the Western Philippine Islands, with an average data loss between 50 and 70%. In the remaining regions, the average percentage of missing data varies from 70 to 85%. In order to obtain reliable results, images with cloud cover larger than 98% were not used for the analysis, which is the same threshold as that of Alvera-Azcárate et al. [35], so the final temporal size was 1268 images (out of an initial 1461 images).
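The 98% cloud-cover screening described above can be sketched as follows. This is only an illustrative example: the array layout `(time, lat, lon)`, the use of NaN for missing pixels, and the function name `filter_cloudy_images` are assumptions, and the data are synthetic rather than the GlobColour product.

```python
import numpy as np

def filter_cloudy_images(chl, sea_mask, max_missing=0.98):
    """Keep time steps whose missing fraction over sea pixels is <= max_missing.

    chl      : array (time, lat, lon), NaN where there is no valid observation
               (assumed layout, not taken from the paper)
    sea_mask : boolean array (lat, lon), True over sea pixels
    """
    missing = np.isnan(chl[:, sea_mask])        # shape (time, n_sea_pixels)
    missing_fraction = missing.mean(axis=1)     # per-image fraction of gaps
    keep = missing_fraction <= max_missing
    return chl[keep], keep, missing_fraction

# Tiny synthetic example
rng = np.random.default_rng(0)
chl = rng.lognormal(mean=-1.5, sigma=0.5, size=(4, 5, 6))
sea_mask = np.ones((5, 6), dtype=bool)
sea_mask[0, :] = False                          # first row treated as land
chl[1] = np.nan                                 # fully cloudy image: dropped
chl[2, 1:3, :] = np.nan                         # half of the sea pixels missing: kept
kept, keep, frac = filter_cloudy_images(chl, sea_mask)
```

Land pixels are excluded before the fraction is computed, so the threshold refers to sea pixels only, as in the text.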

#### 2.3. SST

#### 2.4. Cross-Validation Data

#### 2.5. In Situ Dataset

_{situ}) from 6 to 17 August 2015 were collected during the Kuroshio experiment in the summer of 2015. The total number of surface samples was 21 from 15 in situ stations, the positions of which are shown by the blue circles in Figure 1b. For every in situ sample, the Chl-a satellite image with the shortest time interval (to the collection time of the in situ data sample) was selected, in order to search for the nearest grid cell relative to the location of the field measurement. In order to make the in situ validation more realistic, only cloudy satellite Chl-a matchups (20 samples) were used in the following calculations.
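The matchup procedure (closest image in time, then nearest grid cell) could look like the sketch below. It assumes a regular latitude/longitude grid described by two 1-D axes and `numpy.datetime64` time stamps; the function name `match_in_situ` and the coordinates in the example are hypothetical.

```python
import numpy as np

def match_in_situ(sample_time, sample_lat, sample_lon, image_times, lats, lons):
    """Return (time index, lat index, lon index) of the closest satellite pixel.

    sample_time, image_times : numpy datetime64 values
    lats, lons               : 1-D coordinate axes of a regular grid (assumption)
    """
    t_idx = int(np.argmin(np.abs(image_times - sample_time)))  # shortest time interval
    i_idx = int(np.argmin(np.abs(lats - sample_lat)))          # nearest latitude row
    j_idx = int(np.argmin(np.abs(lons - sample_lon)))          # nearest longitude column
    return t_idx, i_idx, j_idx

# Illustrative daily images for 6-17 August 2015 on a coarse 1-degree grid
image_times = np.arange("2015-08-06", "2015-08-18", dtype="datetime64[D]")
lats = np.linspace(18.0, 26.0, 9)
lons = np.linspace(114.0, 124.0, 11)
idx = match_in_situ(np.datetime64("2015-08-10T15:00"), 21.3, 120.6,
                    image_times, lats, lons)
```

Whether the matched pixel is cloudy can then be checked by testing the original satellite value at `idx` for a missing-data flag.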

## 3. Methods

_{sat} data were transformed to base-e logarithm (ln) values, following the approach used by Hilborn and Costa [36]. The following discussion refers to ln(mg m^{−3}) unless otherwise stated. The root mean square error (RMSE) and bias used in this paper are defined as follows:
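In their conventional form, and assuming that the bias is taken as reconstructed minus reference values (the sign convention and the symbols ${x}_{i}^{rec}$, ${x}_{i}^{sat}$ and $N$ are assumptions introduced here for illustration), the two statistics read:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({x}_{i}^{rec}-{x}_{i}^{sat}\right)}^{2}},\qquad \mathrm{bias}=\frac{1}{N}\sum_{i=1}^{N}\left({x}_{i}^{rec}-{x}_{i}^{sat}\right),$$

where ${x}_{i}^{rec}$ and ${x}_{i}^{sat}$ are the reconstructed and reference ln(Chl-a) values at validation point $i$, and $N$ is the number of validation points.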

#### 3.1. DINEOF

_{sat} and SST_{sat} data separately, and the results were normalized; missing data were set to zero. The missing data were then calculated by an iterative procedure. For a detailed description of DINEOF, please refer to Alvera-Azcárate et al. [9] and Alvera-Azcárate et al. [16]. A filter of the temporal covariance matrix was applied to improve the temporal coherency of the results, following Alvera-Azcárate et al. [35]. This filter effectively keeps the coherency in time to ensure a smooth transition between subsequent images. The filter length was 1.1 days. In this study, DINEOF was implemented using the 3.0 Linux binary available through the University of Liège GeoHydrodynamics and Environment Research group (GHER) [37].
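The iterative procedure can be illustrated with a deliberately simplified sketch: missing values start at zero (in anomaly space) and are repeatedly replaced by a truncated SVD reconstruction. This is not the GHER DINEOF binary; it uses a fixed number of modes instead of selecting the optimum by cross-validation, omits the temporal covariance filter, and the function name `dineof_sketch` and the test matrix are hypothetical.

```python
import numpy as np

def dineof_sketch(data, n_modes, n_iter=100, tol=1e-6):
    """Fill NaNs in a (space, time) matrix by iterated truncated SVD."""
    missing = np.isnan(data)
    if not missing.any():
        return data.copy()
    mean = np.nanmean(data)
    filled = np.where(missing, 0.0, data - mean)   # anomalies, gaps set to zero
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        change = np.sqrt(np.mean((recon[missing] - filled[missing]) ** 2))
        filled[missing] = recon[missing]           # observed values are never altered
        if change < tol:
            break
    return filled + mean

# Synthetic low-rank field with a regular pattern of gaps (10% of the entries)
rows, cols = np.arange(30), np.arange(20)
truth = np.outer(np.sin(np.linspace(0, 3, 30)), np.cos(np.linspace(0, 2, 20)))
gaps = (rows[:, None] - cols[None, :]) % 10 == 0
observed = np.where(gaps, np.nan, truth)
filled = dineof_sketch(observed, n_modes=2, n_iter=500, tol=1e-12)
```

On real data the number of retained modes matters: too few modes oversmooth the field, while too many fit noise, which is why DINEOF chooses it by minimizing the cross-validation error.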

#### 3.2. DINCAE

_{sat} and SST_{sat} are subtracted in order to obtain the Chl-a and SST anomalies. Thereafter, Chl-a and SST anomalies are scaled by the inverse of the error variance, which is effectively a constant (zero for the data that are missing). The precise value of this constant is not important because this constant will be multiplied by a weight matrix which will be optimized during the network training [23]. In this study, the parameters of the input dataset in rec2 (rec1) are given in Table 2, and the total input dataset size was 13 (10) × 240 × 192 × 1124.
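A minimal sketch of how such input channels might be assembled is given below. It assumes that the subtracted quantity is the per-pixel temporal mean, that missing data are marked with NaN, and that the seasonal channels use the phase 2π·day/365.25; these choices and all function names are assumptions for illustration, not the DINCAE source code.

```python
import numpy as np

def scaled_anomaly_channels(field, error_variance=1.0):
    """Return (anomaly / sigma^2, 1 / sigma^2), both zero where data are missing.

    field : array (time, lat, lon) with NaN for missing values (assumed layout)
    """
    valid = ~np.isnan(field)
    count = valid.sum(axis=0)
    total = np.where(valid, field, 0.0).sum(axis=0)
    # Per-pixel temporal mean over valid data only (0 where a pixel is never observed)
    mean = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    inv_var = np.where(valid, 1.0 / error_variance, 0.0)
    scaled = np.where(valid, field - mean, 0.0) * inv_var
    return scaled, inv_var

def seasonal_channels(day_of_year):
    """Cosine and sine of the annual phase (the 2*pi factor is an assumption)."""
    phase = 2.0 * np.pi * np.asarray(day_of_year, dtype=float) / 365.25
    return np.cos(phase), np.sin(phase)

def scale_minus1_1(x):
    """Scale a coordinate axis linearly to the interval [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

field = np.array([[[1.0, 2.0], [3.0, 4.0]],
                  [[3.0, np.nan], [5.0, 4.0]],
                  [[5.0, 6.0], [7.0, 4.0]]])
scaled, inv_var = scaled_anomaly_channels(field, error_variance=0.5)
cos_t, sin_t = seasonal_channels([0])
lon_scaled = scale_minus1_1([114.0, 119.0, 124.0])
```

Setting both the scaled anomaly and the inverse error variance to zero for a missing pixel tells the network that the pixel carries no information, rather than that it holds a value equal to the mean.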

- The input layer is used to receive the training datasets (training phase) and test datasets (reconstruction phase).
- Encoding layers are equivalent to the singular value decomposition (SVD) in DINEOF, which can effectively reduce the dimensionality of the input data in order to simplify the complexity of the required computing used in the neural networks. The work of the convolution layer is to extract the features of the input data, where the number and scale of these features depend on the depth of the convolution layer, which is also called the “convolution kernel size”. The work of the pooling layer is to compress the features extracted by the convolution layers, and the compression degree is determined by the type, size, and strides of the pooling layer. In this study, the size of convolution kernel was (3 × 3), and the maximum pooling was chosen, where pool size = (2,2) and strides = (2,2).
- Fully connected layers are equivalent to the hidden layer in the traditional feedforward neural network. Their function is to combine the extracted features nonlinearly. There were two fully connected layers in this study, with N/5 (rounded to the nearest integer) and N neurons, respectively, where N is the number of neurons in the last pooling layer of the encoder.
- Decoding layers are composed of convolution layers and interpolation layers (nearest neighbor interpolation) to upsample the results. The interpolation layers are skip connected to the output of the pooling layers in order to capture small-scale structures that are lost in the encoding layers and fully connected layers [38].
- The output layer gives the results; that is, an array ${T}_{ijk}$ with a size of 240 × 192 × 2, including two parameters:
  - ${T}_{ij1}$: Chl-a scaled by the inverse of the expected error variance;
  - ${T}_{ij2}$: logarithm of the inverse of the expected error variance.
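Given the meaning of the two output channels listed above, the reconstructed field and its expected error could be recovered roughly as in the following sketch. The clipping constants `gamma` and `delta`, the argument `mean_ln_chl` (the mean that was removed before training), and the function name `decode_output` are assumptions introduced for illustration.

```python
import numpy as np

def decode_output(T, mean_ln_chl, gamma=10.0, delta=1e-3):
    """Convert the network output T (lat, lon, 2) to Chl-a and its expected error.

    T[..., 0] : ln(Chl-a) anomaly scaled by the inverse expected error variance
    T[..., 1] : logarithm of the inverse expected error variance
    gamma, delta : hypothetical safeguards against overflow and division by ~0
    """
    inv_var = np.maximum(np.exp(np.minimum(T[..., 1], gamma)), delta)
    sigma2 = 1.0 / inv_var                       # expected error variance
    ln_chl = T[..., 0] * sigma2 + mean_ln_chl    # undo the scaling, add the mean back
    return np.exp(ln_chl), np.sqrt(sigma2)       # Chl-a in mg m^-3, error in ln units

T = np.zeros((240, 192, 2))
T[..., 0] = 2.0
T[..., 1] = np.log(4.0)
chl, sigma = decode_output(T, mean_ln_chl=-1.5)
```

Predicting the logarithm of the inverse variance keeps the estimated variance positive without constraining the network output itself.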

- (1) A dropout layer is introduced between the fully connected layers; it randomly inactivates neurons in the fully connected layers at a given rate (hereafter referred to as dropout_rate).
- (2) Gaussian-distributed noise with zero mean and a given standard deviation (hereafter referred to as jitter_std) is added to the input data.
- (3) A regularization technique is added to the Adam optimizer [39]. The basic idea of regularization is to add a penalty term to the loss function that penalizes large weights, which reduces the complexity of the model and helps prevent overfitting. In this study, the regularization parameter is set by $\epsilon ={10}^{-8}+L2\cdot {\beta}_{L2}$, where L2 is the regularization term and ${\beta}_{L2}$ is the weight for L2, whose value is given as L2_beta in the following discussion. Other standard parameters for the Adam optimizer are the learning rate $\alpha $ = 0.001 and the exponential decay rates for the first moment estimates, ${\beta}_{1}$ = 0.9, and the second moment estimates, ${\beta}_{2}$ = 0.999.
- (4) The activation function in the convolution layers is Leaky-RELU [40] instead of the rectified linear unit (RELU), to prevent gradients from falling too fast: $$f\left(x\right)=\begin{cases}\alpha x, & x<0\\ x, & x\ge 0\end{cases},\quad \alpha =0.2.$$
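Two of these ingredients, the input jitter and the Leaky-RELU activation, are simple enough to sketch with NumPy. The function names are hypothetical and the example only illustrates the definitions; it is not the TensorFlow implementation used for training.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky-RELU: alpha * x for negative inputs, x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, alpha * x, x)

def add_jitter(x, jitter_std, rng):
    """Add zero-mean Gaussian noise with standard deviation jitter_std."""
    return x + rng.normal(0.0, jitter_std, size=np.shape(x))

rng = np.random.default_rng(0)
activated = leaky_relu([-1.0, 0.0, 2.0])
noisy = add_jitter(np.zeros(100_000), 0.05, rng)
```

Unlike RELU, the negative branch keeps a nonzero slope, so a gradient still flows through units whose input is negative. The jitter is applied only during training; at reconstruction time the inputs are used unchanged.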

## 4. Results

^{−3}), respectively, as shown in Figure 6. These optimal EOFs can explain 98.74% and 98.59% of the initial variance for rec3 and rec4, respectively, which are calculated as the variance of points not covered by clouds.

#### 4.1. Reconstruction Statistics Using CV-Data

_{sat}) and the corresponding Chl-a_{rec} model results are shown in Figure 8. Comparisons are conducted at all CV-data points (286,461). The scatter distributions of the rec1 and rec2 results exhibit the highest pixel density along the “ideal” dotted lines, especially in the range of 0.1–1.0 mg m^{−3}, while the scatters for the rec3 and rec4 results are more evenly distributed on both sides of the “ideal” dotted line. It is evident that the Chl-a_{rec} results are notably biased low at higher concentrations (>10 mg m^{−3}) for rec1 and rec2, whereas all four reconstruction schemes are biased high at lower concentrations (<0.02 mg m^{−3}). The statistics comparing the CV-data with the corresponding Chl-a_{rec} results, for all reconstruction implementations, show that the RMSEs between the reconstructed data and original data are 0.28, 0.27, 0.30, and 0.33 ln(mg m^{−3}) for rec1, rec2, rec3, and rec4, respectively. The corresponding linear correlation coefficients (R) are 0.95, 0.95, 0.94, and 0.93, respectively, and the corresponding mean biases are −0.008, 0.003, 0.006, and 0.007 ln(mg m^{−3}), respectively. Overall, rec2 performs best, producing the lowest RMSE (0.27), the highest R (0.95, tied with rec1), and the bias nearest to zero (0.003).

^{−3} vs. 0.11 mg m^{−3}). The comparison shows that, in general for all four reconstruction schemes, higher productivity is achieved in SCS than in WPS. Similar to the global reconstruction results shown in Figure 8, DINCAE is superior to DINEOF in these two sub-areas, producing lower RMSE and higher R. Specifically, DINCAE reconstructions are slightly improved by adding SST to the input data in SCS, as shown by comparing rec1 and rec2: the RMSE decreases (from 0.26 to 0.25), R increases (from 0.95 to 0.96), and the bias moves closer to zero (from −0.027 to −0.019). However, the improvement from adding SST in DINCAE reconstructions disappears in WPS. For all reconstruction implementations (Table 3), the intercepts are less than 0 and the slopes are less than 1, meaning that the reconstructed Chl-a values are underestimated at high concentrations and overestimated at low concentrations.
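The link between the fitted line and the over/underestimation pattern can be made explicit with a short sketch. It assumes an ordinary least-squares fit of reconstructed against reference values in ln units; the function name `linear_fit_stats` and the synthetic numbers are illustrative only and are not values from Table 3.

```python
import numpy as np

def linear_fit_stats(ln_sat, ln_rec):
    """Return R, slope, intercept and the crossover with the 1:1 line."""
    r = np.corrcoef(ln_sat, ln_rec)[0, 1]
    slope, intercept = np.polyfit(ln_sat, ln_rec, 1)   # ln_rec ~ slope * ln_sat + intercept
    crossover = intercept / (1.0 - slope)              # where the fit meets the 1:1 line
    return r, slope, intercept, crossover

# Synthetic example with slope < 1 and intercept < 0
ln_sat = np.array([-3.0, -2.0, -1.0, 0.0, 1.0])
ln_rec = 0.9 * ln_sat - 0.2
r, slope, intercept, crossover = linear_fit_stats(ln_sat, ln_rec)
```

In this example the fitted line crosses the 1:1 line at ln(Chl-a) = −2: below that value the reconstruction is higher than the reference, and above it the reconstruction is lower, which is the pattern described in the text.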

#### 4.2. Spatial Distribution of Reconstruction for a Specified Day

_{sat+cloud} and have a reasonably accurate spatial distribution, with patterns that are also indirectly exhibited by those of Chl-a_{rec} (see Figure 10a,c–f). As such, all four reconstruction schemes can reconstruct the high Chl-a variations (bright yellow color in Figure 10a–f) exhibited in the Taiwan Bank Upwelling and Dongshan Upwelling [12], where very strong upwelling brings cold deep water to the surface, leading to the relatively low SST shown in Figure 10g. Mesoscale features such as filaments associated with the Kuroshio near the Luzon Strait are evident, as shown in Chl-a and SST distributions (Figure 10a,g), and were also revealed in the four reconstruction methodologies (Figure 10c–f).

_{err}) defined as (Chl-a_{rec} − Chl-a_{sat})/Chl-a_{sat}, shown in Figure 11. The Re_{err} values for rec1, rec2, and rec4 are clearly much larger than those of rec3 near the Han River discharge area (22°N, 115°E). A possible reason is that the Chl-a distribution is greatly affected by runoff and human activities near the river plume, leading to low correlation coefficients (−0.15, 0.15) between Chl-a and SST, as shown in Figure 2b. DINCAE may not capture the characteristics of the Chl-a distribution in this area, or there may be overfitting. Furthermore, adding auxiliary data such as SST into the reconstruction models may not improve the results, and may even make them worse, given the near-zero correlation coefficients between Chl-a and SST in this area. For the remaining area, the performance of DINCAE is better than that of DINEOF, especially in the basin areas. Furthermore, all four methods perform well in terms of scatterplots and statistical results, as shown in Figure 12. Moreover, rec2 has the best overall results, producing the lowest RMSE (0.20), the intercept nearest to zero (−0.14), the highest R (0.96), and the slope closest to 1.0 (0.94).
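As a small illustration of the relative error defined above, the sketch below assumes that the reconstructed and satellite fields are stored as ln(Chl-a) and that the relative error is evaluated in concentration units (mg m^{−3}); both points, and the function name `relative_error`, are assumptions.

```python
import numpy as np

def relative_error(ln_rec, ln_sat):
    """(Chl-a_rec - Chl-a_sat) / Chl-a_sat, computed from ln-transformed fields."""
    chl_rec = np.exp(ln_rec)
    chl_sat = np.exp(ln_sat)
    return (chl_rec - chl_sat) / chl_sat

# Example: 0.6 mg m^-3 reconstructed where the satellite value is 0.5 mg m^-3
re_err = relative_error(np.log(0.6), np.log(0.5))
```

Under these assumptions the relative error equals exp(ln_rec − ln_sat) − 1, so a given difference in ln units maps to the same relative error regardless of the concentration level.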

#### 4.3. Validation with In Situ Observations

_{rec} of the four reconstruction schemes generally agree well with in situ Chl-a. Among them, the fitted line for rec1 (the blue dotted line in Figure 13) is closest to the ideal line (the grey solid line in Figure 13), with R, slope, and intercept values of 0.925, 0.81, and −0.16 ln(mg m^{−3}), respectively (shown in Table 4). For these four reconstruction schemes, the slopes of the linear fit between Chl-a_{situ} and Chl-a_{rec} are all less than 1, with negative intercepts and high correlation coefficients (R > 0.91), indicating that reconstructed Chl-a values tend to be overestimated for lower values (<0.4 mg m^{−3}) and underestimated for higher values (0.4–1.6 mg m^{−3}); this pattern is similar to that shown by the cross-validation mentioned above (Figure 8 and Figure 12). One possible reason for this result is the smoothness of the reconstruction methods, which shifts the reconstructed values in the range of extreme values toward the normal values. Unexpectedly, in the comparisons of the two pairs of univariate and multivariate reconstruction schemes, viz., rec1 vs. rec2 and rec3 vs. rec4, both univariate schemes perform better than the corresponding multivariate schemes, with a lower RMSE (0.38 vs. 0.42 and 0.38 vs. 0.44), a higher R (0.925 vs. 0.917 and 0.921 vs. 0.917), a slope nearer to 1 (0.81 vs. 0.66 and 0.75 vs. 0.63), and an intercept closer to 0 (−0.16 vs. −0.46 and −0.31 vs. −0.47).

## 5. Discussion

## 6. Conclusions

_{sat} data. We also compared the reconstruction performances of DINCAE and DINEOF using Chl-a-only data, and Chl-a and SST data, as input data based on the same cross-validation data. In this methodology, a total of four reconstruction schemes are implemented. Both cross-validation and in situ validation show the high performances of DINEOF and DINCAE methods in Chl-a reconstruction. Furthermore, DINCAE is superior to DINEOF. A somewhat surprising result is that, in contrast to the cross-validation results, the performances of the multivariate reconstruction methods in in situ validation are not as good as those of the univariate reconstruction methods. The prediction capability of DINCAE was also tested and results were found to agree well with the CV-data-2019; see Figure 14a,b. Retraining the prediction model further improves the reconstruction results, as shown in Figure 14c,d.

_{sat} in SCS will always be higher than in WPS. Thus, the performances of these reconstruction methodologies are accordingly different.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. King, M.D.; Platnick, S.; Menzel, W.P.; Ackerman, S.A.; Hubanks, P.A. Spatial and temporal distribution of clouds observed by MODIS onboard the Terra and Aqua satellites. IEEE Trans. Geosci. Remote Sens. **2013**, 51, 3826–3852.
2. Feng, L.; Hu, C. Cloud adjacency effects on top-of-atmosphere radiance and ocean color data products: A statistical assessment. Remote Sens. Environ. **2016**, 174, 301–313.
3. Reynolds, R.W.; Smith, T.M.; Liu, C.; Chelton, D.B.; Casey, K.S.; Schlax, M.G. Daily High-Resolution-Blended Analyses for Sea Surface Temperature. J. Clim. **2007**, 20, 5473–5496.
4. Group for High Resolution Sea Surface Temperature (GHRSST). Available online: https://www.ghrsst.org/ghrsst-data-services/products/ (accessed on 13 November 2019).
5. Hosoda, K.; Sakaida, F. Global Daily High-Resolution Satellite-Based Foundation Sea Surface Temperature Dataset: Development and Validation against Two Definitions of Foundation SST. Remote Sens. **2016**, 8, 962.
6. Bennett, A.F. Inverse Modeling of the Ocean and Atmosphere, 1st ed.; Cambridge University Press: Cambridge, UK, 2002.
7. Saulquin, B.; Gohin, F.; Garrello, R. Regional Objective Analysis for Merging High-Resolution MERIS, MODIS/Aqua, and SeaWiFS Chlorophyll-a Data From 1998 to 2008 on the European Atlantic Shelf. IEEE Trans. Geosci. Remote Sens. **2011**, 49, 143–154.
8. Beckers, J.M.; Rixen, M. EOF Calculations and Data Filling from Incomplete Oceanographic Datasets. J. Atmos. Ocean. Technol. **2003**, 20, 1839–1856.
9. Alvera-Azcárate, A.; Barth, A.; Rixen, M.; Beckers, J.M. Reconstruction of incomplete oceanographic data sets using empirical orthogonal functions: Application to the Adriatic Sea surface temperature. Ocean Model. **2005**, 9, 325–346.
10. Miles, T.N.; He, R.; Li, M. Characterizing the South Atlantic Bight seasonal variability and cold-water event in 2003 using a daily cloud-free SST and chlorophyll analysis. Geophys. Res. Lett. **2009**, 36.
11. Abraham, E.R. The generation of plankton patchiness by turbulent stirring. Nature **1998**, 391, 577–580.
12. Tang, D.; Kester, D.R.; Ni, I.H.; Kawamura, H.; Hong, H. Upwelling in the Taiwan Strait during the summer monsoon detected by satellite and shipboard measurements. Remote Sens. Environ. **2002**, 83, 457–471.
13. Lin, J.; Cao, W.; Wang, G.; Hu, S. Satellite-observed variability of phytoplankton size classes associated with a cold eddy in the South China Sea. Mar. Pollut. Bull. **2014**, 83, 190–197.
14. Shiozaki, T.; Chen, Y.-L.L. Different mechanisms controlling interannual phytoplankton variation in the South China Sea and the western North Pacific subtropical gyre: A satellite study. Adv. Space Res. **2013**, 52, 668–676.
15. Volpe, G.; Nardelli, B.B.; Cipollini, P.; Santoleri, R.; Robinson, I.S. Seasonal to interannual phytoplankton response to physical processes in the Mediterranean Sea from satellite observations. Remote Sens. Environ. **2012**, 117, 223–235.
16. Alvera-Azcárate, A.; Barth, A.; Beckers, J.M.; Weisberg, R.H. Multivariate reconstruction of missing data in sea surface temperature, chlorophyll, and wind satellite fields. J. Geophys. Res. Oceans **2007**, 112.
17. Sirjacobs, D.; Alvera-Azcárate, A.; Barth, A.; Lacroix, G.; Park, Y.; Nechad, B.; Ruddick, K.; Beckers, J.-M. Cloud filling of ocean colour and sea surface temperature remote sensing products over the Southern North Sea by the Data Interpolating Empirical Orthogonal Functions methodology. J. Sea Res. **2011**, 65, 114–130.
18. Yao, Z.; He, R. Cloud-free sea surface temperature and colour reconstruction for the Gulf of Mexico: 2003–2009. Remote Sens. Lett. **2012**, 3, 697–706.
19. Li, Y.; He, R. Spatial and temporal variability of SST and ocean color in the Gulf of Maine based on cloud-free SST and chlorophyll reconstructions in 2003–2012. Remote Sens. Environ. **2014**, 144, 98–108.
20. Guisan, A.; Zimmermann, N.E. Predictive habitat distribution models in ecology. Ecol. Model. **2000**, 135, 147–186.
21. Elith, J.; Graham, C.H.; Anderson, R.P.; Dudík, M.; Ferrier, S.; Guisan, A.; Hijmans, R.J.; Huettmann, F.; Leathwick, J.R.; Lehmann, A. Novel methods improve prediction of species’ distributions from occurrence data. Ecography **2006**, 29, 129–151.
22. Smoliński, S.; Radtke, K. Spatial prediction of demersal fish diversity in the Baltic Sea: Comparison of machine learning and regression-based techniques. ICES J. Mar. Sci. **2016**, 74.
23. Barth, A.; Alvera-Azcárate, A.; Licer, M.; Beckers, J.M. DINCAE 1.0: A convolutional neural network with error estimates to reconstruct sea surface temperature satellite observations. Geosci. Model Dev. Discuss. **2019**, 2019.
24. Chicco, D.; Sadowski, P.; Baldi, P. Deep autoencoder neural networks for gene ontology annotation predictions. In Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB ’14), Newport Beach, CA, USA, 20–23 September 2014; pp. 533–540.
25. Jouini, M.; Lévy, M.; Crépon, M.; Thiria, S. Reconstruction of satellite chlorophyll images under heavy cloud coverage using a neural classification method. Remote Sens. Environ. **2013**, 131, 232–246.
26. Chapman, C.; Charantonis, A.A. Reconstruction of subsurface velocities from satellite observations using iterative self-organizing maps. IEEE Geosci. Remote Sens. Lett. **2017**, 14, 617–620.
27. Patil, K.; Deo, M.C. Prediction of daily sea surface temperature using efficient neural networks. Ocean Dyn. **2017**, 67, 357–368.
28. Krasnopolsky, V.; Nadiga, S.; Mehra, A.; Bayler, E.; Behringer, D. Neural networks technique for filling gaps in satellite measurements: Application to ocean color observations. Comput. Intell. Neurosci. **2016**, 2016, 6156513.
29. Jo, Y.; Kim, D.; Kim, H. Chlorophyll Concentration Derived from Microwave Remote Sensing Measurements Using Artificial Neural Network Algorithm. J. Mar. Sci. **2018**, 26, 102–110.
30. Chen, Y.L.L.; Chen, H.-Y.; Tuo, S.-H.; Ohki, K. Seasonal dynamics of new production from Trichodesmium N2 fixation and nitrate uptake in the upstream Kuroshio and South China Sea basin. Limnol. Oceanogr. **2008**, 53, 1705–1721.
31. Liu, K.; Atkinson, L.; Quinones, R.; Talaue-McManus, L. Carbon and Nutrient Fluxes in Continental Margins: A Global Synthesis; Springer Science & Business Media: Berlin, Germany, 2010.
32. Liu, K.K.; Chao, S.Y.; Shaw, P.T.; Gong, G.C.; Chen, C.C.; Tang, T.Y. Monsoon-forced chlorophyll distribution and primary production in the South China Sea: Observations and a numerical study. Deep Sea Res. Part I Oceanogr. Res. Pap. **2002**, 49, 1387–1412.
33. Liu, K.-K.; Tseng, C.-M.; Yeh, T.-Y.; Wang, L.-W. Elevated phytoplankton biomass in marginal seas in the low latitude ocean: A case study of the South China Sea. In Advances in Geosciences: Volume 18: Ocean Science (OS); World Scientific: Singapore, 2010; pp. 1–17.
34. McClain, C.R.; Signorini, S.R.; Christian, J.R. Subtropical gyre variability observed by ocean-color satellites. Deep Sea Res. Part II Top. Stud. Oceanogr. **2004**, 51, 281–301.
35. Alvera-Azcárate, A.; Barth, A.; Sirjacobs, D.; Beckers, J.M. Enhancing temporal correlations in EOF expansions for the reconstruction of missing data using DINEOF. Ocean Sci. **2009**, 5, 475–485.
36. Hilborn, A.; Costa, M. Applications of DINEOF to satellite-derived chlorophyll-a from a productive coastal region. Remote Sens. **2018**, 10, 1449.
37. DINEOF—GHER. Available online: http://modb.oce.ulg.ac.be/mediawiki/index.php/DINEOF (accessed on 13 November 2019).
38. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Cham, Switzerland, 5–9 October 2015; pp. 234–241.
39. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv **2014**, arXiv:1412.6980.
40. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
41. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Zheng, X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv **2015**, arXiv:1603.04467.

**Figure 1.** (**a**) Temporal variation of cloud cover percentage in the domain (in black) and with a low-pass filter of 30 days (red line); (**b**) mean percentage of cloud coverage for the complete set of images. SCS stands for South China Sea, and WPS stands for West Philippine Sea. The blue circles represent sampling stations.

**Figure 2.** (**a**) Averaged Chl-a for 2015 to 2018 using valid pixels; (**b**) the correlation coefficient between Chl-a and SST over 2015–2018. The red circles represent sampling stations, and the dotted lines are the bathymetric contours of −500 m and −1000 m.

**Figure 3.** Validation data selection process for DINEOF, where the percentage represents the ratio of cross-validation points to the total number of sea pixels (52,112,264).

**Figure 4.** (**a**) Percentages for the temporal distribution of cross-validation data. (**b**) Spatial distribution density of the cross-validation data, where pixels occurring less than twice are marked as white.

**Figure 5.** Schematic flowchart for DINCAE reconstruction, where the percentage represents the ratio of cross-validation points to the total number of sea pixels (52,112,264).

**Figure 6.** Cross-validation root mean square error (RMSE) (solid lines) and cumulative explained variance (solid circles) for the DINEOF reconstructions. The minimum error is marked with a triangle on each line.

**Figure 7.** Cross-validation errors as a function of iteration for the DINCAE experiments. (**a**) The optimal jitter_std in rec2 chosen from (0.05, 0.10, 0.15) based on dropout_rate = 0.3 and L2_beta = 0; (**b**) the optimal dropout_rate in rec2 chosen from (0.3, 0.9, 1.2) based on jitter_std = 0.05 and L2_beta = 0; (**c**) the optimal L2_beta in rec2 chosen from (0, 0.01, 0.001, 0.0001) based on jitter_std = 0.05 and dropout_rate = 0.9; (**d**) comparison of the expected errors between rec1 and rec2 using the parameters jitter_std = 0.05, dropout_rate = 0.9, and L2_beta = 0.001; the dotted lines are the RMSE values between the CV-data and the averaged outputs of DINCAE between epoch 200 and epoch 1000.

**Figure 8.** Scatterplots of the CV-data (Chl-a_{sat}) versus the corresponding modeled data (Chl-a_{rec}) for the four reconstruction schemes. Color scale indicates the number of points in each 0.01 × 0.01 log10(mg m^{−3}) bin. Green dotted lines represent the best fit line and the grey dotted lines represent the 1:1 line.

**Figure 9.** Spatial distribution of RMSE between cross-validation data and corresponding modeled data for the four reconstruction schemes: (**a**) rec1, (**b**) rec2, (**c**) rec3 and (**d**) rec4.

**Figure 10.** Daily reconstruction of 17 May 2018: (**a**) original Chl-a_{sat}; (**b**) original image with artificial cloud added, Chl-a_{sat+cloud}; (**c**) Chl-a_{rec1}; (**d**) Chl-a_{rec2}; (**e**) Chl-a_{rec3}; (**f**) Chl-a_{rec4}; and (**g**) SST_{sat}.

**Figure 11.** Spatial distribution of relative error, (Chl-a_{rec} − Chl-a_{sat})/Chl-a_{sat}, for 17 May 2018.

**Figure 12.** Scatterplots of the CV-data (Chl-a_{sat}) versus the corresponding modeled data (Chl-a_{rec}) for the four reconstruction schemes on 17 May 2018. Color scale indicates the number of points in each 0.01 × 0.01 log10(mg m^{−3}) bin. Green dotted lines show the best fit line and grey dotted lines represent the 1:1 line.

**Figure 13.** Comparison of the reconstructed Chl-a_{rec} data from the four reconstruction schemes with in situ Chl-a_{situ} data. The dotted lines represent the fitted lines for the corresponding scatter points, using the same colors, for the four schemes. The grey solid line represents the ideal line.

**Figure 14.** Scatterplots of the cross-validation data (Chl-a_{sat}) versus the corresponding modeled data (Chl-a_{rec}), where the Chl-a_{sat} data cover January to October 2019. (**a**) Chl-a_{rec} reconstructed by the rec1-2015 model; (**b**) Chl-a_{rec} reconstructed by the rec2-2015 model; (**c**) Chl-a_{rec} reconstructed by the rec1-2019 model; (**d**) Chl-a_{rec} reconstructed by the rec2-2019 model. The color scale indicates the number of points in each 0.01 × 0.01 log10(mg m^{−3}) bin. Green dotted lines show the best fit line and grey dotted lines represent the 1:1 line.

| Reconstruction | Method | Input Satellite Data |
|---|---|---|
| rec1 | DINCAE | ln(Chl-a) |
| rec2 | DINCAE | ln(Chl-a) and SST |
| rec3 | DINEOF | ln(Chl-a) |
| rec4 | DINEOF | ln(Chl-a) and SST |

| Index | Variable |
|---|---|
| 1 | ln(Chl-a) anomalies scaled by the inverse of the error variance (zero if the data are missing) |
| 2 | Inverse of the error variance (zero if the data are missing) |
| 3–4 | Scaled ln(Chl-a) anomalies and inverse of error variance of the previous day |
| 5–6 | Scaled ln(Chl-a) anomalies and inverse of error variance of the following day |
| 7 | Longitude (scaled linearly between −1 and 1) |
| 8 | Latitude (scaled linearly between −1 and 1) |
| 9 | Cosine of the day of the year divided by 365.25 |
| 10 | Sine of the day of the year divided by 365.25 |
| 11 | SST anomalies scaled by the inverse of the error variance (zero if the data are missing) * |
| 12 | Scaled SST anomalies of the previous day * |
| 13 | Scaled SST anomalies of the following day * |

**Table 3.** Relationship between Chl-a_{rec} values and corresponding Chl-a_{sat} values in SCS and WPS.

| Area | N | Mean, mg m^{−3} | Method | RMSE, ln(mg m^{−3}) | Bias, ln(mg m^{−3}) | R | Slope | Intercept |
|---|---|---|---|---|---|---|---|---|
| SCS | 145,476 | 0.34 | rec1 | 0.26 | −0.027 | 0.95 | 0.89 | −0.22 |
| | | | rec2 | 0.25 | −0.019 | 0.96 | 0.90 | −0.19 |
| | | | rec3 | 0.28 | −0.014 | 0.94 | 0.88 | −0.21 |
| | | | rec4 | 0.32 | −0.005 | 0.92 | 0.92 | −0.12 |
| WPS | 140,985 | 0.11 | rec1 | 0.29 | 0.011 | 0.91 | 0.77 | −0.57 |
| | | | rec2 | 0.28 | 0.025 | 0.91 | 0.78 | −0.53 |
| | | | rec3 | 0.31 | 0.027 | 0.89 | 0.82 | −0.43 |
| | | | rec4 | 0.33 | 0.009 | 0.88 | 0.81 | −0.47 |

| In Situ Data | Method | RMSE, ln(mg m^{−3}) | Bias, ln(mg m^{−3}) | R | Slope | Intercept |
|---|---|---|---|---|---|---|
| N = 20, mean = −1.61 ln(mg m^{−3}) | rec1 | 0.38 | −0.14 | 0.925 | 0.81 | −0.16 |
| | rec2 | 0.42 | −0.09 | 0.917 | 0.66 | −0.46 |
| | rec3 | 0.38 | −0.09 | 0.921 | 0.75 | −0.31 |
| | rec4 | 0.44 | −0.11 | 0.917 | 0.63 | −0.47 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Han, Z.; He, Y.; Liu, G.; Perrie, W.
Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea. *Remote Sens.* **2020**, *12*, 480.
https://doi.org/10.3390/rs12030480

**Chicago/Turabian Style**

Han, Zhaohui, Yijun He, Guoqiang Liu, and William Perrie.
2020. "Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea" *Remote Sensing* 12, no. 3: 480.
https://doi.org/10.3390/rs12030480