Chlorophyll Estimation from Multivariate Regression Analysis and Deep Learning Using Remote Sensing Data

Sridhar, Sriniketan; del Castillo, Carlos; Manian, Vidya

doi:10.3390/ecsa-9-13319

Open Access Proceeding Paper

Chlorophyll Estimation from Multivariate Regression Analysis and Deep Learning Using Remote Sensing Data †

by Sriniketan Sridhar ¹, Carlos del Castillo ² and Vidya Manian ³,*

¹ Southwestern Education Society, Mayaguez, PR 00682, USA
² The Ocean Ecology Laboratory, Greenbelt, MD 20771, USA
³ Department of Electrical and Computer Engineering, University of Puerto Rico, Mayaguez, PR 00681, USA
* Author to whom correspondence should be addressed.
† Presented at the 9th International Electronic Conference on Sensors and Applications, 1–15 November 2022; Available online: https://ecsa-9.sciforum.net/.

Eng. Proc. 2022, 27(1), 78; https://doi.org/10.3390/ecsa-9-13319

Published: 1 November 2022

(This article belongs to the Proceedings of The 9th International Electronic Conference on Sensors and Applications)


Abstract:

The Orinoco River is in Venezuela and flows into the Caribbean Sea. The chlorophyll concentration in the ocean delta changes because of dust deposition from the Orinoco River, which affects primary productivity. However, it has been unclear how wet and dry deposition from the Orinoco River affect the chlorophyll concentration in the ocean. Aerosol optical depth (AOD) and dry and wet deposition data were obtained from the Modern-Era Retrospective analysis for Research and Applications (MERRA), a NASA climate reanalysis of meteorology, atmospheric chemistry, land, ocean, and aerosol data over a broad range of weather and climate timescales and places. Altimetry data for the Orinoco River and chlorophyll concentration data were obtained from the Giovanni database for 2016 to March 2022. Linear regression analysis of altimetry and chlorophyll concentration showed that the latter did not depend on the water level. Univariate models were built for each of AOD, wet deposition, and dry deposition. Bivariate models were then built by adding one variable at a time, and finally a multivariate model was built for the prediction of chlorophyll concentration. The analysis showed that the multivariate models yield a higher correlation between chlorophyll and the independent variables. Of all the variables, wet deposition is the best single predictor of chlorophyll concentration. A deep learning neural network architecture was developed to forecast chlorophyll concentration from past values.

Keywords:

remote sensing; chlorophyll estimation; regression; deep learning neural network

1. Introduction

Primary productivity refers to the rate at which energy is converted into organic substances. It is usually driven by the absorption of sunlight, which is needed to produce the compounds an organism requires for growth. Primary productivity is usually measured by the uptake of carbon dioxide or the output of oxygen. In the ocean, photosynthetic organisms known as phytoplankton are one of the two pathways through which primary productivity occurs. Phytoplankton use chlorophyll to absorb sunlight for photosynthesis. When the chlorophyll absorbs sunlight, carbon dioxide is combined with water, producing organic matter and releasing oxygen. Dust is usually important for this productivity because it carries nutrients such as iron, but primary productivity can also be put at risk by dust deposition carried by river flow. Because river flow increases dust deposition in the ocean, researchers have asked whether dust deposition is affecting chlorophyll levels near the Orinoco River. Chlorophyll prediction using deep learning has been done from satellite ocean color images [1]. Those predictions were made only for current values and do not forecast chlorophyll concentration into the future.

In this paper, we present multivariate regression analysis for predicting chlorophyll based on water level and dust. We then propose a deep learning architecture for chlorophyll forecasting using past levels of chlorophyll. Section 2 presents the methods, Section 3 the results and discussion, and Section 4 the conclusions.

2. Materials and Methods

Materials: The water altimetry, chlorophyll, aerosol optical depth (AOD), and MERRA II wet and dry deposition data were obtained from the Giovanni website [2]. AOD MODIS 0.55 µm refers to the optical scattering of airborne atmospheric particles. MERRA II dry dep refers to dust deposition in the Orinoco River. MERRA II wet dep refers to water deposition in the Orinoco River. River flow can affect the amount of dust and water deposited in the Orinoco River. The time series data were downloaded for dates from 7 April 2002 to 2 January 2022. Each time series contained 153 data points. The univariate and multivariate regression analyses were done in Microsoft Excel, and the deep learning Long Short-Term Memory (LSTM) architecture for chlorophyll forecasting was implemented in Matlab.

Methods: Linear regression, multivariate analysis, and a deep learning neural network are used to predict the chlorophyll level. Univariate analysis is the simplest form of data analysis since it involves only one variable. Here, chlorophyll is predicted using water flow, AOD, or MERRA II dry or wet deposition as the single independent variable. Univariate analysis uses the equation y = mx + c, where x is the independent variable and y the dependent variable. Multivariate analysis, which combines several data sets, was also used to build linear regression models. The equation used for multivariate regression analysis is y = b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄ + c. We used up to four independent variables, x₁ to x₄. For the univariate regression, 108 samples were selected randomly for estimation, and the remaining samples were used for prediction.

Long short-term memory (LSTM) is a deep learning neural network architecture commonly used for time series prediction or forecasting. LSTM is a type of recurrent neural network (RNN). An RNN uses a hidden state vector to represent context based on prior inputs and outputs, which is considered along with the current input when generating an output. The output vector is produced after a series of transformations of the input vector. Because carrying this context forward benefits network accuracy on sequences, RNNs are useful for analyzing time-series data [3]. LSTM networks attempt to solve the “vanishing gradient” problem of RNNs (very small gradients prevent distant inputs from being taken into account). The basic unit of an LSTM network is a memory cell, which has an input gate, an output gate, and a forget gate that control the flow of information into and out of the cell. Each gate combines a sigmoid neural net layer with a pointwise multiplication operation. Through these gates, the cell determines the fate of the information it holds. The memory cell is also called the ‘cell state’, which maintains its state over time. This behavior is determined by an independent set of weights pertaining to the memory cell, which are adjusted by gradient descent and backpropagation. Figure 1 shows the structure of the LSTM cell. LSTM has feedback connections, so it can process entire sequences of data in addition to single data points such as images. The LSTM equations are given in [4].

In this research, LSTM was used to predict chlorophyll concentration based on past values. The LSTM architecture was trained with approximately 90% of the data, and the remaining 10% was used for prediction: 137 samples were used for training and the remaining 16 for testing. LSTM networks can be more accurate than conventional models on sequence data and can be used for analyzing and predicting multiple complex data sets.
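To make the gate mechanism described above concrete, the following is a minimal Python sketch of one time step of a standard LSTM memory cell. It is an illustration only and not the Matlab implementation used in this work. The helper names (`affine`, `lstm_step`, `random_params`), the random weights, and the input values are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b for small list-based matrices."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM memory cell."""
    # Each gate is a sigmoid layer; its output scales a signal pointwise.
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]
    # Candidate values that may be written into the cell state.
    g = [math.tanh(z) for z in affine(*params["candidate"], x, h_prev)]
    # Forget part of the old cell state, then add the gated candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # The hidden state is the gated, squashed cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def random_params(n_in, n_hidden, seed=0):
    """Hypothetical untrained weights, only for demonstration."""
    rng = random.Random(seed)

    def block():
        W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        U = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        return W, U, [0.0] * n_hidden

    return {name: block() for name in ("forget", "input", "output", "candidate")}

params = random_params(n_in=1, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
for value in [0.12, 0.15, 0.11]:  # made-up chlorophyll values
    h, c = lstm_step([value], h, c, params)
print(h)
```

In a trained network the weights would come from backpropagation rather than a random generator, and a final fully connected layer would map `h` to the forecast value.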

3. Experimental Results and Discussion

Figure 2 shows the output of linear regression with water flow into the Orinoco River as the independent variable and chlorophyll as the dependent variable. The equation of the fitted line is y = 0.0057x + 0.0509.
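As an illustration of how a line of the form y = mx + c and its summary statistics can be obtained, the sketch below fits ordinary least squares in plain Python. The analysis in this paper was done in Microsoft Excel, so this is only an equivalent calculation under stated assumptions. The data are synthetic: they are generated around the reported line with a made-up noise level and a made-up range for x. The function name `fit_line` is hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + c with summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)                    # total sum of squares
    k = 1  # number of independent variables
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    std_err = math.sqrt(sse / (n - k - 1))  # standard error of the regression
    return {"slope": m, "intercept": c, "r2": r2, "adj_r2": adj_r2, "std_err": std_err}

# Synthetic stand-in for the 108 estimation samples (not the Giovanni data).
rng = random.Random(42)
x = [rng.uniform(0.0, 10.0) for _ in range(108)]
y = [0.0057 * xi + 0.0509 + rng.gauss(0.0, 0.05) for xi in x]
summary = fit_line(x, y)
print(summary)
```

The adjusted R² penalizes R² for the number of independent variables k, which is why it is the more useful figure when comparing univariate and multivariate models.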

Figure 3 summarizes the regression and analysis of variance (ANOVA) results for the univariate model. The univariate analysis was also done with each of AOD, MERRA II wet deposition, and MERRA II dry deposition as the independent variable. The summary outputs for each are given in Figure 4, Figure 5 and Figure 6. We used nearest neighbor interpolation to fill the missing river flow values for the regression analyses. Figure 7 summarizes the output of the regression and ANOVA for the multivariate model with four independent variables.
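Nearest neighbor interpolation of missing values can be sketched as follows. This assumes evenly spaced samples, so that distance is measured in sample positions, and it marks missing values with `None`. The function name `fill_nearest`, the tie-breaking rule, and the example values are hypothetical and are not taken from the paper's workflow.

```python
def fill_nearest(values):
    """Replace None entries with the nearest observed value in the series.

    Ties between equally distant neighbours go to the earlier sample.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = []
    for i, v in enumerate(values):
        if v is None:
            # Closest observed index; the second key breaks ties toward the past.
            nearest = min(known, key=lambda j: (abs(j - i), j))
            v = values[nearest]
        filled.append(v)
    return filled

river_level = [2.1, None, None, 3.4, None, 2.8]  # made-up values
print(fill_nearest(river_level))  # [2.1, 2.1, 3.4, 3.4, 3.4, 2.8]
```

Because each gap is filled with a copy of an existing observation, this method introduces no new values, but long gaps produce flat segments that can weaken any regression on the filled variable.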

The adjusted R² value was 0.0264 for chlorophyll prediction using MERRA II wet deposition. We also combined two to a maximum of four independent variables, which resulted in a standard error of 0.0538. Figure 8 shows each of the time series. Figure 9 shows the chlorophyll time series used for training and prediction with the LSTM: Figure 9a shows the original chlorophyll time series, and Figure 9b shows the time series with the forecast values. LSTMs are useful for forecasting a time series into the future. Figure 10 shows the training progress for the LSTM network. The network had 100 hidden units in the LSTM layer and was trained by gradient descent with an initial learning rate of 0.005 and a piecewise learning rate schedule. The maximum number of epochs was 250. The network consisted of four layers: a sequence input layer, an LSTM layer, a fully connected layer, and a regression layer.
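The data preparation for this kind of forecasting can be sketched in Python as below: a chronological split of the 153 samples, standardization with training statistics, one-step-ahead input and target pairs, and a piecewise learning rate schedule. Only the 153-sample length, the 137/16 split, the initial rate of 0.005, and the 250 epochs come from the text. The standardization step, the drop factor, the drop period, and the sinusoidal series are assumptions made for illustration, and all function names are hypothetical.

```python
import math

def split_series(series, train_fraction=0.9):
    """Chronological split: the first part trains, the tail is held out."""
    n_train = math.floor(train_fraction * len(series))
    return series[:n_train], series[n_train:]

def standardize(values, mean, std):
    """Scale values to zero mean and unit variance using given statistics."""
    return [(v - mean) / std for v in values]

def one_step_pairs(series):
    """Inputs are all values but the last; targets are the values one step later."""
    return series[:-1], series[1:]

def piecewise_lr(epoch, initial=0.005, drop_factor=0.2, drop_period=125):
    """Learning rate that drops by a fixed factor every drop_period epochs.

    drop_factor and drop_period are assumed values, not reported in the text.
    """
    return initial * drop_factor ** (epoch // drop_period)

# Synthetic stand-in for the 153-point chlorophyll series.
series = [0.10 + 0.05 * math.sin(2 * math.pi * t / 12) for t in range(153)]

train, test = split_series(series)
mean = sum(train) / len(train)
std = math.sqrt(sum((v - mean) ** 2 for v in train) / len(train))
x_train, y_train = one_step_pairs(standardize(train, mean, std))

print(len(train), len(test))  # 137 16
print([piecewise_lr(e) for e in (0, 124, 125, 249)])
```

Splitting by time rather than at random keeps the held-out samples strictly in the future of the training samples, which is what a forecasting evaluation requires.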

Figure 11 gives the root mean square error (RMSE) between the predicted and original values of chlorophyll concentration. The RMSE was 0.045862, which is lower than the standard error of 0.0538 obtained by linear regression.
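For reference, the RMSE between a forecast and the observed values is the square root of the mean squared difference. A minimal Python sketch follows; the function name and the four example values are made up and are not the paper's data.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

observed = [0.10, 0.12, 0.15, 0.11]   # made-up chlorophyll values
predicted = [0.11, 0.10, 0.16, 0.13]  # made-up forecasts
print(rmse(predicted, observed))      # about 0.0158
```

Note that RMSE on held-out samples and the standard error of a regression fit are computed on different data and with different divisors, so a comparison between them is indicative rather than exact.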

4. Conclusions and Future Work

We have performed univariate and multivariate regression analysis for chlorophyll prediction from river flow, AOD, wet and dry depositions. A new LSTM algorithm is presented for chlorophyll forecasting from observed values of chlorophyll alone. The LSTM model was not affected by the correlation between the variables, and its predictions were based on past values of chlorophyll concentration. However, the model can be modified to include more variables for chlorophyll forecasting and further reduce the RMSE. Further, the architecture can be improved with optimal network design parameters.

Author Contributions

Conceptualization, C.d.C. and S.S.; methodology, S.S. and C.d.C.; software, S.S.; validation, C.d.C., S.S. and V.M.; formal analysis, S.S.; investigation, C.d.C. and S.S.; resources, C.d.C.; data curation, S.S.; writing—original draft preparation, S.S. and V.M.; writing—review and editing, V.M.; visualization, S.S.; supervision, C.d.C.; project administration, C.d.C.; funding acquisition, C.d.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

https://giovanni.gsfc.nasa.gov/giovanni/ (accessed on 1 July 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Jin, D.; Lee, E.; Kwon, K.; Kim, T. A deep learning model using satellite ocean color and hydrodynamic model to estimate chlorophyll-a concentration. Remote Sens. 2021, 13, 2003.
2. NASA Data, E. Giovanni The Bridge between Data and Science v 4.37. Available online: https://giovanni.gsfc.nasa.gov/giovanni/ (accessed on 1 July 2022).
3. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci. Rep. 2018, 8, 6085.
4. Sridhar, S.; Manian, V. EEG and deep learning based brain cognitive function classification. Computers 2020, 9, 104.

Figure 1. LSTM cell.

Figure 2. Univariate regression.

Figure 3. Analysis summary for chlorophyll prediction from river flow.

Figure 4. Analysis summary for chlorophyll prediction from AOD.

Figure 5. Analysis summary for chlorophyll prediction from MERRA II dry deposition.

Figure 6. Analysis summary for chlorophyll prediction from MERRA II wet deposition.

Figure 7. Chlorophyll prediction from river flow, AOD, wet and dry deposition.

Figure 8. Time series data obtained from Giovanni (Blue—Chlorophyll MODIS-A, Orange—AOD MODIS, Grey—MERRA II Dry deposition, yellow—MERRA II Wet deposition).

Figure 9. Chlorophyll time series forecasting: (a) original time series, (b) time series with forecast values.

Figure 10. Training curve for the LSTM.

Figure 11. RMSE for chlorophyll prediction using LSTM.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

