
Water Temperature Prediction Using Improved Deep Learning Methods through Reptile Search Algorithm and Weighted Mean of Vectors Optimizer

Rana Muhammad Adnan Ikram, Reham R. Mostafa, Zhihuan Chen, Kulwinder Singh Parmar, Ozgur Kisi 5,6,* and Mohammad Zounemat-Kermani
1 School of Economics and Statistics, Guangzhou University, Guangzhou 510006, China
2 Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura 35516, Egypt
3 Engineering Research Center for Metallurgical Automation and Measurement Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430000, China
4 Department of Mathematics, IKG Punjab Technical University, Kapurthala 144601, India
5 Department of Civil Engineering, Technical University of Lübeck, 23562 Lübeck, Germany
6 Civil Engineering Department, Ilia State University, 0162 Tbilisi, Georgia
7 Department of Water Engineering, Shahid Bahonar University of Kerman, Kerman 00076, Iran
* Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(2), 259;
Submission received: 27 November 2022 / Revised: 1 January 2023 / Accepted: 14 January 2023 / Published: 23 January 2023
(This article belongs to the Section Ocean Engineering)


Precise estimation of water temperature plays a key role in environmental impact assessment, aquatic ecosystem management, and water resources planning and management. In the current study, convolutional neural network (CNN) and long short-term memory (LSTM) network-based deep learning models were examined for estimating daily water temperatures of the Bailong River in China. Two novel optimization algorithms, the reptile search algorithm (RSA) and the weighted mean of vectors optimizer (INFO), were integrated with both deep learning models to enhance their prediction performance. To evaluate the prediction accuracy of the implemented models, four statistical indicators, i.e., the root mean square error (RMSE), mean absolute error (MAE), determination coefficient (R2) and Nash–Sutcliffe efficiency (NSE), were used with different input combinations involving air temperature, streamflow, precipitation, sediment flow and day of the year (DOY). The LSTM-INFO model with DOY input outperformed the other competing models, considerably reducing the RMSE and MAE in predicting daily water temperature.

1. Introduction

Water temperature is a fundamental factor affecting the quality of aquatic environments [1]. Rising water temperature can accelerate chemical and biological processes that lead to various problems. For instance, as water temperature increases, dissolved oxygen decreases, threatening aquatic plants and the associated aquatic ecosystems [2,3]. Water temperature also affects ice formation and evaporation rates [4,5]. It is also vital for assessing the effects of climate change in hydrological projects, for example when considering thermal dynamics during the construction of hydraulic structures such as dams [6,7,8]. Therefore, a precise method for predicting water temperature is needed to protect and manage aquatic ecosystems.
Over the past decades, several methodologies, such as mathematical models [9], stochastic methods [10], probabilistic techniques [11], statistical approaches [12] and remote sensing techniques [13], have been proposed for predicting water temperature. More recently, machine learning (ML) methods have been successfully applied to model and predict water temperature [14].
In this context, several studies have applied conventional ML models to environmental problems such as water temperature modeling [15,16,17,18,19,20]. For instance, three ML models, a feed-forward neural network, a decision tree and Gaussian process regression, were developed to estimate water temperature, using air temperature, discharge and a time element (day of year) obtained from eight river stations as inputs. The models were evaluated against the air2stream model, a hybrid model that combines physically based and stochastic calibration approaches. In most cases, air2stream outperformed the other models on several statistical indices, and among the ML models, the decision tree performed best, followed by the feed-forward neural network and Gaussian process regression [21]. In another study, a convolutional neural network (CNN) was implemented to estimate the monthly subsurface water temperature in the Pacific Ocean, using sea surface height, sea surface temperature and sea surface salinity measured by satellite remote sensors as inputs, with Argo data used to validate the results. The model improved the estimates and showed promising results for satellite observations [22]. In a more recent study, the capability of an extended long short-term memory (LSTM) neural network was examined for predicting daily river temperature [19], using data collected from nine gauges on different rivers around the world. The model mapped water temperature accurately, even in the Yangtze River, where water temperature is affected by gate operations.
Over the past ten years, there has been considerable discussion on improving standalone ML models by coupling them with novel optimization strategies [23,24,25]. Specifically, the accuracy of ML models can be improved by integrating them with other soft computing approaches (e.g., heuristic algorithms) or optimization methods (e.g., swarm intelligence). For example, a back propagation neural network (BP) model optimized with the particle swarm optimization algorithm (PSO) was introduced for predicting water temperature in the Yangtze River [26]. Three models were defined by their inputs: BPPSO1 (air temperature), BPPSO2 (air temperature and discharge) and BPPSO3 (air temperature, discharge and day of the year). The results showed that adding input variables led to better predictions and helped to model the river's thermal dynamics more accurately. In another study, an LSTM model optimized with a genetic algorithm (GA) was developed to predict water temperature for assessing the water quality of urban rivers [27]. Five years of hourly water temperature data were used as the input vector, and the model was compared with a recurrent neural network model. The authors concluded that GA-LSTM is a reliable deep learning technique for time-series prediction.
Among nature-inspired optimization methods, the reptile search algorithm (RSA) is an innovative algorithm proposed by Abualigah et al. [28]. It is inspired by the hunting behavior of crocodiles, and its search is divided into two phases: encircling (walking) and hunting. Two mathematical models update the crocodiles' positions, based on diverse and optimal search areas. RSA was evaluated on seven real-world engineering problems and on twenty-three classical, thirty CEC2017 and ten CEC2019 test functions, and it outperformed the ten optimization algorithms it was compared with. RSA has since been applied successfully in several studies [29,30,31].
More recently, an optimization algorithm based on the weighted mean of vectors (INFO) was presented. INFO uses convergence acceleration and a mean-based rule to update the positions of the vectors. The updated vectors are then combined to generate a candidate solution, and a local search is applied to enhance exploitation and avoid low-accuracy solutions. In the original study, INFO was evaluated on five constrained engineering test cases and forty-eight mathematical test functions, and its results were compared with those of five optimization algorithms and nine advanced algorithms. Its applicability to real engineering problems was further verified on four complex, challenging and constrained problems. The findings indicated that INFO is highly efficient at optimizing complex problems with unknown search spaces [32]. INFO has also been applied to estimate solar cell parameters, where it outperformed seven optimization algorithms: the tunicate swarm algorithm, chimp optimization algorithm, moth–flame optimizer, Harris hawks optimization, grey wolf optimization, sine cosine algorithm and Runge–Kutta optimization [33].
The primary purposes of this study are fourfold: (1) to evaluate standalone deep learning models, LSTM and CNN, for modeling river water temperature; (2) to assess two recently introduced optimization algorithms for improving the accuracy of these deep learning models; (3) to analyze the effect of each input variable, including Ta (air temperature), Q (discharge), P (precipitation), S (sediment load) and the time element (day of the year), on prediction accuracy; and (4) to compare the predictions of the proposed models with observed data using various statistical indicators. To the best of our knowledge, optimizing LSTM and CNN models with the RSA and INFO algorithms has not previously been studied for water temperature prediction, which constitutes the contribution of this research.

2. Case Study

The Bailong River Basin was selected as the case study area (Figure 1). The Bailong River is a secondary tributary of the Yangtze River and the largest tributary of the Jialing River. The basin is located in north–central China, between latitudes 32°36′–34°24′ N and longitudes 103°00′–105°30′ E. It covers a catchment area of 17,845 km², and the river is 452 km long. Elevation in the basin is higher in the northwest and lower in the southeast, ranging from 520 to 4358 m. Rainfall is the main source of the Bailong River's flow. Mean annual rainfall in the basin ranges from 520 mm to 910 mm, whereas the average annual temperature varies from 6.1 to 14.9 °C. Most of the annual rainfall (60% to 80%) occurs from June to September as heavy rainfall events. The Wudu hydroclimatic station, located at approximately 33°23′ N latitude and 104°55′ E longitude, was used to model the water temperature of the basin. Daily streamflow, water temperature, precipitation and sediment data were collected from the hydrological yearbooks, whereas daily air temperature data were obtained from the China Meteorological Administration (CMA); the data cover the period from 2007 to 2012. Different input combinations of air temperature, streamflow, precipitation and sediment flow were used to model the water temperature of the basin, and day of the year (DOY) was also added as an input to examine its effect on prediction accuracy. To make better use of the data, cross-validation was applied: the six-year dataset was divided into three equal two-year subsets, M1 to M3. Each time, one subset was used as the test dataset, and the remaining four years of data were used as the training dataset.
The statistical overview of the dataset is summarized in Table 1.
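The rotation of the three two-year subsets can be sketched in Python. This is a minimal sketch under stated assumptions: the daily records are assumed to span 1 January 2007 to 31 December 2012, and M1, M2 and M3 are assumed to be consecutive blocks (2007–2008, 2009–2010, 2011–2012) in chronological order, which the text does not specify. The function names `make_folds` and `train_test_splits` are hypothetical.

```python
from datetime import date, timedelta


def make_folds(start_year=2007, end_year=2012, years_per_fold=2):
    """Group daily dates into consecutive multi-year blocks (assumed M1, M2, M3)."""
    days = []
    d = date(start_year, 1, 1)
    while d <= date(end_year, 12, 31):
        days.append(d)
        d += timedelta(days=1)
    return [
        [x for x in days if y0 <= x.year < y0 + years_per_fold]
        for y0 in range(start_year, end_year + 1, years_per_fold)
    ]


def train_test_splits(folds):
    """Yield (name, train, test): each block serves as the test set exactly once."""
    for k, test in enumerate(folds):
        train = [x for j, block in enumerate(folds) if j != k for x in block]
        yield f"M{k + 1}", train, test


for name, train, test in train_test_splits(make_folds()):
    print(name, len(train), "training days,", len(test), "test days,", test[0], "to", test[-1])
```

Each run therefore trains on four years and tests on the remaining two, so every day is used for testing exactly once.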

3. Methods

In this study, the performances of two deep learning models optimized using two recently introduced optimization algorithms were compared in predicting daily water temperature time series. A basic description of both deep learning methods and optimization algorithms is provided below.

3.1. Deep Learning Based Models

3.1.1. Convolution Neural Network (CNN)

The convolutional neural network, also known as ConvNet, is widely used in many research areas. It is a type of feed-forward artificial neural network (ANN). A CNN consists of a finite number of layers that learn representations of the input data at multiple levels of abstraction. In 1959, David Hubel and Torsten Wiesel found that neurons in the cat's visual cortex are organized in layers. These layers identify visual patterns by first extracting local structures and then combining them into higher-level representations, a principle that later became central to deep learning. Hubel and Wiesel later extended this work to monkeys, recording extracellularly from single neurons while stimulating the retina with spots of light. Their findings were published in the papers “Receptive fields of single neurons in cat’s striate cortex” [34] and “Receptive fields and functional architecture of monkey striate cortex” [35]. Subsequently, Fukushima [36] developed the neocognitron, a multilayer self-organizing neural network capable of recognizing visual patterns, which is regarded as the first CNN model. The conceptual model of a CNN is shown in Figure 2.
Input Layer: The major purpose of this layer is to feed the inputs to the convolution layer.
Convolution Layer: This layer is the most important part of the CNN structure, as it contains the convolution kernels that transform the inputs into output feature maps. It extracts features from the input data for further processing in the CNN. This layer is important for two major reasons: sparse connectivity and weight sharing. With sparse connectivity, each output is connected to only a small local region of the input, so only a small number of weights connects two layers; by contrast, in a fully connected neural network, every neuron of one layer connects to every neuron of the next. Consequently, the memory needed to store the weights of a CNN is small, which makes the model memory-efficient. With weight sharing, the same set of kernel weights is applied at every position of the input, so one set of weights serves all inputs; this reduces training time and cost.
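Both properties can be illustrated with a minimal pure-Python 1D convolution. This is a sketch, not the network used in the study: the kernel values and the input series are arbitrary, and, as in most deep learning libraries, the operation is implemented as cross-correlation without flipping the kernel. Each output depends on only `k` neighboring inputs (sparse connectivity), and the same `k` weights are reused at every position (weight sharing).

```python
def conv1d(x, kernel, bias=0.0, stride=1):
    """'Valid' 1D convolution (cross-correlation) of a sequence with one kernel."""
    k = len(kernel)
    out = []
    for start in range(0, len(x) - k + 1, stride):
        window = x[start:start + k]  # local receptive field: only k inputs
        # The same kernel weights are reused at every position (weight sharing).
        out.append(sum(w * v for w, v in zip(kernel, window)) + bias)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]  # 3 trainable weights, however long the series is
print(conv1d(series, kernel))
```

A fully connected layer mapping these 5 inputs to 3 outputs would need 15 weights instead of 3.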
Pooling Layer: This layer downsamples a feature map into a smaller one while retaining the dominant features extracted by the preceding convolution layer. Common pooling methods include max pooling, average pooling, min pooling, tree pooling and gated pooling.
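Max and average pooling, two of the methods listed above, can be sketched in pure Python. The window size, stride and feature-map values are arbitrary choices for illustration: with a window of 2 and a stride of 2, a feature map of length 6 shrinks to length 3.

```python
def pool1d(x, size=2, stride=None, mode="max"):
    """Downsample a 1D feature map; windows do not overlap unless a smaller stride is given."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    stride = stride or size
    windows = [x[i:i + size] for i in range(0, len(x) - size + 1, stride)]
    if mode == "max":
        return [max(w) for w in windows]  # keep the strongest response
    return [sum(w) / len(w) for w in windows]  # keep the mean response


feature_map = [0.2, 1.5, -0.3, 0.8, 2.0, 0.1]
print(pool1d(feature_map, size=2, mode="max"))
print(pool1d(feature_map, size=2, mode="average"))
```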
Fully Connected Layer: In the fully connected layer, every neuron is connected to all neurons of the previous layer. This layer forms the last stage of the CNN and here acts as the regression output. It works like a feed-forward ANN, i.e., a multilayer perceptron (MLP).
Activation Functions: The activation function determines a neuron's output from its input, i.e., whether and how strongly the neuron fires. In the CNN model, a nonlinear activation layer is used, which allows the network to learn nonlinear mappings between inputs and outputs. During training, errors are reduced through backpropagation, which relies on the derivative of the activation function.
Sigmoid function: The sigmoid function takes a real-valued input and maps it to the range (0, 1). Its curve is S-shaped, and it is defined as:
$$f_{\mathrm{sigm}}(x) = \frac{1}{1 + e^{-x}}$$
Tanh: The hyperbolic tangent function maps input values to the range (−1, 1); it is defined as:
$$f_{\tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
ReLU: The rectified linear unit (ReLU) [37] is widely used as an activation function in CNNs. It maps every input to a non-negative value (negative inputs become zero) at minimal computational cost. It is defined as:
$$f_{\mathrm{ReLU}}(x) = \max(0, x)$$
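The three activation functions can be written directly from their definitions. This pure-Python sketch mirrors the formulas term by term; it is for illustration only (in practice, `math.tanh` or a deep learning library's built-in activations are numerically more robust).

```python
import math


def sigmoid(x):
    """Maps any real input into (0, 1); math.exp raises OverflowError for x below about -709."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Maps any real input into (-1, 1); written out to mirror the formula."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    """Zero for negative inputs, identity otherwise."""
    return max(0.0, x)


for v in (-2.0, 0.0, 2.0):
    print(f"x={v:+.1f}  sigmoid={sigmoid(v):.4f}  tanh={tanh(v):.4f}  relu={relu(v):.1f}")
```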

3.1.2. Long Short Term Memory (LSTM)

The long short-term memory (LSTM) network is widely applicable across research areas and was developed to overcome the error-flow problems of recurrent neural networks (RNN). Its major strength is that it exploits the temporal information in the input, which distinguishes it from many other machine learning models. An LSTM contains memory cells with a particular neuron structure that can store information over arbitrary time intervals. Each memory cell has three main gates: the input gate, the output gate and the forget gate. All gates receive the same inputs, and each gate has its own activation function. In conventional recurrent architectures, long time lags are inaccessible because the back-propagated error either explodes or decays exponentially. LSTM memory blocks can be regarded as a differentiable version of the memory chips of a digital computer: the input to a cell is multiplied by the activation of the input gate, the output is multiplied by that of the output gate, and the previous cell values are multiplied by the forget gate [38,39,40,41]. The rest of the network interacts with the cells only through these gates. The flowchart of the LSTM model is shown in Figure 3.
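The gate arithmetic described above can be written out for a single time step. This is a minimal pure-Python sketch of the standard LSTM cell with a forget gate, not the configuration used in the study: the hidden size, the random weights and the `params` layout (one weight matrix acting on the concatenated `[h_prev, x]` vector plus a bias per gate) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params[name] = (W, b) with W acting on [h_prev, x]."""
    z = h_prev + x  # list concatenation [h_{t-1}, x_t]

    def gate(name, act):
        W, b = params[name]
        return [act(a + bias) for a, bias in zip(matvec(W, z), b)]

    f = gate("forget", sigmoid)   # fraction of the previous cell state to keep
    i = gate("input", sigmoid)    # fraction of the candidate to write
    g = gate("cell", math.tanh)   # candidate cell values
    o = gate("output", sigmoid)   # fraction of the cell state to expose
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


rng = random.Random(0)
H, D = 2, 1  # hidden units and input features (illustrative sizes)
params = {
    name: ([[rng.uniform(-0.5, 0.5) for _ in range(H + D)] for _ in range(H)], [0.0] * H)
    for name in ("forget", "input", "cell", "output")
}
h, c = [0.0] * H, [0.0] * H
for x_t in [0.1, 0.4, 0.3]:  # a short, already-scaled input series
    h, c = lstm_step([x_t], h, c, params)
print("hidden state:", h, "cell state:", c)
```

Because the output gate is a sigmoid and the cell state passes through tanh, each element of the hidden state always stays inside (−1, 1).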

3.2. Optimization Algorithms

3.2.1. Weighted Mean of Vectors (INFO) Optimizer

In population-based optimization, the weighted mean of vectors (INFO) optimizer is a popular algorithm known for its accuracy. It computes a weighted mean of a set of vectors, i.e., candidate solutions in the search space, where the weight of each vector depends on fitness values. Positions with larger weights therefore play a leading role in the weighted mean, and the weighting determines how much each solution influences it. The algorithm refines the solutions over successive generations; in every generation, three operators update the position of each vector, as described in the following stages [32].
Stage 1 (Updating Rule): The first stage of INFO is the updating rule, which increases population diversity during the search. This operator uses the weighted mean of vectors to construct new vectors and has two main components. The first is the mean-based rule, obtained from the weighted mean (WM) of a set of random vectors. The second is convergence acceleration (CA), which increases the convergence speed and helps the algorithm reach the optimal solution.
Let the population consist of $N_p$ vectors in a $D$-dimensional search domain, $X_l^g = \{X_{l,1}^g, X_{l,2}^g, X_{l,3}^g, \ldots, X_{l,D}^g\}$, $l = 1, 2, 3, \ldots, N_p$. In the initial step, the control parameters are introduced: $\sigma$ is the weighted mean factor, $\beta$ changes according to an exponential function, $Max_g$ is the maximum number of generations, and $\delta$ is the scaling factor:
$$\delta = 2\beta \times rand - \beta, \qquad \beta = 2\exp\left(-4 \times \frac{g}{Max_g}\right)$$
The mean-based rule is defined in Equation (5):
$$MeanRule = r \times WM1_l^g + (1 - r) \times WM2_l^g, \quad l = 1, 2, 3, \ldots, N_p \tag{5}$$
$$WM1_l^g = \delta \times \frac{w_1 (x_{a1} - x_{a2}) + w_2 (x_{a1} - x_{a3}) + w_3 (x_{a2} - x_{a3})}{w_1 + w_2 + w_3 + \varepsilon} + \varepsilon \times rand, \quad l = 1, 2, 3, \ldots, N_p \tag{5a}$$
$$w_1 = \cos\big((f(x_{a1}) - f(x_{a2})) + \pi\big) \times \exp\left(-\left|\frac{f(x_{a1}) - f(x_{a2})}{\omega}\right|\right) \tag{5b}$$
$$w_2 = \cos\big((f(x_{a1}) - f(x_{a3})) + \pi\big) \times \exp\left(-\left|\frac{f(x_{a1}) - f(x_{a3})}{\omega}\right|\right) \tag{5c}$$
$$w_3 = \cos\big((f(x_{a2}) - f(x_{a3})) + \pi\big) \times \exp\left(-\left|\frac{f(x_{a2}) - f(x_{a3})}{\omega}\right|\right) \tag{5d}$$
$$\omega = \max\big(f(x_{a1}), f(x_{a2}), f(x_{a3})\big)$$
In these equations, a wavelet function (a cosine modulated by an exponential) is used to generate effective fluctuations when computing the vector weights, by combining translations and dilations of the selected mother wavelet. Equation (5a) gives the weighted mean of the three vectors $x_{a1}$, $x_{a2}$ and $x_{a3}$, whose weights are given by Equations (5b)–(5d), respectively; $f(x_{a1})$, $f(x_{a2})$ and $f(x_{a3})$ are their fitness values, and $w_1$, $w_2$ and $w_3$ are the weight functions.
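Equations (5a)–(5d) can be sketched in pure Python for vectors stored as lists. This is a minimal illustration, not the reference INFO implementation: the value of ε, the small guard added to ω to avoid division by zero, and the function names are assumptions.

```python
import math
import random


def wavelet_weight(fa, fb, omega, eps=1e-25):
    """Eqs. (5b)-(5d): cos((fa - fb) + pi) * exp(-|(fa - fb) / omega|).
    Adding eps to omega is an assumed guard against omega == 0."""
    return math.cos((fa - fb) + math.pi) * math.exp(-abs((fa - fb) / (omega + eps)))


def weighted_mean_1(xa1, xa2, xa3, f1, f2, f3, delta, eps=1e-25, rng=random):
    """Eq. (5a): weighted mean of the pairwise differences of three random vectors."""
    omega = max(f1, f2, f3)
    w1 = wavelet_weight(f1, f2, omega, eps)
    w2 = wavelet_weight(f1, f3, omega, eps)
    w3 = wavelet_weight(f2, f3, omega, eps)
    denom = w1 + w2 + w3 + eps
    return [
        delta * (w1 * (a - b) + w2 * (a - c) + w3 * (b - c)) / denom + eps * rng.random()
        for a, b, c in zip(xa1, xa2, xa3)
    ]


rng = random.Random(1)
print(weighted_mean_1([1.0, 2.0], [0.5, 1.0], [2.0, 0.0], 3.0, 1.5, 4.0, delta=0.8, rng=rng))
```

For a minimization problem, the worst solution has the largest fitness, so calling the same function with $x_{bs}$, $x_{bt}$ and $x_{ws}$ reproduces $WM2$ with $\omega = f(x_{ws})$.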
$$WM2_l^g = \delta \times \frac{w_1 (x_{bs} - x_{bt}) + w_2 (x_{bs} - x_{ws}) + w_3 (x_{bt} - x_{ws})}{w_1 + w_2 + w_3 + \varepsilon} + \varepsilon \times rand, \quad l = 1, 2, 3, \ldots, N_p$$
$$w_1 = \cos\big((f(x_{bs}) - f(x_{bt})) + \pi\big) \times \exp\left(-\left|\frac{f(x_{bs}) - f(x_{bt})}{\omega}\right|\right)$$
$$w_2 = \cos\big((f(x_{bs}) - f(x_{ws})) + \pi\big) \times \exp\left(-\left|\frac{f(x_{bs}) - f(x_{ws})}{\omega}\right|\right)$$
$$w_3 = \cos\big((f(x_{bt}) - f(x_{ws})) + \pi\big) \times \exp\left(-\left|\frac{f(x_{bt}) - f(x_{ws})}{\omega}\right|\right)$$
$$\omega = f(x_{ws})$$
Here, $f(x)$ is the objective function; $a1$, $a2$ and $a3$ are integers randomly selected from $[1, N_p]$ such that $a1 \ne a2 \ne a3 \ne l$; $x_{bs}$, $x_{bt}$ and $x_{ws}$ are the best, better and worst solutions in the $g$th generation; and $r$ is a random number in $[0, 0.5]$. The weight functions allow INFO to search the solution space globally.
Convergence acceleration (CA) is a vital part of the updating rule (Stage 1): it uses the best vector to move the current vector through the search space, assuming that the best solution is the one closest to the global optimum. CA thus helps the vectors move in a better direction [42]:
$$CA = randn \times \frac{x_{bs} - x_{a1}}{f(x_{bs}) - f(x_{a1}) + \varepsilon}$$
Here, $randn$ is a normally distributed random number. The new vector is then obtained using Equation (8):
$$z_l^g = x_l^g + \sigma \times MeanRule + CA \tag{8}$$
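Convergence acceleration and the new-vector formula of Equation (8) translate into two small functions. This is a sketch under stated assumptions: one normal draw is shared by all dimensions of a vector (the text does not say whether $randn$ is drawn per dimension), ε = 1e-25 is a placeholder, and the function names are hypothetical.

```python
import random


def convergence_acceleration(x_bs, x_a1, f_bs, f_a1, eps=1e-25, rng=random):
    """CA = randn * (x_bs - x_a1) / (f(x_bs) - f(x_a1) + eps)."""
    r = rng.gauss(0.0, 1.0)  # one normal draw per vector (assumption)
    return [r * (b - a) / (f_bs - f_a1 + eps) for b, a in zip(x_bs, x_a1)]


def new_vector(x_l, mean_rule, ca, sigma):
    """Eq. (8): z = x_l + sigma * MeanRule + CA, applied element-wise."""
    return [x + sigma * m + c for x, m, c in zip(x_l, mean_rule, ca)]


rng = random.Random(7)
ca = convergence_acceleration([0.1, -0.2], [1.0, 0.5], f_bs=0.05, f_a1=1.3, rng=rng)
print(new_vector([0.8, 0.4], mean_rule=[0.05, -0.1], ca=ca, sigma=0.6))
```

When $f(x_{bs}) = f(x_{a1})$ but the vectors differ, the denominator collapses to ε and CA becomes very large, which is why implementations usually keep ε tiny but nonzero.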
In the exploration phase, the algorithm searches the domain globally. The updating rule, built around $x_{bs}$, $x_{bt}$, $x_l^g$ and $x_{a1}^g$, is:
$$
\begin{aligned}
&\text{if } rand < 0.5: \\
&\quad z1_l^g = x_l^g + \sigma \times MeanRule + randn \times \frac{x_{bs} - x_{a1}^g}{f(x_{bs}) - f(x_{a1}^g) + 1} \\
&\quad z2_l^g = x_{bs} + \sigma \times MeanRule + randn \times \frac{x_{a1}^g - x_{b}^g}{f(x_{a1}^g) - f(x_{b}^g) + 1} \\
&\text{else:} \\
&\quad z1_l^g = x_{a}^g + \sigma \times MeanRule + randn \times \frac{x_{a2}^g - x_{a3}^g}{f(x_{a2}^g) - f(x_{a3}^g) + 1} \\
&\quad z2_l^g = x_{bt} + \sigma \times MeanRule + randn \times \frac{x_{a1}^g - x_{a2}^g}{f(x_{a1}^g) - f(x_{a2}^g) + 1} \\
&\text{end}
\end{aligned}
$$
Here, $z1_l^g$ and $z2_l^g$ are the new vectors obtained in the $g$th generation, and $\sigma$ is the scaling rate of a vector, defined in Equation (13); Equation (14) defines $\alpha$, which changes according to an exponential function:
$$\sigma = 2\alpha \times rand - \alpha \tag{13}$$
$$\alpha = c \exp\left(-d \times \frac{g}{Max_g}\right) \tag{14}$$
where $c = 2$ and $d = 4$ are constants. A large value of $\sigma$ makes the current vector deviate from the WM (exploration phase), whereas a small value moves the current vector toward the WM (exploitation phase).
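Equations (13) and (14), together with the identical schedule for δ and β given earlier, can be sketched as follows. The sketch shows why INFO shifts from exploration to exploitation: the amplitude α starts at c = 2 and decays to 2e^(−4) ≈ 0.037 at the last generation, so σ is drawn from an ever narrower interval around zero. Setting Max_g = 100 matches the number of iterations reported in Section 4; the function names are hypothetical.

```python
import math
import random


def decay(g, max_g, c=2.0, d=4.0):
    """alpha in Eq. (14): c * exp(-d * g / Max_g); with c = 2, d = 4 it also gives beta."""
    return c * math.exp(-d * g / max_g)


def random_scale(amplitude, rng=random):
    """sigma = 2*alpha*rand - alpha (Eq. 13); delta = 2*beta*rand - beta has the same form.
    With rand in [0, 1), the result lies in [-amplitude, amplitude)."""
    return 2.0 * amplitude * rng.random() - amplitude


rng = random.Random(3)
for g in (0, 25, 50, 100):
    alpha = decay(g, max_g=100)
    print(g, round(alpha, 4), round(random_scale(alpha, rng), 4))
```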
Stage 2 (Vector Combining): The second stage enhances population diversity and improves local search. With probability 0.5, the vectors $z1_l^g$ and $z2_l^g$ produced by the updating rule are combined to generate a new vector $u_l^g$; otherwise, $u_l^g$ is set to the current vector $x_l^g$. This produces new, promising vectors and improves the local search ability:
$$
\begin{aligned}
&\text{if } rand < 0.5: \\
&\quad \text{if } rand < 0.5: \quad u_l^g = z1_l^g + \mu \, |z1_l^g - z2_l^g| \\
&\quad \text{else:} \quad u_l^g = z2_l^g + \mu \, |z1_l^g - z2_l^g| \\
&\quad \text{end} \\
&\text{else:} \quad u_l^g = x_l^g \\
&\text{end}
\end{aligned}
$$
Here, $u_l^g$ is the vector obtained by vector combining in the $g$th generation, and $\mu = 0.05 \times randn$.
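A pure-Python sketch of Stage 2, under stated assumptions: both random comparisons and the normal draw for μ are made once per vector rather than once per dimension, which the text does not specify, and `combine_vectors` is a hypothetical name.

```python
import random


def combine_vectors(x_l, z1, z2, rng=random):
    """Stage 2: with probability 0.5 combine z1 and z2, otherwise keep x_l."""
    if rng.random() < 0.5:
        mu = 0.05 * rng.gauss(0.0, 1.0)  # mu = 0.05 * randn
        base = z1 if rng.random() < 0.5 else z2
        return [b + mu * abs(a1 - a2) for b, a1, a2 in zip(base, z1, z2)]
    return list(x_l)


rng = random.Random(5)
print(combine_vectors([0.8, 0.4], [0.7, 0.1], [0.2, 0.3], rng))
```

Because μ is small (a standard normal scaled by 0.05), the combined vector stays close to $z1$ or $z2$ and perturbs it only in proportion to how much the two candidates disagree.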
Stage 3 (Local Search): The third stage of INFO is local search, which promotes exploitation and convergence toward the global best. The global best position is denoted here by $x_{best}^g$, and the new vector is generated around this position:
$$
\begin{aligned}
&\text{if } rand < 0.5: \quad u_l^g = x_{bs} + randn \times \big(MeanRule + randn \times (x_{bs}^g - x_{a1}^g)\big) \\
&\text{else:} \quad u_l^g = x_{rnd} + randn \times \big(MeanRule + randn \times (v_1 \times x_{bs} - v_2 \times x_{rnd})\big) \\
&\text{end}
\end{aligned}
$$
Here, $v_1$ and $v_2$ are two random numbers defined as:
$$v_1 = \begin{cases} 2 \times rand, & \text{if } p > 0.5 \\ 1, & \text{otherwise} \end{cases} \qquad v_2 = \begin{cases} rand, & \text{if } p > 0.5 \\ 1, & \text{otherwise} \end{cases}$$
and
$$x_{rnd} = \phi \times x_{avg} + (1 - \phi) \times \big(\phi \times x_{bt} + (1 - \phi) \times x_{bs}\big)$$
$$x_{avg} = \frac{x_a + x_b + x_c}{3}$$
where $\phi$ is a random number in (0, 1), and $x_{rnd}$ is a new solution formed by randomly combining the components of the solutions $x_{avg}$, $x_{bt}$ and $x_{bs}$.
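Stage 3 can be sketched the same way. As assumptions, `x_a`, `x_b` and `x_c` are taken to be three solutions from the current population (the text does not define them), `p` is a uniform random number, and all random draws are made once per vector; `local_search` is a hypothetical name.

```python
import random


def local_search(x_bs, x_bt, x_a1, x_a, x_b, x_c, mean_rule, rng=random):
    """Stage 3: generate a new vector around the best solution or around x_rnd."""

    def randn():
        return rng.gauss(0.0, 1.0)

    phi, p = rng.random(), rng.random()
    v1 = 2.0 * rng.random() if p > 0.5 else 1.0
    v2 = rng.random() if p > 0.5 else 1.0
    x_avg = [(a + b + c) / 3.0 for a, b, c in zip(x_a, x_b, x_c)]
    x_rnd = [phi * avg + (1 - phi) * (phi * bt + (1 - phi) * bs)
             for avg, bt, bs in zip(x_avg, x_bt, x_bs)]
    r1, r2 = randn(), randn()
    if rng.random() < 0.5:  # search around the best solution
        return [bs + r1 * (m + r2 * (bs - a1)) for bs, m, a1 in zip(x_bs, mean_rule, x_a1)]
    return [xr + r1 * (m + r2 * (v1 * bs - v2 * xr))  # search around x_rnd
            for xr, m, bs in zip(x_rnd, mean_rule, x_bs)]


rng = random.Random(11)
print(local_search([0.1, 0.0], [0.3, 0.2], [1.0, 0.5], [0.9, 0.4], [0.2, -0.3], [0.5, 0.6],
                   mean_rule=[0.02, -0.01], rng=rng))
```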

3.2.2. Reptile Search Algorithm (RSA)

The reptile search algorithm (RSA) is a recent nature-inspired metaheuristic based on the encircling and hunting behavior of crocodiles. It was proposed in 2021 by Abualigah et al. [28] as a gradient-free algorithm that starts from a set of random solutions generated by the following equation [43,44]:
$$x_{i,j} = rand[0,1] \times (UB_j - LB_j) + LB_j, \quad i \in \{1, 2, \ldots, N\}, \; j \in \{1, 2, \ldots, M\}$$
Here, $x_{i,j}$ is the $j$th feature of the $i$th solution, $N$ is the number of solutions, $M$ is the number of features, $rand[0,1]$ is a uniformly distributed random number in $[0, 1]$, and $UB_j$ and $LB_j$ are the upper and lower bounds of the $j$th feature.
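The initialization equation amounts to drawing each feature uniformly between its bounds. A minimal pure-Python sketch; the bounds and population size in the example are arbitrary, and `init_population` is a hypothetical name.

```python
import random


def init_population(n, lb, ub, rng=random):
    """x_ij = rand[0,1] * (UB_j - LB_j) + LB_j for i = 1..N and j = 1..M."""
    return [[rng.random() * (u - l) + l for l, u in zip(lb, ub)] for _ in range(n)]


population = init_population(5, lb=[0.0, -1.0, 10.0], ub=[1.0, 1.0, 50.0], rng=random.Random(42))
for solution in population:
    print([round(v, 3) for v in solution])
```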
Like other nature-inspired algorithms, RSA alternates between exploration and exploitation, both modeled on the way crocodiles encircle and hunt their prey. To mimic this behavior, the iterations are divided into four stages. The first two stages perform exploration, modeling the encircling behavior through high walking and belly walking. The crocodiles move around the area to search for prey, which enables a more comprehensive search of the solution space. This encircling process is modeled mathematically as:
$$
x_{i,j}(g+1) =
\begin{cases}
\big[\eta_{i,j}(g) \cdot \gamma \cdot Best_j(g)\big] - \big[rand \cdot R_{i,j}(g)\big], & g \le \dfrac{T}{4} \\[2ex]
ES(g) \cdot Best_j(g) \cdot x_{(r_1, j)}, & \dfrac{T}{4} < g \le \dfrac{2T}{4}
\end{cases}
$$
Here, $Best_j(g)$ is the $j$th feature of the best solution found so far; $\eta_{i,j}$ is the hunting operator for the $j$th feature of the $i$th solution; $\gamma$ controls the accuracy of exploration; $R_{i,j}(g)$ is a reduce function used to shrink the search space; $rand$ is a random number in $[0, 1]$; $r_1 = rand[1, N]$ is a random index used to select a candidate solution; and $ES(g)$ is a probability ratio (the evolutionary sense).
$$\eta_{i,j} = Best_j(g) \times P_{i,j}$$
$P_{i,j}$ is the percentage difference between the $j$th value of the best solution and the corresponding value of the current solution:
$$P_{i,j} = \theta + \frac{x_{i,j} - M(x_i)}{Best_j(g) \times (UB_j - LB_j) + \varepsilon}$$
where $\theta$ is a sensitivity parameter that controls exploration performance, $M(x_i)$ is the mean of the $i$th solution, and $\varepsilon$ is a small floor value. The remaining terms are computed as:
$$M(x_i) = \frac{1}{n} \sum_{j=1}^{n} x_{i,j}$$
$$R_{i,j} = \frac{Best_j(g) - x_{(r_2, j)}}{Best_j(g) + \varepsilon}, \quad r_2 = rand[1, N]$$
$$ES(g) = 2 \times rand[-1, 1] \times \left(1 - \frac{1}{T}\right)$$
Here, the factor 2 scales $ES(g)$ to values in $[-2, 2]$, and $rand[-1, 1]$ is a random number in $[-1, 1]$.
In the last two stages, RSA performs exploitation, searching the solution space for the optimal solution through hunting coordination and hunting cooperation. The solutions are updated as follows:
$$
x_{i,j}(g+1) =
\begin{cases}
rand[-1,1] \cdot Best_j(g) \cdot P_{i,j}(g), & \dfrac{2T}{4} < g \le \dfrac{3T}{4} \\[2ex]
\big[\varepsilon \cdot Best_j(g) \cdot \eta_{i,j}(g)\big] - \big[rand[-1,1] \cdot R_{i,j}(g)\big], & \dfrac{3T}{4} < g \le T
\end{cases}
$$
The candidate solutions obtained in each iteration are evaluated with a predefined fitness function (FF), and the algorithm stops after $T$ iterations, returning the solution with the lowest fitness value (OFS). The procedure of RSA is illustrated in Figure 4.
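The four-stage update can be combined into a compact pure-Python sketch for minimization. It implements the four cases given above; where the extracted equations leave operators ambiguous, the choices made here (subtracting the rand·R terms, no extra sign on the η terms) are assumptions, as are the defaults γ = θ = 0.1 and ε = 1e-10, drawing ES(g) once per iteration, drawing a fresh random index for each use of rand[1, N], and clipping new positions to the bounds. The study's own settings are listed in Table 3, and `rsa_minimize` is a hypothetical name; the sketch shows the control flow, not the study's implementation.

```python
import random


def rsa_minimize(fitness, lb, ub, n=30, T=100, gamma=0.1, theta=0.1, eps=1e-10, seed=0):
    """Four-stage RSA sketch; returns (best position, best fitness, best-so-far history)."""
    rng = random.Random(seed)
    M = len(lb)
    X = [[rng.random() * (ub[j] - lb[j]) + lb[j] for j in range(M)] for _ in range(n)]
    fit = [fitness(x) for x in X]
    k = min(range(n), key=fit.__getitem__)
    best, best_f = X[k][:], fit[k]
    history = [best_f]
    for g in range(1, T + 1):
        es = 2.0 * rng.uniform(-1.0, 1.0) * (1.0 - 1.0 / T)  # ES(g)
        for i in range(n):
            mean_i = sum(X[i]) / M  # M(x_i)
            new = []
            for j in range(M):
                p = theta + (X[i][j] - mean_i) / (best[j] * (ub[j] - lb[j]) + eps)  # P_ij
                eta = best[j] * p  # hunting operator
                R = (best[j] - X[rng.randrange(n)][j]) / (best[j] + eps)  # reduce function
                if g <= T / 4:  # high walking (exploration)
                    v = eta * gamma * best[j] - rng.random() * R
                elif g <= 2 * T / 4:  # belly walking (exploration)
                    v = es * best[j] * X[rng.randrange(n)][j]
                elif g <= 3 * T / 4:  # hunting coordination (exploitation)
                    v = rng.uniform(-1.0, 1.0) * best[j] * p
                else:  # hunting cooperation (exploitation)
                    v = eps * best[j] * eta - rng.uniform(-1.0, 1.0) * R
                new.append(min(max(v, lb[j]), ub[j]))  # keep inside [LB_j, UB_j]
            X[i] = new
            fit[i] = fitness(new)
            if fit[i] < best_f:  # track the lowest fitness found so far (OFS)
                best, best_f = new[:], fit[i]
        history.append(best_f)
    return best, best_f, history


def sphere(x):
    return sum(v * v for v in x)


best, best_f, history = rsa_minimize(sphere, lb=[-5.0, -5.0], ub=[5.0, 5.0], n=20, T=60)
print(best, best_f)
```

In the hyperparameter-tuning setting of Section 3.3, `fitness` would train a model with the decoded hyperparameters and return its RMSE.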

3.3. Improved Deep Learning Models

3.3.1. Improved CNN Model

To improve the performance of the CNN model, its hyperparameters were optimized using metaheuristic (MH) algorithms, namely RSA and INFO. First, the hyperparameters of a one-dimensional (1D) CNN were optimized by the MHs on the training data; then, the trained 1D CNN was used to predict the unknown values in the test data.
A 1D CNN has several hyperparameters, and its performance depends on their values. The goal of the MH algorithm in the proposed method is to find optimal values for these hyperparameters, which leads to improved prediction accuracy. To this end, five critical hyperparameters are considered: the number of filters, kernel size, number of epochs, batch size and pooling size. These hyperparameters and their ranges are listed in Table 2. Thus, each solution in the MH population contains five values, one for each hyperparameter.
The optimization procedure begins with an initialization step in which several individuals (positions) are randomly generated; the dimension of each position equals the number of CNN hyperparameters being optimized. After initialization, the MH search is repeated, producing new generations of the population, until the best solution is reached; this solution provides the optimal values of the CNN hyperparameters. The accuracy of each solution is assessed with a fitness function (FF); here, RMSE was used as the FF.
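The mapping from an MH position to CNN hyperparameters can be sketched as follows. Table 2 is not reproduced in this section, so the bounds in `BOUNDS` are hypothetical placeholders; clipping and then rounding to the nearest integer, and the `train_and_score` callback (which would train a 1D CNN and return its RMSE), are likewise assumptions rather than the study's code.

```python
# Hypothetical bounds: placeholders for the ranges listed in Table 2.
BOUNDS = {
    "filters": (8, 128),
    "kernel_size": (2, 7),
    "epochs": (10, 200),
    "batch_size": (8, 128),
    "pool_size": (1, 3),
}


def decode(solution):
    """Map a 5-dimensional continuous MH position to integer CNN hyperparameters."""
    params = {}
    for value, (name, (lo, hi)) in zip(solution, BOUNDS.items()):
        clipped = min(max(value, lo), hi)  # keep the value inside its range
        params[name] = int(round(clipped))
    return params


def fitness(solution, train_and_score):
    """RMSE fitness; train_and_score is a hypothetical callback that trains a 1D CNN
    with the decoded hyperparameters and returns its RMSE on held-out data."""
    return train_and_score(**decode(solution))


print(decode([63.7, 3.2, 150.4, 31.9, 2.6]))
```

Keeping positions continuous and rounding only at decoding time lets RSA and INFO use their real-valued update rules unchanged.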
After implementing the MH algorithm, the best solution (optimal CNN hyperparameters) is obtained. Then, the final optimized hyperparameter sets are passed to CNN to evaluate prediction ability. The general procedure of the CNN-MHs is provided in Figure 5.

3.3.2. Improved LSTM Model

In this section, metaheuristic algorithms (MHs) are employed to develop a more accurate LSTM model (Figure 6). Two MHs, RSA and INFO, were used to optimize the learning rate (α) and the number of hidden neurons (h). Coupling RSA and INFO with LSTM improves prediction accuracy over the standalone LSTM model. Here, the dataset is divided into two groups: a training set (70%) and a testing set (30%). The proposed approach generates a set of candidate solutions, each containing the LSTM parameters, and evaluates each one with the RMSE fitness function:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(a_i - p_i)^2}$$
where $a_i$ is the actual value, $p_i$ is the predicted value, and $N$ is the number of samples. The solutions are then updated according to the search mechanisms of the MHs described in the previous section, and the best solution is the one with the smallest RMSE. This updating process is repeated until the termination condition is met. Finally, the performance of the LSTM-MH models is assessed on the testing set (30% of the data) using the evaluation metrics.
The specific steps for using MHs to optimize the LSTM parameters are as follows:
Step 1: Split the input data into two groups to train and test the method.
Step 2: Initialize the relevant parameters: the population size (N), the maximum number of iterations (tmax), the upper and lower bounds of the search space (ub and lb, respectively), and the ranges of the LSTM parameters (h, α).
Step 3: Generate the initial population of solutions.
Step 4: Calculate the fitness value of each solution by training the LSTM.
Step 5: Use the MH algorithm to update the candidate hyperparameters by exploring the search domain.
Step 6: Evaluate each candidate with the LSTM objective function.
Step 7: Repeat the process until the maximum number of iterations is reached.
Step 8: Pass the final optimized set of hyperparameters to the LSTM to evaluate its prediction ability.
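Steps 2 to 8 can be sketched as a generic population loop over (h, α) with RMSE as the fitness. This is not RSA or INFO: a clipped Gaussian perturbation with greedy replacement stands in for the MH update in Step 5, and `train_and_predict` is a hypothetical callback that would train an LSTM with `h` hidden neurons and learning rate `alpha` and return predictions. In the usage example, a toy function replaces LSTM training so that the loop runs instantly; Step 1 (the data split) is assumed to have been done beforehand.

```python
import math
import random


def rmse(actual, predicted):
    """Fitness function used to score each candidate."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def optimize_lstm(train_and_predict, y_val, lb, ub, n=30, t_max=100, seed=0):
    """Population search over (h, alpha); the Gaussian move is a placeholder for RSA/INFO."""
    rng = random.Random(seed)

    def fit(sol):
        return rmse(y_val, train_and_predict(int(round(sol[0])), sol[1]))

    pop = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(n)]  # Step 3
    scores = [fit(s) for s in pop]                                         # Step 4
    for _ in range(t_max):                                                 # Step 7
        for i, sol in enumerate(pop):
            cand = [min(max(x + rng.gauss(0.0, 0.1 * (u - l)), l), u)      # Step 5 (placeholder)
                    for x, l, u in zip(sol, lb, ub)]
            s = fit(cand)                                                  # Step 6
            if s < scores[i]:
                pop[i], scores[i] = cand, s
    k = min(range(n), key=scores.__getitem__)
    return pop[k], scores[k]                                               # Step 8


y_val = [1.0, 1.0, 1.0]
toy = lambda h, alpha: [h * alpha] * len(y_val)  # toy stand-in for LSTM training
best, score = optimize_lstm(toy, y_val, lb=[10, 0.001], ub=[100, 0.01], n=10, t_max=20)
print(best, score)
```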

4. Model Parameters and Accuracy Assessment

The following statistics were utilized for the assessment of the methods:
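RMSE: Root Mean Square Error $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(Y_O)_i - (Y_C)_i\right]^2}$$
MAE: Mean Absolute Error $$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|(Y_O)_i - (Y_C)_i\right|$$
NSE: Nash–Sutcliffe Efficiency $$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left[(Y_O)_i - (Y_C)_i\right]^2}{\sum_{i=1}^{N}\left[(Y_O)_i - \bar{Y}_O\right]^2}, \qquad -\infty < \mathrm{NSE} \le 1$$
R2: Determination Coefficient $$R^2 = \left[\frac{\sum_{i=1}^{N}\left((Y_O)_i - \bar{Y}_O\right)\left((Y_C)_i - \bar{Y}_C\right)}{\sqrt{\sum_{i=1}^{N}\left((Y_O)_i - \bar{Y}_O\right)^2\,\sum_{i=1}^{N}\left((Y_C)_i - \bar{Y}_C\right)^2}}\right]^2$$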
where $Y_C$, $Y_O$, $\bar{Y}_O$, $\bar{Y}_C$ and $N$ are the calculated water temperature, the observed water temperature, the mean of the observed values, the mean of the calculated values and the number of data, respectively. The parameters of both algorithms are listed in Table 3. For each algorithm, a population size of 30 and 100 iterations were used, and the models were run 30 times to obtain more robust results. The data were split into three parts, M1, M2 and M3; each part was used once for testing while the other two were used for training, so that the training and testing procedures covered all the data.
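The four statistics and the three-part rotation can be sketched as follows. This is an illustrative implementation of the formulas above, not the authors' code; in particular, splitting the series into three contiguous blocks is an assumption, since the text does not say how M1, M2 and M3 were formed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def evaluate(obs: Sequence[float], calc: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, NSE and R2 of calculated (Y_C) versus observed (Y_O) values."""
    n = len(obs)
    mean_o, mean_c = sum(obs) / n, sum(calc) / n
    sse = sum((o - c) ** 2 for o, c in zip(obs, calc))  # sum of squared errors
    ss_o = sum((o - mean_o) ** 2 for o in obs)           # spread of observations
    ss_c = sum((c - mean_c) ** 2 for c in calc)          # spread of predictions
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(obs, calc))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(o - c) for o, c in zip(obs, calc)) / n,
        "NSE": 1.0 - sse / ss_o,
        "R2": (cov / math.sqrt(ss_o * ss_c)) ** 2,
    }


def three_part_rotation(data: Sequence[float]) -> List[Tuple[List[float], List[float]]]:
    """Split data into three contiguous parts (assumed M1, M2, M3) and return
    (training, testing) pairs in which each part is the test set once."""
    k = len(data) // 3
    parts = [list(data[:k]), list(data[k:2 * k]), list(data[2 * k:])]
    return [([x for j, p in enumerate(parts) if j != i for x in p], parts[i])
            for i in range(3)]


# A constant +1 bias: R2 is 1.0 (perfect correlation) while NSE is only about 0.2
print(evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

The usage line shows why both R2 and NSE are reported: R2 measures only linear association, whereas NSE also penalizes systematic bias.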

5. Results and Discussion

This section presents the results of the convolutional neural network (CNN) and long short-term memory (LSTM) models, both as single models and improved with the reptile search algorithm (RSA) and the weighted mean of vectors optimizer (INFO), in predicting water temperature from different input combinations of air temperature, streamflow, precipitation, sediment flows and day of the year (DOY).

5.1. Results

Training and testing results of the single CNN models in water temperature prediction are summarized in Table 4 for the three datasets and nine input combinations. The CNN model offers the best accuracy for input combination ix, involving all input parameters, in the M1 and M3 datasets, while in the M2 dataset input combination viii, with air temperature and day of the year, provided the lowest RMSE (1.441) and MAE (1.165) and the highest R2 (0.937) and NSE (0.933) in the test period. The training statistics follow the same trend. In all three subsets (M1, M2 and M3), adding the DOY input considerably improved model performance. For example, in the M1 set, adding DOY to input combination vii improved the RMSE and MAE by 4.68% and 4.97%, and in the M2 set, adding DOY to input combination i improved the RMSE and MAE of the CNN model by 7.57% and 6.65% in the test period. As expected, air temperature is the most influential factor for water temperature; therefore, it was kept in all input combinations.
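The improvement percentages quoted above are consistent with a relative reduction, (before − after)/before × 100, applied to the test-period values in Table 4. The helper name `pct_reduction` is introduced here for illustration only.

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction (%) of an error statistic such as RMSE or MAE."""
    return (before - after) / before * 100.0


# Single CNN, test period, values from Table 4
print(round(pct_reduction(1.644, 1.567), 2))  # M1 RMSE, vii -> ix: 4.68
print(round(pct_reduction(1.287, 1.223), 2))  # M1 MAE, vii -> ix: 4.97
print(round(pct_reduction(1.559, 1.441), 2))  # M2 RMSE, i -> viii: 7.57
print(round(pct_reduction(1.248, 1.165), 2))  # M2 MAE, i -> viii: 6.65
```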
Table 5 lists the training and testing statistics of the hybrid CNN-RSA models in predicting water temperature. In the M1 and M2 datasets, input combination ix had the best accuracy, while in the M3 dataset the model produced the lowest RMSE (1.266) and MAE (0.976) and the highest R2 (0.947) and NSE (0.943) for input combination viii in the test period. For CNN-RSA, the DOY input also improves prediction accuracy: the RMSE improves by 4.86% from input combination vii to ix in the M1 set and by 10.59% from input combination i to viii in the M3 set in the test period. The results of the hybrid CNN-INFO models are summarized in Table 6. In all three subsets (M1, M2 and M3), the CNN-INFO model had the lowest RMSE and MAE and the highest R2 and NSE for input combination viii, involving air temperature and DOY. Adding DOY to the model input (compare input combinations i and viii) improves the RMSE and MAE of CNN-INFO by 15.3% and 13% for M1, by 8.71% and 10.3% for M2 and by 12.2% and 13.5% for M3 in the test period.
Training and testing results of the single and hybrid LSTM models in predicting water temperature are reported in Table 7, Table 8 and Table 9. Table 7 shows that the single LSTM models with inputs Ta, P, Q, S and DOY (input combination ix) offered the best accuracy in all subsets (M1 to M3). In contrast, the hybrid LSTM-RSA models performed best with air temperature and DOY (input combination viii) in all testing subsets. LSTM-INFO with Ta and DOY outperformed the other alternatives in M1 and M3, while in M2 the model performed best with the full inputs (input combination ix). As with the CNN-based models, the accuracy of the LSTM model also improved when DOY was added to the input combinations.
Comparing input combinations vii and ix shows that adding DOY to the model inputs decreases the RMSE and MAE of LSTM by 1.82% and 2.4% in M1, by 4.4% and 2.83% in M2 and by 0.81% and 1.22% in M3 in the test period. Table 8 and Table 9 show that the positive effect of DOY on the prediction performance of the hybrid LSTM models is larger. For example, the RMSE and MAE of LSTM-RSA decrease by 23.1% and 25% from input combination i to viii in the M2 testing dataset. The corresponding improvements in RMSE and MAE of LSTM-INFO are 21.8% and 18.9% for the M1 dataset (Table 9).
Table 10 lists the training and testing accuracies of the single and hybrid CNN and LSTM models in predicting water temperature. The best models, with the lowest RMSE and MAE and the highest R2 and NSE, were selected for comparison in this table. The mean statistics in Table 10 show that the LSTM-based models outperform the CNN-based models in water temperature prediction in all three testing datasets. Tuning the parameters of the deep learning models (CNN and LSTM) with RSA and INFO also considerably improves their accuracy, and INFO is generally superior to RSA in improving the deep learning models for water temperature prediction.
Figure 7 shows scatterplots of the observed and predicted water temperatures from the different CNN- and LSTM-based models in the test period. The hybrid CNN and LSTM models have less scattered predictions and higher R2 than the single models, and LSTM-INFO gives the best predictions. The Taylor diagrams in Figure 8 and Figure 9 show that the INFO-based hybrid LSTM and CNN models have the lowest RMSE and the highest correlation, and that their standard deviations are closer to that of the observations than those of the other models. The single models are weaker than the hybrid deep learning models. The violin charts in Figure 10 and Figure 11 show that LSTM-INFO and CNN-INFO have distributions closer to that of the observed water temperature, indicating the capability of these methods among the implemented methods.
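A Taylor diagram places each model by the standard deviation of its predictions and its correlation with the observations; the distance from the observed point then equals the centred RMS difference, through the law-of-cosines identity checked at the end of the sketch below. The series are hypothetical, and population statistics (divisor n) are assumed throughout.

```python
import math
from typing import Sequence, Tuple


def taylor_stats(obs: Sequence[float], pred: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (std of obs, std of pred, correlation, centred RMS difference)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * sd_o * sd_p)
    # centred RMS difference: RMSE after removing each series' own mean
    crmsd = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return sd_o, sd_p, r, crmsd


obs = [4.1, 6.3, 9.8, 13.2, 15.0]    # hypothetical observed water temperatures
pred = [4.6, 6.0, 10.5, 12.4, 15.3]  # hypothetical model predictions
sd_o, sd_p, r, crmsd = taylor_stats(obs, pred)
# Law-of-cosines identity behind the diagram's geometry
assert abs(crmsd ** 2 - (sd_o ** 2 + sd_p ** 2 - 2 * sd_o * sd_p * r)) < 1e-9
```

Because of this identity, a model whose point lies close to the observed point on the diagram has simultaneously a similar spread, a high correlation and a small centred error.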

5.2. Discussion

The overall results show that both RSA and INFO considerably improved the deep learning methods in predicting water temperature: in the M3 dataset, applying RSA and INFO improved the RMSE of the best CNN model by 9.25% and 11.04% in the test period, and the corresponding percentages for the LSTM model were 17.9% and 20.3%. The LSTM-based models were more efficient than the CNN-based models, and LSTM-INFO generally acted as the best model, improving on the prediction accuracy of the CNN, CNN-RSA, CNN-INFO, LSTM and LSTM-RSA models by 9.9%, 16.7%, 7.65%, 19.8% and 17.4%, respectively, in the test period. These outcomes reveal the need for new metaheuristic algorithms to tune the hyperparameters of deep learning methods (e.g., CNN and LSTM) for accurate water temperature prediction.
The results for the various input combinations show that DOY is a very effective parameter and that including it in the inputs considerably improves prediction accuracy. However, adding some input parameters deteriorates the prediction accuracy of the deep learning models in all three testing datasets. For example, in the M1 dataset, the RMSE and MAE of the CNN model increase from 1.652 and 1.289 to 1.689 and 1.302 when precipitation is added to the air temperature input (see input combinations i and ii in Table 4).
According to previous studies [45,46], increasing the number of inputs does not guarantee better prediction accuracy; in some cases it may adversely affect the variance and lead to more complicated models with lower prediction accuracy. Another important finding of the present study comes from the use of three different testing datasets: the accuracies of the implemented models vary considerably across them. For example, the mean RMSE of the CNN model ranges from 1.643 (M1) to 1.418 (M3), and the corresponding range for the LSTM model is 1.609–1.390. A similar variation is observed for the hybrid deep learning methods. These results suggest that machine learning methods should be assessed on several datasets, and that relying on a single training–testing split, as is generally done in the literature, can mislead the modeler.
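The "mean RMSE" values above can be checked under one reading of the text, namely that they are averages of the test-period RMSE over the nine input combinations (this interpretation is an assumption). With the values in Table 4 and Table 7, the sketch below gives 1.643 for CNN in M1 and 1.609 for LSTM in M1, matching the text, and 1.419 for CNN in M3, 0.001 above the reported 1.418.

```python
from statistics import mean

# Test-period RMSE for input combinations (i)-(ix): Table 4 (CNN) and Table 7 (LSTM)
test_rmse = {
    "CNN, M1": [1.652, 1.689, 1.654, 1.645, 1.636, 1.675, 1.644, 1.623, 1.567],
    "CNN, M3": [1.446, 1.345, 1.426, 1.449, 1.457, 1.423, 1.420, 1.409, 1.395],
    "LSTM, M1": [1.631, 1.612, 1.580, 1.639, 1.592, 1.654, 1.595, 1.614, 1.566],
}

for label, values in test_rmse.items():
    print(label, round(mean(values), 3))
```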
Consistent with previous works [47,48], the outcomes of the present study show that air temperature is the most important parameter in predicting water temperature. Our results also agree with those of Webb et al. [47] and Sohrabi et al. [49] in identifying discharge as the second most important parameter. The main drawback of the proposed methods is their data-driven nature: they cannot account for the impacts of possible changes in vegetation or other variables on stream water temperature, and they may not work well if the testing data are far from the calibration period. In such cases, the methods should be recalibrated using new variables that quantify the possible changes.
Zhu et al. (2018) [19] compared the accuracy of ANN and ANFIS in predicting river water temperature at three stations using air temperature, river flow discharge and the components of the Gregorian calendar as inputs. Their best model (ANN) yielded R2 values of 0.954, 0.943 and 0.976 for the three stations. Grbić et al. (2013) [11] investigated the prediction of the water temperature of the Drava River in Croatia using air temperature and river flow discharge as inputs, and their best model produced an NSE of 0.984. The hybrid models investigated in the present study also provided high R2 and NSE values (higher than 0.900). The lower accuracy of the improved CNN and LSTM models might be due to the fact that the previous studies did not apply cross-validation as we did in our study; cross-validation lowers the reported accuracy but provides a more robust analysis.

6. Conclusions

Water temperature plays a key role in the precise management of river ecology and in maintaining the balance of aquatic ecosystems. In this study, two deep learning methods, convolutional neural networks (CNN) and long short-term memory (LSTM) networks, were evaluated for predicting the daily water temperatures of the Bailong River in China. To improve the prediction accuracy of both models, two recently developed optimization algorithms, the reptile search algorithm (RSA) and the weighted mean of vectors optimizer (INFO), were utilized. Different input combinations of air temperature, streamflow, precipitation, sediment flows and day of the year (DOY) were analyzed using cross-validation, and modeling accuracy was assessed with the root mean square error (RMSE), mean absolute error (MAE), determination coefficient and Nash–Sutcliffe efficiency statistics. Among the CNN-based models, the INFO-based CNN with the DOY input combination provided the most accurate results; adding DOY to the inputs of the CNN-INFO model improved its RMSE and MAE by 8.71–15.3% and 10.3–13.5%, respectively, across the datasets. Among the LSTM-based models, the INFO-based LSTM outperformed the other competing models, and inputs of air temperature and DOY yielded more precise results than the other input combinations. Using Ta and DOY as inputs, the LSTM-INFO model reduced RMSE and MAE errors by 21.8–25.1% and 18.9–23.4%, respectively, across the datasets. Overall, LSTM-INFO produced more precise results than CNN-INFO. The proposed methods can be used to model other hydrological variables, and they can also be combined with decomposition techniques to handle noise in the data and further improve prediction accuracy.

Author Contributions

Conceptualization: R.M.A.I., O.K. and R.R.M.; formal analysis: R.M.A.I. and R.R.M.; validation: O.K., Z.C., R.R.M., K.S.P., M.Z.-K. and R.M.A.I.; supervision: O.K. and Z.C.; writing—original draft: O.K., R.R.M., K.S.P., M.Z.-K. and R.M.A.I.; visualization: R.M.A.I. and R.R.M.; investigation: O.K., M.Z.-K. and K.S.P. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.


References

1. Caissie, D. The thermal regime of rivers: A review. Freshw. Biol. 2006, 51, 1389–1406.
2. Sahoo, G.; Schladow, S.; Reuter, J. Forecasting stream water temperature using regression analysis, artificial neural network, and chaotic non-linear dynamic models. J. Hydrol. 2009, 378, 325–342.
3. Bernhardt, E.S.; Heffernan, J.B.; Grimm, N.B.; Stanley, E.H.; Harvey, J.W.; Arroita, M.; Appling, A.; Cohen, M.J.; McDowell, W.H.; Hall, R.O.; et al. The metabolic regimes of flowing waters. Limnol. Oceanogr. 2017, 63, S99–S118.
4. Wanders, N.; Wada, Y. Human and climate impacts on the 21st century hydrological drought. J. Hydrol. 2015, 526, 208–220.
5. Wanders, N.; Van Vliet, M.T.H.; Wada, Y.; Bierkens, M.F.P.; Van Beek, L.P.H. High-Resolution Global Water Temperature Modeling. Water Resour. Res. 2019, 55, 2760–2778.
6. Liu, S.; Xu, L.; Li, D. Multi-scale prediction of water temperature using empirical mode decomposition with back-propagation neural networks. Comput. Electr. Eng. 2016, 49, 1–8.
7. Cai, H.; Piccolroaz, S.; Huang, J.; Liu, Z.; Liu, F.; Toffolon, M. Quantifying the impact of the Three Gorges Dam on the thermal dynamics of the Yangtze River. Environ. Res. Lett. 2018, 13, 054016.
8. Du, X.; Shrestha, N.K.; Wang, J. Assessing climate change impacts on stream temperature in the Athabasca River Basin using SWAT equilibrium temperature model and its potential impacts on stream ecosystem. Sci. Total. Environ. 2019, 650, 1872–1881.
9. Sartori, E. A Mathematical Model for Predicting Heat and Mass Transfer from a Free Water Surface. In Advances in Solar Energy Technology; Pergamon: Oxford, UK, 1988; pp. 3160–3164.
10. Caissie, D.; El-Jabi, N.; St-Hilaire, A. Stochastic modelling of water temperatures in a small stream using air to water relations. Can. J. Civ. Eng. 1998, 25, 250–260.
11. Grbić, R.; Kurtagić, D.; Slišković, D. Stream water temperature prediction based on Gaussian process regression. Expert Syst. Appl. 2013, 40, 7407–7414.
12. Parmar, K.S.; Bhardwaj, R. Water quality management using statistical analysis and time-series prediction model. Appl. Water Sci. 2014, 4, 425–434.
13. Tiyasha, T.; Tung, T.M.; Bhagat, S.K.; Tan, M.L.; Jawad, A.H.; Mohtar WH, M.W.; Yaseen, Z.M. Functionalization of remote sensing and on-site data for simulating surface water dissolved oxygen: Development of hybrid tree-based artificial intelligence models. Mar. Pollut. Bull. 2021, 170, 112639.
14. Quan, Q.; Hao, Z.; Xifeng, H.; Jingchun, L. Research on water temperature prediction based on improved support vector regression. Neural Comput. Appl. 2022, 34, 8501–8510.
15. Alizamir, M.; Kisi, O.; Adnan, R.M.; Kuriqi, A. Modelling reference evapotranspiration by combining neuro-fuzzy and evolutionary strategies. Acta Geophys. 2020, 68, 1113–1126.
16. Yuan, X.; Chen, C.; Lei, X.; Yuan, Y.; Muhammad Adnan, R. Monthly runoff forecasting based on LSTM–ALO model. Stoch. Environ. Res. Risk Assess. 2018, 32, 2199–2212.
17. Ikram, R.M.A.; Dai, H.-L.; Ewees, A.A.; Shiri, J.; Kisi, O.; Zounemat-Kermani, M. Application of improved version of multi verse optimizer algorithm for modeling solar radiation. Energy Rep. 2022, 8, 12063–12080.
18. Adnan, R.M.; Kisi, O.; Mostafa, R.R.; Ahmed, A.N.; El-Shafie, A. The potential of a novel support vector machine trained with modified mayfly optimization algorithm for streamflow prediction. Hydrol. Sci. J. 2022, 67, 161–174.
19. Zhu, S.; Nyarko, E.K.; Hadzima-Nyarko, M. Modelling daily water temperature from air temperature for the Missouri River. PeerJ 2018, 6, e4894.
20. Zhu, S.; Nyarko, E.K.; Hadzima-Nyarko, M.; Heddam, S.; Wu, S. Assessing the performance of a suite of machine learning models for daily river water temperature prediction. PeerJ 2019, 7, e7065.
21. Zhu, S.; Heddam, S.; Nyarko, E.K.; Hadzima-Nyarko, M.; Piccolroaz, S.; Wu, S. Modeling daily water temperature for rivers: Comparison between adaptive neuro-fuzzy inference systems and artificial neural networks models. Environ. Sci. Pollut. Res. 2018, 26, 402–420.
22. Han, M.; Feng, Y.; Zhao, X.; Sun, C.; Hong, F.; Liu, C. A Convolutional Neural Network Using Surface Data to Predict Subsurface Temperatures in the Pacific Ocean. IEEE Access 2019, 7, 172816–172829.
23. Ikram, R.M.A.; Mostafa, R.R.; Chen, Z.; Islam, A.R.M.T.; Kisi, O.; Kuriqi, A.; Zounemat-Kermani, M. Advanced Hybrid Metaheuristic Machine Learning Models Application for Reference Crop Evapotranspiration Prediction. Agronomy 2023, 13, 98.
24. Zounemat-Kermani, M.; Keshtegar, B.; Kisi, O.; Scholz, M. Towards a comprehensive assessment of statistical versus soft computing models in hydrology: Application to monthly pan evaporation prediction. Water 2021, 13, 2451.
25. Mahdavi-Meymand, A.; Sulisz, W.; Zounemat-Kermani, M. A comprehensive study on the application of firefly algorithm in prediction of energy dissipation on block ramps. Eksploat. I Niezawodn. 2022, 24, 200–210.
26. Qiu, R.; Wang, Y.; Wang, D.; Qiu, W.; Wu, J.; Tao, Y. Water temperature forecasting based on modified artificial neural network methods: Two cases of the Yangtze River. Sci. Total. Environ. 2020, 737, 139729.
27. Stajkowski, S.; Kumar, D.; Samui, P.; Bonakdari, H.; Gharabaghi, B. Genetic-Algorithm-Optimized Sequential Model for Water Temperature Prediction. Sustainability 2020, 12, 5374.
28. Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158.
29. Almotairi, K.H.; Abualigah, L. Improved reptile search algorithm with novel mean transition mechanism for constrained industrial engineering problems. Neural Comput. Appl. 2022, 34, 17257–17277.
30. Al-Shourbaji, I.; Helian, N.; Sun, Y.; Alshathri, S.; Abd Elaziz, M. Boosting Ant Colony Optimization with Reptile Search Algorithm for Churn Prediction. Mathematics 2022, 10, 1031.
31. Khan, R.A.; Sabir, B.; Sarwar, A.; Liu, H.D.; Lin, C.H. Reptile Search Algorithm (RSA)-Based Selective Harmonic Elimination Technique in Packed E-Cell (PEC-9) Inverter. Processes 2022, 10, 1615.
32. Ahmadianfar, I.; Heidari, A.A.; Noshadian, S.; Chen, H.; Gandomi, A.H. INFO: An efficient optimization algorithm based on weighted mean of vectors. Expert Syst. Appl. 2022, 195, 116516.
33. Hassan, A.Y.; Ismaeel, A.A.K.; Said, M.; Ghoniem, R.M.; Deb, S.; Elsayed, A.G. Evaluation of Weighted Mean of Vectors Algorithm for Identification of Solar Cell Parameters. Processes 2022, 10, 1072.
34. Hubel, D.H.; Wiesel, T.N. Receptive fields of single neurons in cat’s striate cortex. J. Physiol. 1959, 148, 574–591.
35. Hubel, D.H.; Wiesel, T.N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 1968, 195, 215–243.
36. Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980, 36, 193–202.
37. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning (ICML’10), Haifa, Israel, 21–24 June 2010; Omnipress: Haifa, Israel, 2010; pp. 807–814.
38. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
39. Sundermeyer, M.; Schlüter, R.; Ney, H. From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 23, 517–529.
40. Gensler, A.; Henze, J.; Sick, B.; Raabe, N. Deep Learning for solar power forecasting: An approach using AutoEncoder and LSTM Neural Networks. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9 October 2016.
41. Nelson, D.M.Q.; Pereira, A.C.M.; de Oliveira, R.A. Stock market’s price movement prediction with LSTM neural networks. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 June 2017; pp. 1419–1426.
42. Adnan, R.M.; Liang, Z.; Heddam, S.; Zounemat-Kermani, M.; Kisi, O.; Li, B. Least square support vector machine and multivariate adaptive regression splines for streamflow prediction in mountainous basin using hydro-meteorological data as inputs. J. Hydrol. 2020, 586, 124371.
43. Elgamal, Z.; Sabri, A.Q.M.; Tubishat, M.; Tbaishat, D.; Makhadmeh, S.N.; Alomari, O.A. Improved Reptile Search Optimization Algorithm Using Chaotic Map and Simulated Annealing for Feature Selection in Medical Field. IEEE Access 2022, 10, 51428–51446.
44. Al-Shourbaji, I.; Kachare, P.H.; Alshathri, S.; Duraibi, S.; Elnaim, B.; Elaziz, M.A. An Efficient Parallel Reptile Search Algorithm and Snake Optimizer Approach for Feature Selection. Mathematics 2022, 10, 2351.
45. Shi, J.; Guo, J.; Zheng, S. Evaluation of hybrid forecasting approaches for wind speed and power generation time series. Renew. Sustain. Energy Rev. 2012, 16, 3471–3480.
46. Zhang, D.; Peng, X.; Pan, K.; Liu, Y. A novel wind speed forecasting based on hybrid decomposition and online sequential outlier robust extreme learning machine. Energy Convers. Manag. 2018, 180, 338–357.
47. Webb, B.W.; Clack, P.D.; Walling, D.E. Water-air temperature relationships in a Devon river system and the role of flow. Hydrol. Process. 2003, 17, 3069–3084.
48. Ahmadi-Nedushan, B.; St-Hilaire, A.; Ouarda, T.B.M.J.; Bilodeau, L.; Robichaud, E.; Thiemonge, N.; Bobee, B. Predicting river water temperatures using stochastic models: Case study of the Moisie River (Québec, Canada). Hydrol. Process. 2007, 21, 21–34.
49. Sohrabi, M.M.; Benjankar, R.; Tonina, D.; Wenger, S.J.; Isaak, D.J. Estimation of daily stream water temperatures with a Bayesian regression approach. Hydrol. Process. 2017, 31, 1719–1733.
Figure 1. Case study area.
Figure 2. Conceptual model of CNN.
Figure 3. LSTM model’s flow chart.
Figure 4. RSA algorithm’s flow chart.
Figure 5. Improved CNN model with metaheuristic algorithms (MHs).
Figure 6. Improved LSTM model.
Figure 7. Scatterplots of the observed and predicted water temperature using different CNN- and LSTM-based models in the test period with the best input combination.
Figure 8. Taylor diagrams of the predicted water temperature using different LSTM-based models in the test period with the best input combination.
Figure 9. Taylor diagrams of the predicted water temperature using different CNN-based models in the test period with the best input combination.
Figure 10. Violin charts of the predicted water temperature using different CNN-based models in the test period with the best input combination.
Figure 11. Violin charts of the predicted water temperature using different LSTM-based models in the test period with the best input combination.
Table 1. The statistical overview of dataset.
Variable | Mean | Min. | Max. | Skewness | Std. Dev.
Air Temperature
Sediment Flow
Water Temperature
Table 2. List of hyperparameters and their values.
Hyperparameter | Range
Number of filters | [1–300]
Kernel size | [1–20]
Number of epochs | [1–200]
Batch size | [10–100]
Pooling size | [1–15]
Table 3. Parameter settings for all algorithms.
Algorithm | Parameter | Value
RSA | α | 0.1
RSA | β | 0.1
INFO | c | 2
INFO | d | 4
Common settings | Population | 30
Common settings | Number of iterations | 100
Common settings | Number of runs for each algorithm | 30
Table 4. Training and test statistics of the models for water temperature prediction (CNN).
Input combination | Training RMSE | Training MAE | Training R2 | Training NSE | Test RMSE | Test MAE | Test R2 | Test NSE
M1 dataset
(i) Ta | 1.554 | 1.211 | 0.914 | 0.914 | 1.652 | 1.289 | 0.918 | 0.894
(ii) Ta, P | 1.453 | 1.120 | 0.925 | 0.924 | 1.689 | 1.302 | 0.917 | 0.889
(iii) Ta, Q | 1.538 | 1.194 | 0.915 | 0.915 | 1.654 | 1.294 | 0.918 | 0.894
(iv) Ta, S | 1.494 | 1.154 | 0.920 | 0.920 | 1.645 | 1.294 | 0.921 | 0.895
(v) Ta, P, Q | 1.497 | 1.162 | 0.920 | 0.920 | 1.636 | 1.281 | 0.920 | 0.896
(vi) Ta, Q, S | 1.477 | 1.140 | 0.922 | 0.922 | 1.675 | 1.310 | 0.917 | 0.891
(vii) Ta, P, Q, S | 1.489 | 1.149 | 0.921 | 0.921 | 1.644 | 1.287 | 0.922 | 0.895
(viii) Ta, DOY | 1.414 | 1.089 | 0.928 | 0.928 | 1.623 | 1.256 | 0.921 | 0.898
(ix) Ta, P, Q, S, DOY | 1.376 | 1.067 | 0.934 | 0.934 | 1.567 | 1.223 | 0.928 | 0.905
M2 dataset
(i) Ta | 1.434 | 1.119 | 0.924 | 0.924 | 1.559 | 1.248 | 0.927 | 0.926
(ii) Ta, P | 1.384 | 1.075 | 0.929 | 0.929 | 1.512 | 1.191 | 0.937 | 0.936
(iii) Ta, Q | 1.444 | 1.119 | 0.923 | 0.923 | 1.528 | 1.221 | 0.929 | 0.928
(iv) Ta, S | 1.464 | 1.140 | 0.921 | 0.920 | 1.552 | 1.237 | 0.929 | 0.925
(v) Ta, P, Q | 1.379 | 1.068 | 0.929 | 0.929 | 1.491 | 1.199 | 0.932 | 0.929
(vi) Ta, Q, S | 1.434 | 1.112 | 0.924 | 0.924 | 1.585 | 1.240 | 0.931 | 0.928
(vii) Ta, P, Q, S | 1.468 | 1.061 | 0.920 | 0.920 | 1.480 | 1.178 | 0.929 | 0.926
(viii) Ta, DOY | 1.360 | 1.046 | 0.930 | 0.930 | 1.441 | 1.165 | 0.937 | 0.933
(ix) Ta, P, Q, S, DOY | 1.417 | 1.100 | 0.926 | 0.925 | 1.514 | 1.179 | 0.925 | 0.924
M3 dataset
(i) Ta | 1.433 | 1.113 | 0.930 | 0.929 | 1.446 | 1.136 | 0.933 | 0.929
(ii) Ta, P | 1.479 | 1.138 | 0.925 | 0.925 | 1.345 | 1.067 | 0.939 | 0.937
(iii) Ta, Q | 1.393 | 1.076 | 0.933 | 0.933 | 1.426 | 1.119 | 0.936 | 0.932
(iv) Ta, S | 1.407 | 1.091 | 0.932 | 0.932 | 1.449 | 1.122 | 0.934 | 0.930
(v) Ta, P, Q | 1.374 | 1.063 | 0.935 | 0.935 | 1.457 | 1.145 | 0.938 | 0.924
(vi) Ta, Q, S | 1.403 | 1.083 | 0.932 | 0.932 | 1.423 | 1.109 | 0.931 | 0.916
(vii) Ta, P, Q, S | 1.334 | 1.031 | 0.937 | 0.937 | 1.420 | 1.105 | 0.934 | 0.921
(viii) Ta, DOY | 1.420 | 1.112 | 0.931 | 0.931 | 1.409 | 1.113 | 0.936 | 0.926
(ix) Ta, P, Q, S, DOY | 1.331 | 1.020 | 0.939 | 0.939 | 1.395 | 1.096 | 0.942 | 0.935
Table 5. Training and test statistics of the models for water temperature prediction (CNN-RSA).
Input combination | Training RMSE | Training MAE | Training R2 | Training NSE | Test RMSE | Test MAE | Test R2 | Test NSE
M1 dataset
(i) Ta | 1.470 | 1.149 | 0.923 | 0.923 | 1.611 | 1.252 | 0.919 | 0.899
(ii) Ta, P | 1.421 | 1.091 | 0.928 | 0.928 | 1.574 | 1.217 | 0.924 | 0.904
(iii) Ta, Q | 1.451 | 1.112 | 0.925 | 0.925 | 1.594 | 1.229 | 0.920 | 0.902
(iv) Ta, S | 1.477 | 1.143 | 0.922 | 0.922 | 1.620 | 1.260 | 0.919 | 0.898
(v) Ta, P, Q | 1.399 | 1.077 | 0.930 | 0.930 | 1.600 | 1.258 | 0.925 | 0.901
(vi) Ta, Q, S | 1.450 | 1.133 | 0.926 | 0.925 | 1.649 | 1.292 | 0.920 | 0.895
(vii) Ta, P, Q, S | 1.401 | 1.080 | 0.930 | 0.930 | 1.584 | 1.254 | 0.929 | 0.903
(viii) Ta, DOY | 1.388 | 1.080 | 0.933 | 0.931 | 1.557 | 1.212 | 0.928 | 0.906
(ix) Ta, P, Q, S, DOY | 1.336 | 1.034 | 0.936 | 0.933 | 1.507 | 1.54 | 0.930 | 0.912
M2 dataset
(i) Ta | 1.420 | 1.104 | 0.925 | 0.925 | 1.535 | 1.227 | 0.931 | 0.921
(ii) Ta, P | 1.358 | 1.053 | 0.932 | 0.932 | 1.478 | 1.173 | 0.941 | 0.927
(iii) Ta, Q | 1.419 | 1.098 | 0.925 | 0.925 | 1.526 | 1.216 | 0.931 | 0.922
(iv) Ta, S | 1.436 | 1.114 | 0.924 | 0.923 | 1.548 | 1.232 | 0.934 | 0.920
(v) Ta, P, Q | 1.376 | 1.053 | 0.930 | 0.930 | 1.471 | 1.163 | 0.938 | 0.928
(vi) Ta, Q, S | 1.430 | 1.110 | 0.924 | 0.924 | 1.540 | 1.254 | 0.933 | 0.921
(vii) Ta, P, Q, S | 1.413 | 1.096 | 0.926 | 0.926 | 1.474 | 1.146 | 0.937 | 0.928
(viii) Ta, DOY | 1.352 | 1.047 | 0.932 | 0.932 | 1.452 | 1.173 | 0.937 | 0.930
(ix) Ta, P, Q, S, DOY | 1.330 | 1.019 | 0.938 | 0.936 | 1.445 | 1.146 | 0.943 | 0.933
M3 dataset
(i) Ta | 1.413 | 1.100 | 0.931 | 0.931 | 1.416 | 1.118 | 0.933 | 0.929
(ii) Ta, P | 1.354 | 1.056 | 0.937 | 0.937 | 1.316 | 1.043 | 0.940 | 0.938
(iii) Ta, Q | 1.370 | 1.064 | 0.936 | 0.935 | 1.419 | 1.116 | 0.934 | 0.928
(iv) Ta, S | 1.389 | 1.077 | 0.934 | 0.934 | 1.399 | 1.093 | 0.933 | 0.930
(v) Ta, P, Q | 1.317 | 1.009 | 0.941 | 0.940 | 1.330 | 1.032 | 0.941 | 0.937
(vi) Ta, Q, S | 1.379 | 1.072 | 0.935 | 0.935 | 1.397 | 1.098 | 0.936 | 0.930
(vii) Ta, P, Q, S | 1.324 | 1.020 | 0.940 | 0.940 | 1.365 | 1.069 | 0.940 | 0.934
(viii) Ta, DOY | 1.279 | 1.002 | 0.945 | 0.944 | 1.266 | 0.978 | 0.947 | 0.943
(ix) Ta, P, Q, S, DOY | 1.316 | 1.019 | 0.940 | 0.940 | 1.351 | 1.066 | 0.943 | 0.935
Table 6. Training and test statistics of the models for water temperature prediction—CNN-INFO.
Table 6. Training and test statistics of the models for water temperature prediction—CNN-INFO.
Input Combinations | Training Period (RMSE, MAE, R², NSE) | Test Period (RMSE, MAE, R², NSE)
(i) Ta | 1.450 1.131 0.925 0.925 | 1.606 1.243 0.922 0.900
(ii) Ta, P | 1.413 1.089 0.929 0.929 | 1.573 1.218 0.925 0.904
(iii) Ta, Q | 1.348 1.043 0.935 0.935 | 1.546 1.182 0.928 0.907
(iv) Ta, S | 1.442 1.124 0.926 0.926 | 1.620 1.276 0.921 0.898
(v) Ta, P, Q | 1.330 1.020 0.937 0.937 | 1.505 1.186 0.933 0.912
(vi) Ta, Q, S | 1.358 1.054 0.935 0.934 | 1.561 1.222 0.927 0.905
(vii) Ta, P, Q, S | 1.333 1.035 0.937 0.936 | 1.507 1.176 0.933 0.912
(viii) Ta, DOY | 1.318 1.013 0.940 0.938 | 1.360 1.081 0.945 0.928
(ix) Ta, P, Q, S, DOY | 1.369 1.051 0.933 0.933 | 1.538 1.188 0.928 0.908

(i) Ta | 1.425 1.108 0.925 0.925 | 1.515 1.225 0.931 0.923
(ii) Ta, P | 1.334 1.030 0.934 0.934 | 1.451 1.148 0.942 0.930
(iii) Ta, Q | 1.388 1.073 0.930 0.928 | 1.435 1.151 0.931 0.931
(iv) Ta, S | 1.440 1.119 0.923 0.923 | 1.531 1.213 0.933 0.922
(v) Ta, P, Q | 1.288 1.000 0.939 0.938 | 1.421 1.113 0.938 0.933
(vi) Ta, Q, S | 1.326 1.044 0.936 0.935 | 1.525 1.214 0.934 0.922
(vii) Ta, P, Q, S | 1.358 1.047 0.932 0.932 | 1.467 1.157 0.937 0.928
(viii) Ta, DOY | 1.132 0.874 0.953 0.952 | 1.383 1.099 0.946 0.936
(ix) Ta, P, Q, S, DOY | 1.303 1.016 0.937 0.937 | 1.403 1.109 0.938 0.934

(i) Ta | 1.404 1.090 0.932 0.932 | 1.414 1.112 0.936 0.929
(ii) Ta, P | 1.337 1.036 0.939 0.939 | 1.275 0.993 0.943 0.942
(iii) Ta, Q | 1.359 1.057 0.937 0.937 | 1.395 1.078 0.941 0.931
(iv) Ta, S | 1.357 1.052 0.937 0.937 | 1.401 1.100 0.934 0.930
(v) Ta, P, Q | 1.256 0.962 0.946 0.946 | 1.335 1.051 0.944 0.937
(vi) Ta, Q, S | 1.378 1.066 0.935 0.935 | 1.382 1.086 0.935 0.932
(vii) Ta, P, Q, S | 1.290 0.990 0.943 0.943 | 1.364 1.069 0.941 0.934
(viii) Ta, DOY | 1.219 0.947 0.950 0.949 | 1.241 0.962 0.949 0.945
(ix) Ta, P, Q, S, DOY | 1.276 0.996 0.944 0.944 | 1.351 1.067 0.944 0.935
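The four statistics reported in each row can be computed from paired observed and predicted daily series. The following is a minimal Python sketch, not the authors' code: it assumes the RMSE, MAE, R², NSE order listed in the abstract, takes R² to be the squared Pearson correlation coefficient (an assumption, since the definition is not given in this section), and uses a hypothetical four-day toy series rather than Bailong River data.

```python
import math


def evaluate(observed, predicted):
    """Return (RMSE, MAE, R2, NSE) for paired observed/predicted values.

    R2 is assumed here to be the squared Pearson correlation coefficient;
    NSE is the Nash-Sutcliffe efficiency, 1 - SSE / sum((o - mean(o))^2).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series of at least 2 values")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)

    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("R2 and NSE are undefined for a constant series")

    r2 = cov * cov / (ss_o * ss_p)
    nse = 1.0 - sse / ss_o
    return rmse, mae, r2, nse


# Hypothetical daily water temperatures (not data from this study).
obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 12.0, 13.0, 16.0]
rmse, mae, r2, nse = evaluate(obs, pred)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f} NSE={nse:.3f}")
# RMSE=0.707 MAE=0.500 R2=0.914 NSE=0.900
```

Because MAE can never exceed RMSE for the same set of errors, the first training and test value in every row is always the larger of the two error statistics.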
Table 7. Training and test statistics of the models for water temperature prediction—LSTM.
Input Combinations | Training Period (RMSE, MAE, R², NSE) | Test Period (RMSE, MAE, R², NSE)
(i) Ta | 1.529 1.186 0.916 0.916 | 1.631 1.259 0.919 0.897
(ii) Ta, P | 1.438 1.108 0.926 0.926 | 1.612 1.251 0.925 0.899
(iii) Ta, Q | 1.438 1.118 0.926 0.926 | 1.580 1.233 0.926 0.903
(iv) Ta, S | 1.500 1.160 0.919 0.919 | 1.639 1.281 0.919 0.896
(v) Ta, P, Q | 1.400 1.079 0.930 0.930 | 1.592 1.243 0.925 0.902
(vi) Ta, Q, S | 1.452 1.118 0.925 0.925 | 1.654 1.295 0.918 0.894
(vii) Ta, P, Q, S | 1.414 1.089 0.928 0.928 | 1.595 1.251 0.925 0.901
(viii) Ta, DOY | 1.499 1.162 0.920 0.920 | 1.614 1.261 0.934 0.899
(ix) Ta, P, Q, S, DOY | 1.386 1.068 0.932 0.932 | 1.566 1.221 0.929 0.905

(i) Ta | 1.428 1.111 0.924 0.924 | 1.535 1.242 0.930 0.921
(ii) Ta, P | 1.384 1.069 0.929 0.929 | 1.466 1.151 0.937 0.928
(iii) Ta, Q | 1.444 1.119 0.923 0.923 | 1.525 1.216 0.932 0.922
(iv) Ta, S | 1.429 1.115 0.924 0.924 | 1.587 1.272 0.933 0.916
(v) Ta, P, Q | 1.383 1.071 0.929 0.929 | 1.537 1.218 0.935 0.921
(vi) Ta, Q, S | 1.434 1.113 0.924 0.924 | 1.572 1.271 0.932 0.918
(vii) Ta, P, Q, S | 1.379 1.070 0.929 0.929 | 1.523 1.202 0.936 0.923
(viii) Ta, DOY | 1.440 1.118 0.923 0.923 | 1.558 1.258 0.932 0.919
(ix) Ta, P, Q, S, DOY | 1.358 1.052 0.932 0.932 | 1.456 1.168 0.937 0.929

(i) Ta | 1.417 1.092 0.931 0.931 | 1.424 1.117 0.934 0.928
(ii) Ta, P | 1.361 1.050 0.936 0.936 | 1.341 1.050 0.939 0.936
(iii) Ta, Q | 1.380 1.067 0.935 0.935 | 1.422 1.107 0.935 0.928
(iv) Ta, S | 1.402 1.088 0.932 0.932 | 1.428 1.110 0.928 0.927
(v) Ta, P, Q | 1.328 1.056 0.936 0.935 | 1.363 1.088 0.934 0.934
(vi) Ta, Q, S | 1.383 1.068 0.934 0.934 | 1.411 1.091 0.940 0.929
(vii) Ta, P, Q, S | 1.328 1.024 0.939 0.939 | 1.365 1.068 0.935 0.934
(viii) Ta, DOY | 1.393 1.081 0.933 0.933 | 1.405 1.110 0.931 0.930
(ix) Ta, P, Q, S, DOY | 1.319 1.021 0.940 0.940 | 1.354 1.055 0.943 0.935
Table 8. Training and test statistics of the models for water temperature prediction—LSTM-RSA.
Input Combinations | Training Period (RMSE, MAE, R², NSE) | Test Period (RMSE, MAE, R², NSE)
(i) Ta | 1.486 1.155 0.921 0.921 | 1.608 1.252 0.923 0.900
(ii) Ta, P | 1.399 1.084 0.931 0.930 | 1.555 1.197 0.927 0.906
(iii) Ta, Q | 1.400 1.089 0.930 0.930 | 1.549 1.165 0.921 0.907
(iv) Ta, S | 1.368 1.078 0.933 0.933 | 1.623 1.286 0.925 0.898
(v) Ta, P, Q | 1.332 1.035 0.940 0.936 | 1.556 1.220 0.930 0.906
(vi) Ta, Q, S | 1.446 1.116 0.925 0.925 | 1.645 1.289 0.920 0.895
(vii) Ta, P, Q, S | 1.400 1.082 0.930 0.930 | 1.586 1.242 0.925 0.902
(viii) Ta, DOY | 1.180 0.914 0.950 0.950 | 1.520 1.175 0.939 0.910
(ix) Ta, P, Q, S, DOY | 1.351 1.048 0.932 0.932 | 1.553 1.205 0.926 0.907

(i) Ta | 1.418 1.101 0.925 0.925 | 1.530 1.228 0.929 0.922
(ii) Ta, P | 1.372 1.062 0.930 0.930 | 1.419 1.130 0.941 0.933
(iii) Ta, Q | 1.314 1.015 0.936 0.936 | 1.437 1.147 0.933 0.931
(iv) Ta, S | 1.384 1.077 0.929 0.929 | 1.501 1.214 0.936 0.925
(v) Ta, P, Q | 1.327 1.028 0.935 0.935 | 1.426 1.122 0.939 0.932
(vi) Ta, Q, S | 1.378 1.058 0.930 0.930 | 1.512 1.202 0.932 0.924
(vii) Ta, P, Q, S | 1.378 1.073 0.930 0.930 | 1.463 1.152 0.940 0.929
(viii) Ta, DOY | 1.027 0.799 0.961 0.961 | 1.176 0.921 0.945 0.939
(ix) Ta, P, Q, S, DOY | 1.342 1.037 0.936 0.936 | 1.449 1.158 0.938 0.930

(i) Ta | 1.393 1.075 0.933 0.933 | 1.421 1.113 0.935 0.928
(ii) Ta, P | 1.305 1.006 0.942 0.941 | 1.317 1.040 0.944 0.938
(iii) Ta, Q | 1.354 1.056 0.937 0.937 | 1.392 1.082 0.942 0.931
(iv) Ta, S | 1.394 1.089 0.934 0.933 | 1.375 1.064 0.942 0.933
(v) Ta, P, Q | 1.314 1.013 0.941 0.941 | 1.328 1.015 0.944 0.937
(vi) Ta, Q, S | 1.343 1.049 0.938 0.938 | 1.404 1.096 0.935 0.930
(vii) Ta, P, Q, S | 1.325 1.021 0.940 0.940 | 1.314 1.031 0.941 0.939
(viii) Ta, DOY | 1.138 0.861 0.956 0.955 | 1.112 0.866 0.949 0.942
(ix) Ta, P, Q, S, DOY | 1.191 0.921 0.954 0.951 | 1.354 1.067 0.944 0.935
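Table 10 lists the best input combination for each model, but the selection rule is not stated in this section. Assuming the lowest test-period RMSE is the criterion, with test MAE as a tie-breaker (both assumptions), the choice can be reproduced from any block of nine rows. The Python sketch below applies this assumed rule to the test RMSE and MAE of the last block of rows in Table 8 (LSTM-RSA); the function name `best_combination` is hypothetical.

```python
# Test-period (RMSE, MAE) for the last block of rows in Table 8 (LSTM-RSA).
lstm_rsa_test = {
    "(i) Ta": (1.421, 1.113),
    "(ii) Ta, P": (1.317, 1.040),
    "(iii) Ta, Q": (1.392, 1.082),
    "(iv) Ta, S": (1.375, 1.064),
    "(v) Ta, P, Q": (1.328, 1.015),
    "(vi) Ta, Q, S": (1.404, 1.096),
    "(vii) Ta, P, Q, S": (1.314, 1.031),
    "(viii) Ta, DOY": (1.112, 0.866),
    "(ix) Ta, P, Q, S, DOY": (1.354, 1.067),
}


def best_combination(table):
    """Return the input combination with the lowest test RMSE.

    Tuples compare element by element, so MAE only decides between
    combinations that share the same RMSE (an assumed tie-breaking rule).
    """
    return min(table, key=table.get)


best = best_combination(lstm_rsa_test)
print(best, lstm_rsa_test[best])
# (viii) Ta, DOY (1.112, 0.866)
```

In this block, ranking by test MAE alone would select the same row, since (viii) Ta, DOY also has the lowest test MAE (0.866).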
Table 9. Training and test statistics of the models for water temperature prediction—LSTM-INFO.
Input Combinations | Training Period (RMSE, MAE, R², NSE) | Test Period (RMSE, MAE, R², NSE)
(i) Ta | 1.405 1.087 0.927 0.927 | 1.606 1.245 0.920 0.900
(ii) Ta, P | 1.311 1.000 0.936 0.936 | 1.561 1.229 0.929 0.905
(iii) Ta, Q | 1.270 0.963 0.940 0.940 | 1.445 1.105 0.931 0.919
(iv) Ta, S | 1.371 1.076 0.930 0.930 | 1.574 1.246 0.927 0.904
(v) Ta, P, Q | 1.248 0.967 0.942 0.942 | 1.457 1.142 0.936 0.918
(vi) Ta, Q, S | 1.287 1.020 0.939 0.939 | 1.512 1.183 0.932 0.911
(vii) Ta, P, Q, S | 1.338 1.059 0.934 0.934 | 1.476 1.145 0.936 0.916
(viii) Ta, DOY | 1.161 0.889 0.950 0.950 | 1.256 1.010 0.955 0.939
(ix) Ta, P, Q, S, DOY | 1.263 0.982 0.942 0.941 | 1.416 1.108 0.941 0.922

(i) Ta | 1.460 1.139 0.924 0.924 | 1.511 1.213 0.930 0.924
(ii) Ta, P | 1.356 1.042 0.934 0.934 | 1.420 1.124 0.942 0.933
(iii) Ta, Q | 1.285 0.976 0.941 0.941 | 1.484 1.208 0.931 0.927
(iv) Ta, S | 1.453 1.137 0.925 0.924 | 1.487 1.198 0.935 0.926
(v) Ta, P, Q | 1.261 0.971 0.943 0.943 | 1.402 1.116 0.940 0.935
(vi) Ta, Q, S | 1.301 1.018 0.939 0.939 | 1.524 1.213 0.935 0.923
(vii) Ta, P, Q, S | 1.256 0.982 0.944 0.944 | 1.459 1.149 0.938 0.929
(viii) Ta, DOY | 1.076 0.798 0.959 0.959 | 1.290 1.008 0.960 0.945
(ix) Ta, P, Q, S, DOY | 1.052 0.823 0.960 0.960 | 1.198 0.946 0.950 0.952

(i) Ta | 1.378 1.060 0.935 0.935 | 1.412 1.104 0.935 0.929
(ii) Ta, P | 1.254 0.963 0.946 0.946 | 1.292 1.006 0.945 0.941
(iii) Ta, Q | 1.279 0.978 0.944 0.944 | 1.394 1.087 0.942 0.931
(iv) Ta, S | 1.330 1.034 0.939 0.939 | 1.390 1.078 0.942 0.931
(v) Ta, P, Q | 1.245 0.964 0.947 0.947 | 1.326 1.046 0.949 0.937
(vi) Ta, Q, S | 1.295 1.004 0.942 0.942 | 1.376 1.104 0.935 0.933
(vii) Ta, P, Q, S | 1.258 0.971 0.946 0.946 | 1.339 1.058 0.942 0.936
(viii) Ta, DOY | 0.957 0.721 0.968 0.968 | 1.079 0.839 0.956 0.959
(ix) Ta, P, Q, S, DOY | 1.067 0.816 0.961 0.961 | 1.187 0.880 0.965 0.950
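The abstract attributes the best accuracy to LSTM-INFO with the DOY input. The relative error reduction behind such a claim is simple arithmetic; the Python sketch below computes it from the test-period values of combination (viii) Ta, DOY in the last block of rows of Table 7 (LSTM) and Table 9 (LSTM-INFO). The helper name `percent_reduction` is hypothetical, and using plain LSTM as the baseline is one illustrative choice, not necessarily the comparison the authors report.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of an error statistic, in percent of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline - improved) / baseline


# Test-period values for input combination (viii) Ta, DOY, taken from the
# last block of rows in Table 7 (LSTM) and Table 9 (LSTM-INFO).
lstm = {"RMSE": 1.405, "MAE": 1.110}
lstm_info = {"RMSE": 1.079, "MAE": 0.839}

for metric in ("RMSE", "MAE"):
    print(f"{metric}: {percent_reduction(lstm[metric], lstm_info[metric]):.1f}% lower")
# RMSE: 23.2% lower
# MAE: 24.4% lower
```

The same function applies to any pair of rows, for example to compare an optimized model against its unoptimized counterpart within one block.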
Table 10. Training and test statistics of the best input combinations of all applied models for water temperature prediction.
Model | Training Period | Test Period
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Ikram, R.M.A.; Mostafa, R.R.; Chen, Z.; Parmar, K.S.; Kisi, O.; Zounemat-Kermani, M. Water Temperature Prediction Using Improved Deep Learning Methods through Reptile Search Algorithm and Weighted Mean of Vectors Optimizer. J. Mar. Sci. Eng. 2023, 11, 259.

AMA Style

Ikram RMA, Mostafa RR, Chen Z, Parmar KS, Kisi O, Zounemat-Kermani M. Water Temperature Prediction Using Improved Deep Learning Methods through Reptile Search Algorithm and Weighted Mean of Vectors Optimizer. Journal of Marine Science and Engineering. 2023; 11(2):259.

Chicago/Turabian Style

Ikram, Rana Muhammad Adnan, Reham R. Mostafa, Zhihuan Chen, Kulwinder Singh Parmar, Ozgur Kisi, and Mohammad Zounemat-Kermani. 2023. "Water Temperature Prediction Using Improved Deep Learning Methods through Reptile Search Algorithm and Weighted Mean of Vectors Optimizer" Journal of Marine Science and Engineering 11, no. 2: 259.

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
