Article

A Spatial-Reduction Attention-Based BiGRU Network for Water Level Prediction

Kexin Bao, Jinqiang Bi, Ruixin Ma, Yue Sun, Wenjia Zhang and Yongchao Wang
1 Tianjin Research Institute for Water Transport Engineering, Ministry of Transport of the People’s Republic of China (M.O.T.), Tianjin 300456, China
2 National Engineering Research Center of Port Hydraulic Construction Technology, Tianjin 300456, China
3 Key Laboratory of Marine Simulation and Control, Dalian Maritime University, Dalian 116026, China
4 Shenzhen Ansoft Huishi Technology Co., Ltd., Tianjin Branch, Tianjin 300210, China
* Author to whom correspondence should be addressed.
Water 2023, 15(7), 1306; https://doi.org/10.3390/w15071306
Submission received: 27 February 2023 / Revised: 18 March 2023 / Accepted: 21 March 2023 / Published: 26 March 2023
(This article belongs to the Special Issue Statistical Analysis in Hydrology: Methods and Applications)

Abstract

According to statistics on ship traffic accidents on inland waterways, potential safety hazards such as stranding, hitting rocks, and suspension of navigation are increasing because of sudden rises and falls in the water level, which may result in fatalities, environmental devastation, and massive economic losses. In view of this situation, this paper proposes a high-accuracy water-level-prediction model that combines spatial-reduction attention with a bidirectional gated recurrent unit (SRA-BiGRU), providing support for the safe navigation of ships, their reasonable stowage, and flood prevention. The first contribution of this model is that it exploits its strong fitting ability to capture nonlinear characteristics and fully considers the time-series nature of water-level data. Secondly, the bidirectional recurrent neural network structure makes full use of past and future water-level information in the mapping between input and output sequences. Thirdly, and most importantly, introducing spatial-reduction attention on top of BiGRU not only automatically captures the correlations between the hidden vectors generated by BiGRU, addressing the loss of precision caused by long time spans in water-level-forecasting tasks, but also makes full use of the spatial information between water-level stations by emphasizing the influence of significant features on the prediction results. Comparative experiments progressively demonstrate the advantages of GRU, the bidirectional structure, and spatial-reduction attention, showing that SRA-BiGRU is a water-level-prediction model with high availability, accuracy, and robustness.

1. Introduction

The waterway water level is the primary factor to be considered for the safe navigation of ships, and it is also one of the main factors affecting navigation capacity [1,2]. Especially in the inland waterways of mountainous areas, problems such as a sudden rise of the water level, a surge in velocity, and a complex flow regime in downstream waterways occur frequently due to the irregular discharge of unsteady flow from upstream power stations and rainstorm confluence in river basins [3,4]. More specifically, when the river water level soars over a short time, ships operating in ports and berthing in anchorages along the river need to adjust frequently according to the changes in water level. When the river water level drops rapidly, however, the navigation conditions deteriorate quickly, which brings potential safety hazards such as stranding, hitting rocks, and even being forced to suspend navigation [5,6]. Therefore, water-level data are an important reference for evaluating navigation capacity, and accurate prediction of the water level is of great guiding significance for flood control, reasonable stowage of ships, and especially for ensuring the safe navigation of ships [7]. At present, by setting up hydrological observation stations and a hydrological measurement and control system based on the Internet of Things, sailing ships can observe dynamic changes in the water level in real time, which helps accumulate a large amount of historical hydrological data [8]. How to further exploit these data for hydrological prediction has attracted extensive attention from scholars. In recent years, research on hydrological prediction has fallen mainly into two categories: methods based on physical processes and methods based on data characteristics.
The method based on physical processes, as its name implies, involves building a model of the actual physical process. There are numerous physical-process-based models, which can be classified into hydraulic models, hydrological models, and coupled hydrological–hydraulic models [9,10]. Typical application cases of these models include the use of the Danish Mike 21 hydrodynamic model, the American SMS hydraulic model, and the American SSARR hydrological model to explore changes in flood-water level [11,12,13]. However, modeling based on physical processes typically requires the collection of numerous physical parameters from topographic, land-use, and meteorological data. In practical applications, these hydrological, meteorological, and geographic records are frequently unavailable or difficult to obtain, limiting the application of such models [14].
The method based on data features approaches hydrological prediction from the data point of view, using mathematical or machine learning models to mine the relationships within the data; it includes time-series analysis, wavelet analysis, support vector machines, and deep learning. Among these, the autoregressive model (AR), the autoregressive moving average model (ARMA), and the autoregressive integrated moving average model (ARIMA) are often used to analyze water-level and runoff changes [14]. However, such models only reveal the dependency of the series in the time domain, so they handle nonlinear series poorly. Wavelet analysis performs well in the analysis of variation characteristics, so Adamowski et al., Seo et al., and Wang et al. used wavelet analysis in combination with deep learning for water-level prediction [15,16,17]. Nevertheless, the selection of the wavelet basis function in wavelet analysis is a thorny problem, and water-level prediction with wavelets always needs to be combined with other methods to achieve the target effect. The support vector machine (SVM), first proposed by Cortes and Vapnik, performs well in practical classification and regression problems [18]. Behzad et al. compared the performance of SVM and artificial neural networks in water-level prediction for aquifers [19]. Dai et al. used a support vector machine to calculate the ideal ecological water level of Dongting Lake [20]. However, when solving regression problems, selecting the input features of a support vector machine is complex and time-consuming, and its performance worsens as the amount of data increases. We therefore select the features of the input data to avoid the interference caused by redundant information.
It is worth mentioning that in recent years, deep learning has made important progress among data-feature-based methods, solving the difficulty that the above approaches have in processing nonlinear data and large quantities of data. Because water-level data are typical nonlinear and non-stationary time-series data, the emergence of the recurrent neural network (RNN) greatly promoted the development of hydrological time-series forecasting [21,22]. However, RNN suffers from gradient vanishing and gradient explosion, which make it unable to cope with long-term dependence [23,24]. To solve these gradient problems, long short-term memory (LSTM) and the gated recurrent unit (GRU), which are excellent variants of RNN, were proposed successively [25,26,27]. Owing to their effectiveness in time-series prediction, LSTM and GRU have been applied to the field of hydrologic forecasting in recent years. Typically, Le et al. [28] proposed an LSTM model suitable for flood prediction and conducted performance comparison experiments with daily runoff and rainfall data as input datasets, demonstrating the excellent performance of LSTM in flood prediction. Xu et al. [29] proposed a water-level time-series prediction model based on GRU and LightGBM: GRU was used to model the water-level data as a fundamental prediction model, the prediction results were divided into non-flood season and flood season, and these were then combined with environmental factors after LightGBM feature selection to establish the final model, thereby resolving the accuracy loss caused by the simple superposition of predicted values from different seasons.
A point worth emphasizing is that, for time-series prediction, some research has combined the prediction output of the RNN model with the attention mechanism in deep learning rather than using RNN and its variants alone. These results show that introducing the attention mechanism can improve the accuracy of time-series prediction to a large extent [30,31,32]. The attention mechanism is a method for rapidly selecting high-value information from huge amounts of information. Based on this mechanism, Hu et al. [33] developed a runoff forecasting model and predicted the runoff of a basin, and the results demonstrated that this improved method can enhance the accuracy of runoff forecasting.
However, the method combining RNN and the attention mechanism is rarely applied in the field of water-level prediction. As far as water-level prediction is concerned, the water level is affected not only by its own history but also by the upstream water level; that is, river water-level data are spatially and temporally related. The existing water-level-prediction models based on LSTM and GRU face the problem of an insufficient capacity to respond to spatial information [34]. Fortunately, introducing an attention mechanism on top of RNN can partially solve the problem of insufficient spatial information utilization.
Thus, this paper proposes a high-accuracy water-level-prediction model based on a combination of spatial-reduction attention and a bidirectional gated recurrent unit (SRA-BiGRU) to fully utilize the valuable information contained in the massive amounts of hydrological data of the Wujiang River and to address the deficiencies of existing water-level-prediction methods. The contributions of this model are briefly summarized below. Firstly, BiGRU makes use of its strong fitting ability in capturing nonlinear characteristics and fully considers the time-series nature of water-level data. In addition, the bidirectional GRU structure enables the modeling of the potential relationship of past and future water-level data with the current data, thereby improving the accuracy of forecasts. Secondly, the introduction of spatial-reduction attention on top of BiGRU can actively learn the correlation of the hidden vectors of BiGRU and highlight the influence of important features on the prediction results, thereby solving the insufficient utilization of spatial information and the long time span in water-level-prediction tasks, which otherwise lead to a decline in prediction accuracy. In particular, spatial-reduction attention, inspired by the Pyramid vision transformer model in the field of computer vision, reduces the computational and memory overhead of the multi-head attention mechanism thanks to its unique structure. Last but not least, the superiority of GRU, the bidirectional RNN structure, and spatial-reduction attention is gradually verified in this research by eight groups of comparative experiments based on real-time data collected from five water-level-measurement stations on the Wujiang River, which fully proves that the SRA-BiGRU model has higher prediction accuracy in the water-level-prediction task.

2. Materials and Methods

Figure 1 presents the framework of this paper’s core content, which mainly consists of the proposal of the SRA-BiGRU model and the model experiments. Specifically, an easy-to-use model appropriate for precise and quick water-level prediction is proposed by gradually introducing the applications of RNN variants, the bidirectional RNN structure, and spatial-reduction attention. On this basis, the model experiments discuss in detail the methods of data processing and parameter optimization, which play a crucial role in enhancing the model’s accuracy, and then present the effectiveness of the proposed model through comparative experiments.

2.1. Study Area Description

Wujiang River, originating in Guizhou Province and flowing into the Yangtze River in Chongqing City, is the largest tributary on the right bank of the upper reaches of the Yangtze River (see Figure 2) [35]. Heavy rainfall is the main cause of the Wujiang flood, but irregular water release from the upstream power station is also a significant contributing factor. Specifically, the Wujiang mainstream power station mainly employs multiple daily peak-shaving operations. The discharge flow of each hydropower station is non-constant and causes the water level of the channel under the dam to rise and fall sharply, with large instantaneous amplitude and frequent changes. In extreme situations, the water level can rise by up to 20 m per day [4,35]. When the discharge flow of the power station increases, the water level of the river under the dam rises sharply for a short time. This necessitates frequent adjustments by port operation ships and anchorage ships along the line in response to the change in water level, which poses potential safety hazards to navigable ships and berthing ships in the river section. When the discharge flow of the power station decreases, the water level of the river decreases rapidly, and the navigation conditions deteriorate rapidly. This poses potential safety hazards to transport vessels, such as running aground or being forced to stop sailing [5,6].
Therefore, conducting accurate water-level-prediction research on the Wujiang River is of great significance for flood control, reasonable stowage, and the safe navigation of ships in inland waters. Accordingly, this experiment used real-time hydrological data collected from 2018 to 2021 at five water-level-measurement stations (WL stations) built along the 79-km reach from Yinpan hydropower station to the confluence of the Wujiang River with the Yangtze River as the research object (see Figure 3).

2.2. Comparison of RNN Variations

RNN has greatly contributed to the development of hydrological time-series forecasting; however, its gradient-vanishing and gradient-explosion problems prevent it from being applied to long-term dependent situations [23,24]. LSTM is one of the typical variants of RNN; it uses a gate mechanism and a memory unit and can solve the above gradient problems well. The gate structure in LSTM generally includes a forget gate, an input gate, and an output gate [25,26]. GRU is in turn a variant of LSTM that merges the forget gate and input gate of LSTM into a single update gate and also mixes the cell state and hidden state [27]. Thus, having fewer parameters makes GRU converge faster while guaranteeing prediction accuracy. The simplified structure of GRU is shown in Figure 4, with only a reset gate ($r_t$) and an update gate ($z_t$), and the calculation procedure at each sequence index position is as follows.
The reset gate $r_t$ controls, with a certain probability, how much of the previous moment’s information is retained, which facilitates capturing short-term dependencies in the hydrological time-series data:
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$
The update gate $z_t$ controls how much information about the previous state is carried into the current state, which helps to capture long-term dependencies in the hydrological time-series data:
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$
The hidden state $h_t$ is updated from $h_{t-1}$ and the candidate state $\tilde{h}_t$ using the update gate $z_t$. The update gate determines the importance of the past state $h_{t-1}$ at the current moment, which mitigates the gradient decay problem of the RNN and better captures dependencies spanning larger intervals in the hydrological time-series data:
$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
where $[\cdot\,,\cdot]$ denotes the concatenation of two vectors, $\cdot$ denotes matrix multiplication, $\odot$ denotes element-wise multiplication, $W$ and $b$ represent the weights and bias terms of the corresponding gates, and $\sigma$ is the sigmoid activation function.
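To make the gate computations above concrete, the following minimal NumPy sketch implements a single GRU step following the equations above. The weight shapes, hidden size, and variable names are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h, b_r, b_z, b_h):
    """One GRU step. Each W_* has shape (hidden, hidden + input) and acts on the
    concatenation [h_{t-1}, x_t], mirroring the equations above."""
    hx = np.concatenate([h_prev, x_t])                                 # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ hx + b_r)                                      # reset gate
    z_t = sigmoid(W_z @ hx + b_z)                                      # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand                         # blended hidden state

# Toy usage: 3 input features (e.g., readings from nearby stations), 8 hidden units.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 8
def rand_w():
    return 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
W_r, W_z, W_h = rand_w(), rand_w(), rand_w()
b_r = b_z = b_h = np.zeros(hidden_size)
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):                       # a short 5-step sequence
    h = gru_step(x_t, h, W_r, W_z, W_h, b_r, b_z, b_h)
print(h.shape)                                                          # (8,)
```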
Thus, as depicted in Figure 4 [30], GRU and LSTM are the most representative recurrent neural networks, and both are capable of resolving problems involving long-term memory and back-propagation gradient dispersion. The GRU and LSTM models show nearly indistinguishable performance on a variety of tasks; therefore, this paper conducts comparative experiments based on both LSTM and GRU. These comparative experiments not only compare the performance of LSTM and GRU to some extent but also fully verify the effectiveness of using the bidirectional RNN structure and introducing spatial-reduction attention.

2.3. Bidirectional RNN Structure

A recurrent neural network can only extract information from previous inputs to predict the current state, whereas a bidirectional recurrent neural network (BiRNN) also exploits future data to enhance accuracy [36,37]. The bidirectional structure is equally applicable to LSTM and GRU; therefore, the bidirectional GRU (BiGRU) serves as the illustration for this discussion.
As shown in Figure 5 [30], $\{\overrightarrow{h}_{t-1}, \overrightarrow{h}_t, \overrightarrow{h}_{t+1}\}$ depicts the forward propagation path of BiGRU, whereas $\{\overleftarrow{h}_{t-1}, \overleftarrow{h}_t, \overleftarrow{h}_{t+1}\}$ depicts the backward propagation path. More precisely, the forward pass learns from past data, whereas the backward pass learns from future data, allowing each time step to make optimal use of the surrounding context. The two outputs are then combined to form the final output of the entire BiGRU [36,37]. Thus, BiGRU permits modeling the potential relationship of previous and future water-level information with the current information, hence enhancing the accuracy of forecasts.
$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1})$
$h_t = w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t$
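A hedged PyTorch sketch of the bidirectional structure described above follows; the layer sizes and the learned linear combination of forward and backward states are illustrative assumptions, not the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class BiGRU(nn.Module):
    """Bidirectional GRU: forward and backward hidden states are combined per time step."""
    def __init__(self, input_size=3, hidden_size=32):
        super().__init__()
        self.bigru = nn.GRU(input_size, hidden_size,
                            batch_first=True, bidirectional=True)
        # Learned combination of forward/backward states, playing the role of
        # h_t = w_t * h_forward + v_t * h_backward + b_t in the equations above.
        self.combine = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, x):                      # x: (batch, time, features)
        out, _ = self.bigru(x)                 # out: (batch, time, 2 * hidden_size)
        return self.combine(out)               # (batch, time, hidden_size)

# Toy usage: batch of 4 sequences, 24 time steps, 3 features per step.
model = BiGRU()
h_seq = model(torch.randn(4, 24, 3))
print(h_seq.shape)                             # torch.Size([4, 24, 32])
```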

2.4. Spatial-Reduction Attention

Since the rapid development of deep learning, the attention mechanism has been widely used in natural language processing, statistical learning, image detection, speech recognition, and other fields, as well as in regression problems [38]. For time-series prediction, some research combines the RNN model output with the attention mechanism, and the results indicate that adding the attention mechanism can significantly improve prediction accuracy [30,31,32]. These studies typically explain the improvement along two dimensions: attention over different times and attention over different features. The former assigns different weights to the hidden-layer outputs at different times and then uses a weighted sum to obtain an RNN context vector; the latter can be thought of as assigning different attention weights to the various dimensions of the output vector [38,39]. Thus, this paper introduces an attention mechanism on top of BiGRU to address the reduced prediction accuracy caused by the long time span in the water-level-prediction task and the insufficient utilization of spatial information.
In particular, this paper combines the spatial-reduction attention (SRA) structure with BiGRU, inspired by the Pyramid vision transformer (PVT) model in the field of computer vision [40,41]. Like a multi-head attention mechanism, the SRA structure receives Q, K, and V as input and outputs refined features. The difference is that SRA reduces the spatial scale of K and V prior to the attention operation, thereby drastically reducing the computation and memory requirements of the multi-head attention mechanism. The structure of SRA is depicted in Figure 6 [41], and SRA at stage i is formulated as follows [41].
$\mathrm{SRA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_0, \ldots, \mathrm{head}_{N_i}) W^O$
$\mathrm{head}_j = \mathrm{Attention}(Q W_j^Q, \mathrm{SR}(K) W_j^K, \mathrm{SR}(V) W_j^V)$
where $\mathrm{Concat}(\cdot)$ is the concatenation operation, and $W_j^Q \in \mathbb{R}^{C_i \times d_{\mathrm{head}}}$, $W_j^K \in \mathbb{R}^{C_i \times d_{\mathrm{head}}}$, $W_j^V \in \mathbb{R}^{C_i \times d_{\mathrm{head}}}$, and $W^O \in \mathbb{R}^{C_i \times C_i}$ are linear projection parameters. $N_i$ is the number of heads of the attention layer in Stage $i$; therefore, the dimension of each head is $d_{\mathrm{head}} = C_i / N_i$. $\mathrm{SR}(\cdot)$, which reduces the spatial dimension of the input sequence, is written as:
$\mathrm{SR}(\mathbf{x}) = \mathrm{Norm}(\mathrm{Reshape}(\mathbf{x}, R_i)\, W^S)$
where $W^S \in \mathbb{R}^{(R_i^2 C_i) \times C_i}$ is a linear projection that reduces the dimension of the input sequence to $C_i$, $R_i$ is the reduction ratio of Stage $i$, $\mathrm{Norm}(\cdot)$ is the same normalization as in the original transformer, and $\mathrm{Attention}(\cdot)$ is calculated as:
$\mathrm{Attention}(q, k, v) = \mathrm{Softmax}\!\left(\dfrac{q k^{T}}{\sqrt{d_{\mathrm{head}}}}\right) v$
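The following PyTorch sketch illustrates the spatial-reduction step applied to K and V before a standard multi-head attention, in the spirit of the PVT formulation above, with the BiGRU output treated as a 1-D token sequence. The reduction ratio, head count, and the use of average pooling as the reduction operator are simplifying assumptions, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Multi-head attention whose keys and values are spatially reduced first (PVT-style SRA)."""
    def __init__(self, dim=64, num_heads=4, reduction_ratio=4):
        super().__init__()
        # Shrinking the K/V sequence by `reduction_ratio` cuts the attention cost
        # roughly by the same factor.
        self.sr = nn.AvgPool1d(kernel_size=reduction_ratio, stride=reduction_ratio)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                                     # x: (batch, seq_len, dim)
        kv = self.sr(x.transpose(1, 2)).transpose(1, 2)       # (batch, seq_len // R, dim)
        kv = self.norm(kv)
        out, _ = self.attn(query=x, key=kv, value=kv)         # queries keep full length
        return out

# Toy usage: 24 hidden vectors of dimension 64, reduced to 6 keys/values.
sra = SpatialReductionAttention()
refined = sra(torch.randn(4, 24, 64))
print(refined.shape)                                          # torch.Size([4, 24, 64])
```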

2.5. Overall Model

The SRA-BiGRU model (see Figure 7) is proposed as an easy-to-use method for solving the problems of insufficient spatial information utilization and long time span in water-level-prediction tasks by integrating the benefits of BiGRU and spatial-reduction attention.
Specifically, because the water level changes continuously in the time dimension and is closely related to its historical state, GRU is well suited to memorizing and transmitting water-level information along the time sequence, thereby realizing the time-series modeling. On this basis, the bidirectional RNN structure makes it possible to relate past and future water-level states to the current state. Therefore, BiGRU is more adaptable and can link multiple influencing factors to changes in water level.
However, the BiGRU-based water-level-prediction model uses a fixed context vector generated from the inputs. Because the length of this vector is limited, it is difficult for it to summarize the information of the whole water-level sequence, and the information entered earlier in BiGRU is diluted to some extent by later inputs. Therefore, as the input sequence becomes longer, this fixed context vector becomes less and less able to reflect the real information in the water-level data. In addition, the fixed context vector alone cannot distinguish the degree of correlation between the output sequence and the hidden states of the input sequence across time steps. Moreover, the BiGRU-based water-level-prediction model has an inadequate capacity for responding to spatial information, so it cannot exploit the spatial information between water-level-measurement stations.
Consequently, introducing spatial-reduction attention on top of BiGRU can automatically learn the correlation of each hidden vector it generates, effectively resolving the accuracy degradation caused by the long time span in the water-level-prediction task. In addition, considering that the spatial distribution of upstream water-level stations, together with their water level, flow velocity, and climate information, affects the water-level-prediction task, the attention mechanism can set an attention weight for each feature so as to quantify the impact of the upstream stations’ information on the prediction of the future water level at the station of interest. It is worth mentioning that, owing to its unique structure, spatial-reduction attention reduces the computational and memory overhead of the multi-head attention mechanism.
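To show how the pieces fit together, here is a self-contained PyTorch sketch of the overall pipeline described above: a bidirectional GRU encodes the multi-station sequence, a spatial-reduction attention block refines the hidden vectors, and a linear head outputs the next water level. The layer sizes, the use of the readings from five stations as input features, the pooling-based reduction, and the one-step-ahead output head are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SRABiGRU(nn.Module):
    """Sketch of an SRA-BiGRU-style pipeline: BiGRU encoder + spatially reduced attention + linear head."""
    def __init__(self, num_stations=5, hidden_size=32, num_heads=4, reduction_ratio=4):
        super().__init__()
        self.bigru = nn.GRU(num_stations, hidden_size, batch_first=True, bidirectional=True)
        self.combine = nn.Linear(2 * hidden_size, hidden_size)               # merge forward/backward states
        self.reduce = nn.AvgPool1d(reduction_ratio, stride=reduction_ratio)  # shrink K and V
        self.norm = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                                    # x: (batch, time, stations)
        h = self.combine(self.bigru(x)[0])                   # temporal encoding: (batch, time, hidden)
        kv = self.norm(self.reduce(h.transpose(1, 2)).transpose(1, 2))
        refined, _ = self.attn(query=h, key=kv, value=kv)    # attention over the hidden vectors
        return self.head(refined[:, -1, :])                  # predicted water level at the next step

# Toy usage: 24 past readings from 5 stations -> one-step-ahead water level.
model = SRABiGRU()
print(model(torch.randn(8, 24, 5)).shape)                    # torch.Size([8, 1])
```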

3. Comparative Experiments and Results

3.1. Data Processing

Due to the accuracy of measurement and the fluctuation of the external environment, the original dataset contains a small quantity of poor-quality historical water-level data. For time series data, data quality has a significant impact on prediction accuracy, which is directly related to model availability [35,42]. Thus, data quality issues such as data duplication, data missing, and data error, which are common in water level observations, should be addressed prior to model training [8].
Among them, two or more duplicate records with the same time stamp and the same water-level value at the same station were handled by traversing the duplicate items and deleting them. Because the original data contained only a small number of missing values with a discontinuous distribution, the average-value method was used to fill them in: this experiment took the average of the water levels at the six points surrounding the empty item, i.e., for points x1 through x7 with x4 missing, x4 was filled with (x1 + x2 + x3 + x5 + x6 + x7)/6. Finally, outliers such as extremely large or small water-level observations that clearly deviate from the average level of the series were eliminated and filled in using the same missing-data procedure [39]. Furthermore, to hasten the convergence of the proposed model and improve its accuracy, maximum–minimum normalization was used so that all values are compressed into the interval [0, 1] [30]:
$x^{*} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$
The processed dataset was divided into a training set, a test set, and a validation set. Specifically, the training set is used to train the model and determine the weight parameters of each network connection. The test set is an independent dataset with characteristics similar to the training set that is used to evaluate the performance of the trained network model. In addition, because the training of the model is based solely on the training set, it is impossible to know from training alone whether the model will perform well on the test set; to preserve the scientific integrity of the study, the test set cannot participate in training. Therefore, a validation set is required to aid the training of the model, provide feedback on the model’s performance, and continuously optimize and improve the model’s parameters [30]. In these comparative experiments, 30% of the training-set data was set aside as a validation set to aid in the training of the model.
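A minimal pandas sketch of this cleaning and splitting pipeline for a single station’s record follows. The column names (`timestamp`, `level`) and the 70/30 train/test proportion are assumptions; the neighbour-mean filling, the min–max scaling, and the 30% validation share follow the description above.

```python
import pandas as pd

def preprocess(series_df):
    """Deduplicate, fill gaps, normalize, and split one station's water-level record."""
    df = (series_df.drop_duplicates(subset="timestamp")       # 1. remove duplicate records
                   .sort_values("timestamp")
                   .reset_index(drop=True))

    # 2. Fill isolated gaps with the mean of the six surrounding readings
    #    (three before and three after the missing point).
    neighbours = pd.concat([df["level"].shift(k) for k in (-3, -2, -1, 1, 2, 3)], axis=1)
    df["level"] = df["level"].fillna(neighbours.mean(axis=1))

    # 3. Maximum-minimum normalization to [0, 1].
    lo, hi = df["level"].min(), df["level"].max()
    df["level_norm"] = (df["level"] - lo) / (hi - lo)

    # 4. Chronological split: train/test, then 30% of the training part as validation.
    n_train = int(0.7 * len(df))
    train, test = df.iloc[:n_train], df.iloc[n_train:]
    n_fit = int(0.7 * len(train))
    return train.iloc[:n_fit], train.iloc[n_fit:], test        # (train, validation, test)
```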

3.2. Hyperparametric Optimization

Hyperparameters, which are control variables outside the model, play a critical role in recurrent neural networks. The following hyperparameters of model training were adjusted and optimized one by one. Specifically, in this experiment, the mean squared error (MSE) was selected as the loss function of the proposed model to quantify the difference between the predicted and actual water level [30]. The Adam optimizer was chosen because it can not only adapt to sparse gradients but also mitigate gradient oscillation [30]. The initial network learning rate was set to 0.002, and the rate was lowered adaptively with each training cycle; gradually reducing the learning rate improves the performance of the model [8]. The number of units was selected based on the dimensions of the model input and output, the model structure, the problem complexity, and other considerations. If the number of units is insufficient, the learning capacity of the model may be reduced and the training and generalization errors increase; in contrast, if the number of units is excessively large, overfitting easily occurs and the generalization error increases [8]. The batch size is the number of training samples contained in each batch, and the candidate batch sizes for this investigation were {32, 64, 128, 256} [8]. The model divides the training dataset into batches according to the batch size. Completing an epoch means that all batches have been trained once, i.e., each sample in the training dataset has had the opportunity to update the model’s internal parameters. Increasing the epoch value from small to large traces the evolution of a network from underfitting to fitting and then to overfitting. In accordance with the coarse-to-fine principle, the number of epochs chosen for this experiment was 250.
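The sketch below wires these hyperparameters into a PyTorch training loop: MSE loss, the Adam optimizer with an initial learning rate of 0.002, a per-epoch learning-rate decay, 250 epochs, and validation feedback. The exponential decay factor (0.98) and the batch size of 64 (chosen from the candidate set above) are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, x_train, y_train, x_val, y_val, batch_size=64, epochs=250, lr=0.002):
    """Training loop matching the hyperparameters in the text; decay factor and batch size are assumed."""
    loader = DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True)
    criterion = nn.MSELoss()                                   # loss on predicted vs. actual level
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)    # adapts to sparse, noisy gradients
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

    for epoch in range(epochs):
        model.train()
        for xb, yb in loader:                                  # one pass over all batches = one epoch
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
        scheduler.step()                                       # lower the learning rate each epoch

        model.eval()
        with torch.no_grad():
            val_loss = criterion(model(x_val), y_val).item()   # feedback from the validation set
        if (epoch + 1) % 50 == 0:
            print(f"epoch {epoch + 1}: val MSE = {val_loss:.4f}")
    return model
```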

3.3. Evaluation Index

In order to evaluate the accuracy and performance of the various water-level-prediction models, this paper adopts three evaluation indexes: the Nash–Sutcliffe efficiency (NSE), the root mean square error (RMSE), and the mean absolute error (MAE). NSE is commonly used to assess the predictive capability of hydrological models: a value close to 1 indicates that the model is of high quality and reliability; a value close to 0 implies that the simulated results are close to the average of the observed values but the process simulation error is significant; a value far below 0 indicates that the model is unreliable. RMSE and MAE measure the difference between the observed and predicted values; the lower these metrics, the higher the average prediction accuracy of the model. These evaluation indexes are calculated as follows:
$\mathrm{NSE} = 1 - \dfrac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}$
$\mathrm{RMSE} = \sqrt{\dfrac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2}$
$\mathrm{MAE} = \dfrac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$
where $m$ is the total number of test samples, $y_i$ represents the observed water level, $\hat{y}_i$ represents the predicted water level, and $\bar{y}$ is the average of the observed values.
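A short NumPy helper computing these three indexes exactly as defined above may make the evaluation step concrete; the function name and the toy values are illustrative only.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """NSE, RMSE, and MAE for 1-D arrays of observed and predicted water levels."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residual = y_true - y_pred
    nse = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(np.mean(residual ** 2))
    mae = np.mean(np.abs(residual))
    return {"NSE": nse, "RMSE": rmse, "MAE": mae}

# Toy check: a near-perfect prediction yields NSE close to 1 and small errors.
print(evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.0, 4.2]))
```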

3.4. Results of Comparative Experiments

3.4.1. Comparative Experiment Results Based on the Bidirectional RNN Structure

In the comparison experiment based on the bidirectional RNN structure, BiGRU, GRU, BiLSTM, and LSTM were used to compare prediction results. The smaller the loss value, the more accurate and robust the model. As depicted in Figure 8, BiGRU outperforms GRU, and BiLSTM outperforms LSTM, demonstrating the effectiveness of the bidirectional RNN structure.

3.4.2. Comparative Experiment Results Based on the Spatial-Reduction Attention Mechanism

As illustrated in Figure 9, whether for GRU and BiGRU or for LSTM and BiLSTM, incorporating spatial-reduction attention significantly improves model robustness and water-level-prediction accuracy. This also confirms that spatial-reduction attention can actively learn the correlations among the hidden vectors of RNN and BiRNN and highlight the influence of important features on the prediction results, thereby resolving the insufficient utilization of spatial information and the long time span in water-level-prediction tasks, which otherwise cause a decline in prediction accuracy.

3.4.3. Overall Comparative Experiment Results

According to Figure 10, Figure 11 and Figure 12 and Table 1, all these methods perform well on the test dataset, and the water-level-prediction results are reasonably accurate, indicating that all of these models are able to deal with the water-level-prediction problem effectively. As noted previously, the lower the loss value, RMSE, and MAE (and the closer the NSE is to 1), the closer the predicted value matches the actual value and hence the more accurate the prediction. Thus, the accuracy of these models is ranked from high to low as follows: SRA-BiGRU, SRA-BiLSTM, SRA-GRU, BiGRU, BiLSTM, SRA-LSTM, GRU, and LSTM. Notably, despite the fact that LSTM and GRU perform comparably in a number of time-series-prediction tasks, the accuracy and effectiveness of GRU in this experiment were superior to those of LSTM, regardless of whether the plain model or its combination with BiRNN and spatial-reduction attention is considered. Most crucially, the SRA-BiGRU model has the lowest loss value, RMSE, and MAE, and its NSE value is closest to 1, indicating that its applicability, availability, and precision are superior to those of the other models and conclusively demonstrating that the bidirectional RNN structure and spatial-reduction attention are effective.

4. Discussion

Most power stations on the mainstream of the Wujiang River use peak-shaving operations several times a day. The unsteady flow discharged from each hydropower station causes the water level in the channel under the dam to rise and fall suddenly, with a large instantaneous amplitude and frequent changes; in extreme situations, the water level can rise by 20 m in a day [35]. These issues, such as abrupt changes in water level, sudden increases in velocity, and intricate flow patterns, not only extend the operational cycle of navigable ships but also increase their operational costs, posing certain risks to the safety of ship transportation and the flood control of cities along the route. Consequently, the precise prediction of the water level is of immense importance for flood control, rational ship stowage, and secure navigation of ships in the Wujiang River Basin. By gradually introducing the applications of GRU, the bidirectional RNN structure, and the spatial-reduction attention mechanism, an easy-to-use model appropriate for precise and quick water-level prediction is proposed, and the comparative experiment results further prove that the proposed SRA-BiGRU model has higher prediction accuracy in the water-level-prediction task.
GRU and LSTM, both excellent variants of the recurrent neural networks, can use their strong fitting ability in capturing nonlinear characteristics, and both fully consider the time series of water-level data [23,24,25,26,27,39]. Because these two models have indistinguishable effects on many tasks, this paper conducted comparative experiments using LSTM and GRU, and the results of comparative experiments indicate that GRU outperforms LSTM in terms of both efficiency and accuracy irrespective of whether the model is combined with a bidirectional RNN structure, the spatial-reduction attention mechanism, or neither. In spite of the fact that GRU outperforms LSTM in this experiment, there is no definitive conclusion as to which is superior, and the best model must be chosen based on the specific tasks and datasets at hand.
However, LSTM and GRU cannot encode information from back to front and can only predict the next output based on timing information from previous times, whereas BiLSTM and BiGRU can encode information from back to front, so the output is determined by both previous and future states [30,36]. Because the bidirectional RNN structure is more complex, its running time increases proportionally, but it can be viewed as a technique for enhancing precision [37]. Moreover, the comparative experiments confirm the efficacy of the bidirectional structure by demonstrating that, whether combined with LSTM or GRU, it can dramatically improve the accuracy of water-level prediction. This supports the conclusion that BiGRU enables modeling the potential relationship of past and future water-level data with the current data, thereby improving the accuracy of forecasts.
Targeting the time-series-prediction task, some research has combined the prediction output of the RNN model with the attention mechanism in deep learning rather than using RNN and its variants alone. These findings indicate that this approach of rapidly selecting high-value information from a large amount of data can significantly enhance the accuracy of time-series prediction [30,31,32]. However, this method is rarely applied in the field of water-level prediction. In water-level prediction, the water level is affected not only by historical water levels but also by upstream water levels, indicating that river water-level data are spatially and temporally related, while the BiGRU-based water-level-prediction model is plagued by a lack of spatial information-response capacity. Consequently, this paper introduces spatial-reduction attention on top of BiGRU to address the reduced prediction accuracy caused by the long time span in the water-level-prediction task and the insufficient utilization of spatial information. In particular, the chosen spatial-reduction attention reduces the computational and memory overhead of the multi-head attention mechanism owing to its distinctive structure [41].
Furthermore, the results of comparative experiments based on the spatial-reduction attention mechanism demonstrate the benefit of combining the spatial-reduction attention mechanism with RNN and BiRNN. Therefore, it is clearly demonstrated that the incorporation of spatial-reduction attention can not only automatically capture the correlations between the hidden vectors generated by BiGRU but can also effectively address the issue of precision degradation due to the extended time span in water-level-forecasting tasks. Furthermore, it can take into account the impact of the spatial distribution of upstream water level stations on the water-level-prediction task and assign attention weights to each feature, thereby calculating the influence of upstream water-level station information on the future water-level prediction with its own water-level station as the focal point.
Most crucially, the comparative experiments gradually demonstrate the superiority of GRU, the bidirectional RNN structure, and the spatial-reduction attention mechanism, and all evaluation indexes confirm that the proposed SRA-BiGRU model has higher prediction accuracy in the water-level-prediction task, indicating that it is a model with high availability, high accuracy, and high robustness.
However, this study has some weaknesses. Because it is based on the construction of the Wujiang River water-safety-prediction system, the dataset obtained has few features and does not fully demonstrate the model’s performance. In a subsequent step, the model can be applied to other water-level-prediction datasets, and meteorological data such as precipitation, water temperature, and air temperature can be added to enhance the model’s ability to predict the water level under abnormal climate conditions.

5. Conclusions

This paper proposed a high-precision water-level-prediction model that combines the superiorities of GRU, bidirectional RNN structure, and spatial-reduction attention in order to fully utilize the valuable information contained in massive amounts of Wujiang River hydrological data and to address insufficient utilization of spatial information and long time span in water-level-prediction tasks. Through comparative experiments and discussions, the following concise conclusions can be drawn:
  • GRU and LSTM, both excellent variants of the recurrent neural networks, can use their strong fitting ability in capturing nonlinear characteristics and fully consider the time series of water-level data. Moreover, in this experiment, GRU outperforms LSTM in terms of water-level-prediction accuracy and training speed;
  • The bidirectional GRU structure enables the modeling of the potential relationship between past and future water-level data and current data, thereby improving the accuracy of prediction;
  • The introduction of spatial-reduction attention based on BiGRU can actively learn the correlation of hidden vectors of BiGRU and highlight the influence of important features on the prediction results, thereby solving the problems of insufficient utilization of spatial information and long time span in water-level-prediction tasks, which lead to the decline of prediction accuracy. Particularly, due to its unique structure, spatial-reduction attention reduces the overhead of multi-head attention mechanism computation and memory;
  • All evaluation index values of the comparative experiments confirm that the SRA-BiGRU model has higher prediction accuracy in the water-level-prediction task, indicating that it is a water-level-prediction model with high availability, high accuracy, and high robustness.
In the future, the SRA-BiGRU model can be applied to other water-level-prediction datasets in conjunction with meteorological data such as precipitation, water temperature, and air temperature to improve the model’s ability to predict the water level under atypical climate conditions.

Author Contributions

Conceptualization, K.B.; methodology, K.B. and J.B.; software, K.B.; validation, J.B., W.Z. and Y.S.; formal analysis, K.B.; investigation, J.B.; resources, Y.S.; data curation, R.M.; writing—original draft preparation, K.B.; writing—review and editing, Y.S.; visualization, Y.W. and R.M.; supervision, K.B.; project administration, R.M. and J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (No. 2022YFB3207400) and the Fundamental Research Funds for the Guangxi science and technology agency (No. 2021AB07045 and No. 2021AB05087). The APC was funded by the Basic Research Fund of Central-Level Nonprofit Scientific Research Institutes (No. TKS20210301).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed or generated in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, Y.H.; Liu, J.L.; Liu, M.J. Research on the relationship between water level and channel navigability. J. Wuhan Univ. Technol. 2010, 34, 842–844. [Google Scholar]
  2. Liu, M.J.; Gong, D.P. Analysis on changes of the Changjiang navigation environment. Mar. Eng. 2004, 2, 36–39. [Google Scholar]
  3. Xiao, Z.Y.; Chen, Y.L.; Chen, C.H. Web-oriented “one-stop” service system for Yangtze River hydrological data. People’s Yangtze River 2018, 49, 111–116. [Google Scholar] [CrossRef]
  4. Sang, L.Z.; Mao, Z.; Zhang, W.J.; Yan, X.P. Realization of early warning system for safe navigation of ships in multi-bridge waters of inland rivers. China Navig. 2014, 37, 34–39. [Google Scholar]
  5. Feng, N.C.; Chen, X.N.; Wu, J.B. Construction and operation of water surface evaporation monitoring station in Danjiangkou Reservoir. People’s Yangtze River 2018, 49, 29–34. [Google Scholar] [CrossRef]
  6. Wu, Q.; Mei, J.Y.; Du, Y.D.; Yuan, H. Practice and understanding of water resources monitoring in the Yangtze River basin. People’s Yangtze River 2017, 48, 12–15. [Google Scholar] [CrossRef]
  7. Bian, N. Application of improved gray system in prediction of channel water level. China Water Transp. Channel Technol. 2018, 5, 75–80. [Google Scholar] [CrossRef]
  8. Pan, M.; Zhou, H.; Cao, J.; Liu, Y.; Hao, J.; Li, S.; Chen, C.H. Water level prediction model based on GRU and CNN. IEEE Access 2020, 8, 60090–60100. [Google Scholar] [CrossRef]
  9. Pierini, N.A.; Vivoni, E.R.; Robles-Morua, A.; Scott, R.L.; Nearing, M.A. Using observations and a distributed hydrologic model to explore runoff thresholds linked with mesquite encroachment in the Sonoran Desert. Water Resour. Res. 2014, 50, 8191–8215. [Google Scholar] [CrossRef] [Green Version]
  10. Kaya, C.M.; Tayfur, G.; Gungor, O. Predicting flood plain inundation for natural channels having no upstream gauged stations. J. Water Clim. Chang. 2019, 10, 360–372. [Google Scholar] [CrossRef] [Green Version]
  11. Lai, X.J.; Jiang, J.H.; Huang, Q. The impact pattern and mechanism of the water storage of the Three Gorges Project on the water regime of Dongting Lake. Lake Sci. 2012, 24, 178–184. [Google Scholar]
  12. Zhou, H.; Mao, D.H.; Liu, P.L. Analysis of the influence of the operation of the Three Gorges on the water level of East Dongting Lake. Mar. Limnol. Bull. 2014, 4, 180–186. [Google Scholar] [CrossRef]
  13. Zhou, B.D.; Liu, Y.H. Influence of Three Gorges Operation on flood level of Dongting Lake. Hunan Water Conserv. Hydropower 2003, 1, 27–29. [Google Scholar]
  14. Liang, C. Water level prediction of Dongting Lake Based on Long-Short-Term Memory Network and the Impact of the Three Gorges Project on the Water Level of Dongting Lake. Ph.D. Thesis, Wuhan University, Wuhan, China, 2019. [Google Scholar]
  15. Wang, C.H. Application of wavelet neural network model in groundwater prediction. Water Sci. Eng. Technol. 2016, 3, 44–46. [Google Scholar] [CrossRef]
  16. Adamowski, J.; Chan, H.F. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. 2011, 407, 28–40. [Google Scholar] [CrossRef]
  17. Seo, Y.; Kim, S.; Kisi, O.; Singh, V.P. Daily water level forecasting using wavelet decomposition and artificial intelligence techniques. J. Hydrol. 2015, 520, 224–243. [Google Scholar] [CrossRef]
  18. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  19. Behzad, M.; Asghari, K.; Coppola, E.A. Comparative study of SVMs and ANNs in aquifer water level prediction. J. Comput. Civ. Eng. 2010, 24, 408–413. [Google Scholar] [CrossRef]
  20. Dai, L.; Mao, J.; Wang, Y.; Dai, H.; Zhang, P.; Guo, J. Optimal operation of the Three Gorges Reservoir subject to the ecological water level of Dongting Lake. Environ. Earth Sci. 2016, 75, 1111. [Google Scholar] [CrossRef]
  21. Zakaria, M.N.A.; Abdul Malek, M.; Zolkepli, M.; Najah Ahmed, A. Application of artificial intelligence algorithms for hourly river level forecast: A case study of Muda River, Malaysia. Alex. Eng. J. 2021, 60, 4015–4028. [Google Scholar] [CrossRef]
  22. Ren, T.; Liu, X.; Niu, J.; Lei, X.; Zhang, Z. Real-time water level prediction of cascaded channels based on multilayer perception and recurrent neural network. J. Hydrol. 2020, 585, 124783. [Google Scholar] [CrossRef]
  23. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  24. Potter, C. RNN based MIMO channel prediction. In Differential Evolution in Electromagnetics. Evolutionary Learning and Optimization; Qing, A., Lee, C.K., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 4, pp. 177–206. [Google Scholar] [CrossRef]
  25. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  26. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef]
  27. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  28. Le, X.H.; Ho, H.V.; Lee, G.; Jung, S. Application of long short-term memory (LSTM) neural network for flood forecasting. Water 2019, 11, 1387. [Google Scholar] [CrossRef] [Green Version]
  29. Xu, G.Y.; Zhou, X.Y.; Si, C.Y.; Hu, W.B.; Liu, F. Water level time series prediction model based on GRU and LightGBM feature selection. Comput. Appl. Softw. 2020, 37, 25–31. [Google Scholar]
  30. Bao, K.; Bi, J.; Gao, M.; Sun, Y.; Zhang, X.; Zhang, W. An improved ship trajectory prediction based on AIS data using MHA-BiGRU. J. Mar. Sci. Eng. 2022, 10, 804. [Google Scholar] [CrossRef]
  31. Li, X.; Lu, X.L. Short-term load forecasting model based on dual attention mechanism and GRU network. Comput. Eng. 2022, 48, 291–296. [Google Scholar] [CrossRef]
  32. Li, H.J.; Fang, X.; Dai, H.R. Deep knowledge tracking optimization model based on self-attention mechanism and bidirectional GRU neural network. Appl. Res. Comput. 2022, 39, 732–738. [Google Scholar] [CrossRef]
  33. Hu, H.X.; Sui, H.C.; Hu, Q.; Zhang, Y.; Hu, Z.Y.; Ma, N.W. Runoff forecasting model based on graph attention network and two-stage attention mechanism. Comput. Appl. 2022, 42, 1607–1615. [Google Scholar]
  34. Sudriani, Y.; Ridwansyah, I.; Rustini, H.A. Long short term memory (LSTM) recurrent neural network (RNN) for discharge level prediction and forecast in Cimandiri river, Indonesia. Earth Environ. Sci. 2019, 299, 012037. [Google Scholar] [CrossRef]
  35. Ma, R.X.; Zhao, P.; Zhu, J.; Li, Z.L. Development and application of Wujiang water safety prediction and early warning system. People’s Yangtze River 2019, 50, 211–216. [Google Scholar] [CrossRef]
  36. Zhang, G.; Tan, F.; Wu, Y. Ship motion attitude prediction based on an adaptive dynamic particle swarm optimization algorithm and bidirectional LSTM neural network. IEEE Access 2020, 8, 90087–90098. [Google Scholar] [CrossRef]
  37. Agarap, A.F.; Grafilon, P. Statistical analysis on e-commerce reviews, with sentiments classification using bidirectional recurrent neural network (RNN). arXiv 2018, arXiv:1805.03687. [Google Scholar] [CrossRef]
  38. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. Statistics 2014, 3, 62–68. [Google Scholar]
  39. Cho, M.; Kim, C.; Jung, K.; Jung, H. Water level prediction model applying a long short-term memory (LSTM)–gated recurrent unit (GRU) method for flood prediction. Water 2022, 14, 2221. [Google Scholar] [CrossRef]
  40. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980v9. [Google Scholar] [CrossRef]
  41. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 548–558. [Google Scholar]
  42. Cryer, J.D.; Chan, K.-S. Time Series Analysis: With Applications in R; Springer: New York, NY, USA, 2008. [Google Scholar]
Figure 1. Flowchart of the water-level-prediction method.
Figure 2. The location of study area.
Figure 3. The distribution location of water-level-measurement stations.
Figure 4. Comparison of RNN, LSTM, and GRU neural network structures.
Figure 5. Bidirectional gate recurrent unit.
Figure 6. Spatial-Reduction Attention Mechanism.
Figure 7. Schematic diagram of the SRA-BiGRU model.
Figure 8. Loss value of comparative experiment based on the BiRNN structure. (a) BiGRU vs. GRU; (b) BiLSTM vs. LSTM.
Figure 9. Loss value of comparative experiment based on the spatial-reduction attention. (a) SRA-BiGRU vs. BiGRU; (b) SRA-LSTM vs. BiLSTM; (c) SRA-GRU vs. GRU; (d) SRA-LSTM vs. LSTM.
Figure 10. Loss value of all comparative experiments.
Figure 11. Water-level prediction.
Figure 12. Evaluation index value of all models.
Table 1. Evaluation index value of all models.
Method        MAE        RMSE       NSE
SRA-BiGRU     0.54383    0.69723    0.91097
SRA-BiLSTM    0.55497    0.71105    0.90324
SRA-GRU       0.72584    0.93057    0.88365
BiGRU         0.80069    1.02653    0.86651
BiLSTM        0.87764    1.12519    0.85985
SRA-LSTM      0.94777    1.21509    0.84885
GRU           1.32214    1.69505    0.83982
LSTM          1.41426    1.81302    0.83179
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
