Traditional Statistics vs. Modern Machine Learning Approaches in Hydrology

A special issue of Hydrology (ISSN 2306-5338). This special issue belongs to the section "Statistical Hydrology".

Deadline for manuscript submissions: closed (15 December 2023) | Viewed by 19483

Special Issue Editor

Dr. Evangelos Rozos
National Observatory of Athens, Institute for Environmental Research & Sustainable Development, Athens, Greece
Interests: urban water; hydrology; water resources management; integrated modelling; numerical methods

Special Issue Information

Dear Colleagues,

We are pleased to announce the launch of a new Special Issue: "Traditional Statistics vs. Modern Machine Learning Approaches in Hydrology". We look forward to receiving your research papers and case studies.

The effectiveness of machine learning (ML), and more recently deep learning, in hydrological applications has been demonstrated by many researchers. The first applications of ML in hydrology appeared almost 30 years ago and, despite their simplicity, were fairly efficient, though they fell short of the performance of the standard models. Since then, the field has evolved, and many recent studies of ML applications in engineering hydrology suggest that ML can not only outperform hydrological models but can also learn catchment similarities, which corresponds to a prior hydrological understanding. While the effectiveness of ML in engineering hydrology is not questioned, there is no strong evidence that ML methods can dominate the traditional approaches in statistical hydrology. Indeed, some researchers have found that stochastic and machine learning methods do not differ dramatically in forecasting hydrological processes such as river discharge. This finding has practical implications, since the CPU intensity of ML methods is a significant handicap. Another example is the work by researchers who consider the heuristic segmentation approach to be unparalleled in applications such as the construction of rating curves, double mass analysis, and time-shift detection. Yet the equivalent ML methods (unsupervised learning), in contrast to the heuristic methods, come with plenty of readily available tools and frameworks that require only minimal configuration. The scope of this Special Issue is the comparative use of traditional and ML approaches in applications of statistical hydrology. To shed more light, this comparison should be comprehensive, taking into consideration not only the performance per se but also the preparation stages. More specifically, the evaluation should take into account and discuss the theoretical background, the labour required to configure the model, the CPU time required to set up the model, and, finally, its overall appeal to practitioners.

Dr. Evangelos Rozos
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Hydrology is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • statistical hydrology
  • machine learning
  • unsupervised learning
  • heuristic algorithms
  • logistic regression
  • linear regression
  • hydrological modelling

Published Papers (9 papers)


Research

23 pages, 6733 KiB  
Article
Enhancing Flood Prediction Accuracy through Integration of Meteorological Parameters in River Flow Observations: A Case Study Ottawa River
by Clara Letessier, Jean Cardi, Antony Dussel, Isa Ebtehaj and Hossein Bonakdari
Hydrology 2023, 10(8), 164; https://doi.org/10.3390/hydrology10080164 - 10 Aug 2023
Cited by 3 | Viewed by 1379
Abstract
Given that the primary cause of flooding in Ontario, Canada, is attributed to spring floods, it is crucial to incorporate temperature as an input variable in flood prediction models with machine learning algorithms. This inclusion enables a comprehensive understanding of the intricate dynamics involved, particularly the impact of heatwaves on snowmelt, allowing for more accurate flood prediction. This paper presents a novel machine learning approach called the Adaptive Structure of the Group Method of Data Handling (ASGMDH) for predicting daily river flow rates, incorporating measured discharge from the previous day as a historical record summarizing watershed characteristics, along with real-time data on air temperature and precipitation. To propose a comprehensive machine learning model, four different scenarios with various input combinations were examined. The simplest model with three parameters (maximum temperature, precipitation, historical daily river flow discharge) achieves high accuracy, with an R² value of 0.985 during training and 0.992 during testing, demonstrating its reliability and potential for practical application. The developed ASGMDH model demonstrates high accuracy for the study area, with a significant number of samples having a relative error of less than 15%. The final ASGMDH-based model has only a second-order polynomial (AICc = 19,648.71), while it is seven for the classical GMDH-based model (AICc = 19,701.56). The sensitivity analysis reveals that maximum temperature significantly impacts the prediction of daily river flow discharge. Full article

18 pages, 29981 KiB  
Article
A Machine-Learning Framework for Modeling and Predicting Monthly Streamflow Time Series
by Hatef Dastour and Quazi K. Hassan
Hydrology 2023, 10(4), 95; https://doi.org/10.3390/hydrology10040095 - 17 Apr 2023
Cited by 1 | Viewed by 1732
Abstract
Having a complete hydrological time series is crucial for water-resources management and modeling. However, this can pose a challenge in data-scarce environments where data gaps are widespread. In such situations, recurring data gaps can lead to unfavorable outcomes such as loss of critical information, ineffective model calibration, inaccurate timing of peak flows, and biased statistical analysis in various applications. Despite its importance, predicting monthly streamflow can be a complex task due to its connection to random dynamics and uncertain phenomena, posing significant challenges. This study introduces an ensemble machine-learning regression framework for modeling and predicting monthly streamflow time series with a high degree of accuracy. The framework utilizes historical data from multiple monthly streamflow datasets in the same region to predict missing monthly streamflow data. The framework selects the best features from all available gap-free monthly streamflow time-series combinations and identifies the optimal model from a pool of 12 machine-learning models, including random forest regression, gradient boosting regression, and extra trees regressor, among others. The model selection is based on cross-validation train-and-test set scores, as well as the coefficient of determination. We conducted modeling on 26 monthly streamflow time series and found that the gradient boosting regressor with bagging regressor produced the highest accuracy in 7 of the 26 instances. Across all instances, the models using this method exhibited an overall accuracy range of 0.9737 to 0.9968. Additionally, the use of either a bagging regressor or an AdaBoost regressor improved both the tree-based and gradient-based models, resulting in these methods accounting for nearly 80% of the best models. Between January 1960 and December 2021, an average of 40% of the monthly streamflow data was missing for each of the 26 stations. Notably, two crucial stations located in the economically significant lower Athabasca Basin River in Alberta province, Canada, had approximately 70% of their monthly streamflow data missing. To address this issue, we employed our framework to accurately extend the missing data for all 26 stations. These accurate extensions also allow for further analysis, including grouping stations with similar monthly streamflow behavior using Pearson correlation. Full article

15 pages, 1393 KiB  
Article
Assessing Hydrological Simulations with Machine Learning and Statistical Models
by Evangelos Rozos
Hydrology 2023, 10(2), 49; https://doi.org/10.3390/hydrology10020049 - 10 Feb 2023
Cited by 1 | Viewed by 1920
Abstract
Machine learning has been used in hydrological applications for decades, and recently, it was proven to be more efficient than sophisticated physically based modelling techniques. In addition, it has been used in hybrid frameworks that combine hydrological and machine learning models. The concept behind the latter is the use of machine learning as a filter that advances the performance of the hydrological model. In this study, we employed such a hybrid approach but with a different perspective and objective. Machine learning was used as a tool for analyzing the error of hydrological models in an effort to understand the source and the attributes of systematic modelling errors. Three hydrological models were applied to three different case studies. The results of these models were analyzed with a recurrent neural network and with the k-nearest neighbours algorithm. Most of the systematic errors were detected, but certain types of errors, including conditional systematic errors, passed unnoticed, leading to an overestimation of the confidence of some erroneously simulated values. This is an issue that needs to be considered when using machine learning as a filter in hybrid networks. The effect of conditional systematic errors can be reduced by naively combining the simulations (mean values) of two or more hydrological models. This simple technique reduces the magnitude of conditional systematic errors and makes them more discoverable to machine learning models. Full article

14 pages, 13910 KiB  
Article
Machine Learning for Surrogate Groundwater Modelling of a Small Carbonate Island
by Karl Payne, Peter Chami, Ivanna Odle, David Oscar Yawson, Jaime Paul, Anuradha Maharaj-Jagdip and Adrian Cashman
Hydrology 2023, 10(1), 2; https://doi.org/10.3390/hydrology10010002 - 22 Dec 2022
Cited by 2 | Viewed by 2284
Abstract
Barbados is heavily reliant on groundwater resources for its potable water supply, with over 80% of the island’s water sourced from aquifers. The ability to meet demand will become even more challenging due to the continuing climate crisis. The consequences of climate change within the Caribbean region include sea level rise, as well as hydrometeorological effects such as increased rainfall intensity, and declines in average annual rainfall. Scientifically sound approaches are becoming increasingly important to understand projected changes in supply and demand while concurrently minimizing deleterious impacts on the island’s aquifers. Therefore, the objective of this paper is to develop a physics-based groundwater model and surrogate models using machine learning (ML), which provide decision support to assist with groundwater resources management in Barbados. Results from the study show that a single continuum conceptualization is adequate for representing the island’s hydrogeology as demonstrated by a root mean squared error and mean absolute error of 2.7 m and 2.08 m between the model and observed steady-state hydraulic head. In addition, we show that data-driven surrogates using deep neural networks, elastic networks, and generative adversarial networks are capable of approximating the physics-based model with a high degree of accuracy as shown by R-squared values of 0.96, 0.95, and 0.95, respectively. The framework and tools developed are a critical step towards a digital twin that provides stakeholders with a quantitative tool for optimal management of groundwater under a changing climate in Barbados. These outputs will provide sound evidence-based solutions to aid long-term economic and social development on the island. Full article

19 pages, 7876 KiB  
Article
A Stacked Machine Learning Algorithm for Multi-Step Ahead Prediction of Soil Moisture
by Francesco Granata, Fabio Di Nunno, Mohammad Najafzadeh and Ibrahim Demir
Hydrology 2023, 10(1), 1; https://doi.org/10.3390/hydrology10010001 - 21 Dec 2022
Cited by 9 | Viewed by 2009
Abstract
A trustworthy assessment of soil moisture content plays a significant role in irrigation planning and in controlling various natural disasters such as floods, landslides, and droughts. Various machine learning models (MLMs) have been used to increase the accuracy of soil moisture content prediction. The present investigation aims to apply MLMs with novel structures for the estimation of daily volumetric soil water content, based on the stacking of the multilayer perceptron (MLP), random forest (RF), and support vector regression (SVR). Two groups of input variables were considered: the first (Model A) consisted of various meteorological variables (i.e., daily precipitation, air temperature, humidity, and wind speed), and the second (Model B) included only daily precipitation. The stacked model (SM) had the best performance (R² = 0.962) in the prediction of daily volumetric soil water content for both categories of input variables when compared with the MLP (R² = 0.957), RF (R² = 0.956) and SVR (R² = 0.951) models. Overall, the SM, which, in general, allows the weaknesses of the individual basic algorithms to be overcome while still maintaining a limited number of parameters and short calculation times, can lead to more accurate predictions of soil water content than those provided by more commonly employed MLMs. Full article

21 pages, 4144 KiB  
Article
Exploring Temporal Dynamics of River Discharge Using Univariate Long Short-Term Memory (LSTM) Recurrent Neural Network at East Branch of Delaware River
by Md Abdullah Al Mehedi, Marzieh Khosravi, Munshi Md Shafwat Yazdan and Hanieh Shabanian
Hydrology 2022, 9(11), 202; https://doi.org/10.3390/hydrology9110202 - 11 Nov 2022
Cited by 17 | Viewed by 2531
Abstract
River flow prediction is a pivotal task in the field of water resource management during the era of rapid climate change. The highly dynamic and evolving nature of the climatic variables, e.g., precipitation, has a significant impact on the temporal distribution of the river discharge in recent days, making the discharge forecasting even more complicated for diversified water-related issues, e.g., flood prediction and irrigation planning. In order to predict the discharge, various physics-based numerical models are used using numerous hydrologic parameters. Extensive lab-based investigation and calibration are required to reduce the uncertainty involved in those parameters. However, in the age of data-driven predictions, several deep learning algorithms showed satisfactory performance in dealing with sequential data. In this research, Long Short-term Memory (LSTM) neural network regression model is trained using over 80 years of daily data to forecast the discharge time series up to seven days ahead of time. The performance of the model is found satisfactory through the comparison of the predicted data with the observed data, visualization of the distribution of the errors, and R² value of 0.93 with one day lead time. Higher performance is achieved through the increase in the number of epochs and hyperparameter tuning. This model can be transferred to other locations with proper feature engineering and optimization to perform univariate predictive analysis and potentially be used to perform real-time river discharge prediction. Full article

16 pages, 6619 KiB  
Article
Development of Rating Curves: Machine Learning vs. Statistical Methods
by Evangelos Rozos, Jorge Leandro and Demetris Koutsoyiannis
Hydrology 2022, 9(10), 166; https://doi.org/10.3390/hydrology9100166 - 24 Sep 2022
Cited by 6 | Viewed by 2261
Abstract
Streamflow measurements provide valuable hydrological information but, at the same time, are difficult to obtain. For this reason, discharge records of regular intervals are usually obtained indirectly by a stage–discharge rating curve, which establishes a relation between measured water levels to volumetric rate of flow. Rating curves are difficult to develop because they require simultaneous measurements of discharge and stage over a wide range of stages. Furthermore, the shear forces generated during flood events often change the streambed shape and roughness. As a result, over long periods, the stage–discharge measurements are likely to form clusters to which different stage–discharge rating curves apply. For the identification of these clusters, various robust statistical approaches have been suggested by researchers, which, however, have not become popular among practitioners because of their complexity. Alternatively, various researchers have employed machine learning approaches. These approaches, though motivated by the time-dependent nature of the rating curves, handle the data as of stationary origin. In this study, we examine the advantages of a very simple technique: use time as one of the machine learning model inputs. This approach was tested in three real-world case studies against a statistical method and the results indicated its potential value in the development of a simple tool for rating curves suitable for practitioners. Full article

15 pages, 6849 KiB  
Article
KNN vs. Bluecat—Machine Learning vs. Classical Statistics
by Evangelos Rozos, Demetris Koutsoyiannis and Alberto Montanari
Hydrology 2022, 9(6), 101; https://doi.org/10.3390/hydrology9060101 - 06 Jun 2022
Cited by 4 | Viewed by 2329
Abstract
Uncertainty is inherent in the modelling of any physical processes. Regarding hydrological modelling, the uncertainty has multiple sources including the measurement errors of the stresses (the model inputs), the measurement errors of the hydrological process of interest (the observations against which the model is calibrated), the model limitations, etc. The typical techniques to assess this uncertainty (e.g., Monte Carlo simulation) are computationally expensive and require specific preparations for each individual application (e.g., selection of appropriate probability distribution). Recently, data-driven methods have been suggested that attempt to estimate the uncertainty of a model simulation based exclusively on the available data. In this study, two data-driven methods were employed, one based on machine learning techniques, and one based on statistical approaches. These methods were tested in two real-world case studies to obtain conclusions regarding their reliability. Furthermore, the flexibility of the machine learning method allowed assessing more complex sampling schemes for the data-driven estimation of the uncertainty. The anatomisation of the algorithmic background of the two methods revealed similarities between them, with the background of the statistical method being more theoretically robust. Nevertheless, the results from the case studies indicated that both methods perform equivalently well. For this reason, data-driven methods can become a valuable tool for practitioners. Full article

20 pages, 7191 KiB  
Article
The Development of Explicit Equations for Estimating Settling Velocity Based on Artificial Neural Networks Procedure
by Muhammad Cahyono
Hydrology 2022, 9(6), 98; https://doi.org/10.3390/hydrology9060098 - 02 Jun 2022
Cited by 3 | Viewed by 1998
Abstract
This study proposes seven equations to predict the settling velocity of sediment particles with variations in grain size (d), particle shape factor (SF), and water temperature (T) based on the artificial neural network procedure. The data used to develop the equations were obtained from digitizing charts provided by the U.S. Interagency Committee on Water Resources (U.S-ICWR) and compiled from the measurement data of settling velocity from several sources. The equations are compared to three existing equations available in the literature and then analyzed using graphical and statistical analysis. The simulation results show the proposed equations produce satisfactory results. The proposed equations can predict the settling velocity of natural particle sediments, with diameters ranging between 0.05 mm and 10 mm in water with temperatures between 0 °C and 40 °C, and shape factor SF ranging between 0.5 and 0.95. Full article