Comparison of Deterministic and Probabilistic Variational Data Assimilation Methods Using Snow and Streamflow Data Coupled in HBV Model for Upper Euphrates Basin

Uysal, Gökçen; Alvarado-Montero, Rodolfo; Şensoy, Aynur; Şorman, Ali Arda

doi:10.3390/geosciences13030089

Open AccessArticle

Comparison of Deterministic and Probabilistic Variational Data Assimilation Methods Using Snow and Streamflow Data Coupled in HBV Model for Upper Euphrates Basin

¹

Department of Civil Engineering, Faculty of Engineering, 2 Eylul Campus, Eskişehir Technical University, Eskişehir 26555, Türkiye

²

Independent Researcher, Ottawa, ON K2B 8E2, Canada

^*

Author to whom correspondence should be addressed.

Geosciences 2023, 13(3), 89; https://doi.org/10.3390/geosciences13030089

Submission received: 9 January 2023 / Revised: 28 February 2023 / Accepted: 14 March 2023 / Published: 19 March 2023

(This article belongs to the Special Issue Progresses and Gaps on Monitoring of Snow and Its Components at the Local, Regional to Global Scale and Its Applications)

Download

Browse Figures

Versions Notes

Abstract

:

The operation of upstream reservoirs in mountainous regions fed by snowmelt is highly challenging. This is partly due to scarce information given harsh topographic conditions and a lack of monitoring stations. In this sense, snow observations from remote sensing provide additional and relevant information about the current conditions of the basin. This information can be used to improve the model states of a forecast using data assimilation techniques, therefore enhancing the operation of reservoirs. Typical data assimilation techniques can effectively reduce the uncertainty of forecast initialization by merging simulations and observations. However, they do not take into account model, structural, or parametric uncertainty. The uncertainty intrinsic to the model simulations introduces complexity to the forecast and restricts the daily work of operators. The novel Multi-Parametric Variational Data Assimilation (MP-VarDA) uses different parameter sets to create a pool of models that quantify the uncertainty arising from model parametrization. This study focuses on the sensitivity of the parametric reduction techniques of MP-VarDA coupled in the HBV hydrological model to create model pools and the impact of the number of parameter sets on the performance of streamflow and Snow Cover Area (SCA) forecasts. The model pool is created using Monte Carlo simulation, combined with an Aggregated Distance (AD) Method, to create different model pool instances. The tests are conducted in the Karasu Basin, located at the uppermost part of the Euphrates River in Türkiye, where snowmelt is a significant portion of the yearly runoff. The analyses were conducted for different thresholds based on the observation exceedance probabilities. According to the results in comparison with deterministic VarDA, probabilistic MP-VarDA improves the m-CRPS gains of the streamflow forecasts from 57% to 67% and BSS forecast skill gains from 52% to 68% when streamflow and SCA are assimilated. This improvement rapidly increases for the first additional model parameter sets but reaches a maximum benefit after 5 parameter sets in the model pool. The improvement is notable for both methods in SCA forecasts, but the best m-CRPS gain is obtained for VarDA (31%), while the best forecast skill is detected in MP-VarDA (12%).

Keywords:

HBV model; satellite snow data; data assimilation; model uncertainty; Euphrates River

1. Introduction

Snow is a vital component of the hydrological cycle in many mountainous snow-dominated basins where glaciers and seasonal snowpacks are the water supplier of more than 1/6 of the Earth’s population [1]. Proper estimation of hydrological response in these regions is crucial for the effective operation of the upstream reservoirs which function for hydropower, irrigation, and water supply. To that end, hydrological forecasting systems have been developed and applied for many decades. However, their representations are still challenging due to the complexities associated with snow accumulation and melting processes, the limitation of data availability in rough topography, and harsh weather conditions.

In order to improve the performances, hydrological forecasting for snow-dominated regions in literature copes with different aspects such as accounting for numerical weather prediction data [2,3,4], implementing different type of observations [5,6,7,8,9,10], and applying varying model approaches [11,12,13,14,15]. Observing snow by in situ measurements is traditional with ground-based meteorological networks. Still, it is challenging to have spatial variability and to access upper zones where snow cover stands for a more extended period [16]. Snow physics and harsh topography force modelers to employ advanced satellite and remote sensing tools to detect snow, such as Snow Cover Area (SCA) and Snow Water Equivalent (SWE) [17,18].

Model results in hydrological forecasting are not perfect due to uncertainties such as model input data, initial conditions, model structure, and parametrization [19,20,21]. To overcome these uncertainties, forecasting systems are subjected to several improvements. For example, forcing uncertainty may be coupled with ensemble numerical weather predictions [2] or integrating probabilistic approaches [22]. Data assimilation methods are useful techniques in many disciplines that help model developers decrease uncertainties by updating the model states and improving initial conditions. Data assimilation approaches are the most common way (often a prerequisite for forecasting) of adapting the observed information into the model to improve initial conditions [6,23,24,25,26].

Researchers have shown the advantages of including snow data in hydrological modeling studies for calibration and validation of degree-day based conceptual models [27]. Optimization of model parameters during calibration may force the model to compensate for rainfall-runoff error while sacrificing snow components. Data assimilation could increase the parsimony of the snow states, however, most of the existing implementations of data assimilation in hydrological forecasting have focused on the assimilation of streamflow. The model states which cannot be directly measured become more robust but still suffer from poor snow states and snow forecasts as the external output from the model itself. This is because the assimilation of snow data (SCA, SWE, or snow depth) is not as straightforward as compared to streamflow data. However, the integration of remote sensing snow data and ground snow measurements into data assimilation applications for an operational system is a relatively new field of research which is receiving growing attention [7,28,29,30,31,32]. This might be even more crucial for data-sparse snow-dominated regions; thus, studying and practicing data assimilation techniques with snow observations contributes to a better representation of models.

On the other hand, accounting for the uncertainty of the model structure (due to parametric and structural uncertainty) is very important since the parameters are generally calibrated and validated for a particular part of the selected historical period. Often, model sensitivity analysis provides useful information to detect the degree of uncertainty [33]. Additionally, this type of uncertainty can be reduced by using multi-modelling methods [2,34,35,36].

Data assimilation methods are mainly categorized into: (i) filtering approaches and (ii) smoothing approaches. The filtering approach (Extended Kalman Filter, Ensemble Kalman Filter, Particle Filter, etc.) is one of the most widely used way of updating states since it does not require adjoint modeling and is easy to implement [37]. In the smoothing approach, updates are done simultaneously over multiple time steps and they can be advantageous in handling time lags between observations and updated states. Variational Data Assimilation (VarDA) is a smoothing type of data assimilation implemented together with Moving Horizon Estimation (MHE) in which an objective function is numerically solved using optimization algorithms to obtain an optimum between model and observation uncertainties. It also provides flexible formulation without considering explicit error covariance matrices [38]. Alvarado et al. [29] introduced an innovative probabilistic version of VarDA, so-called Multi-Parametric VarDA (MP-VarDA), and showed that the new version provides more robust results compared to the conventional deterministic VarDA by assimilating both streamflow and SCA. The MP-VarDA consists of using multiple model parametric sets with the same model structure in order to create an ensemble of model runs that quantifies model parametric uncertainty. In the MP-VarDA, the noise introduced by the optimization algorithm is common for all model runs. This enables a unique modification of the model pool during the assimilation procedure. The ensemble of model runs, equivalent to the prior estimate in filtering methods, provides an estimate of the model parametric uncertainty. The model parameter sets are obtained from an Aggregated Distance (AD) method which maximizes the parametric distance between each model set in an effort to provide a maximum diversity in the quantification of parametric uncertainty.

This study aims to implement and test this recent method [29] that generates a probabilistic estimate of initial states using a multi-parametric modelling method. This study also tests the MP-VarDA method for the first time in a data-scarce snow-dominated region and the performances of the snow outputs are separately analyzed. Apart from that, according to the authors’ knowledge, the appropriate model pool number selection in MP-VarDA has not been tested in any study so far. Therefore, the main innovation of this study is to reveal the sensitivity of the number of models in a model pool using the MP-VarDA method coupled in the HBV model that is needed to capture model dynamics in data assimilation and compare it with its deterministic counterpart (VarDA).

Karasu catchment is a pilot basin for many national and international projects, with a high snowmelt contribution of around 60–70% in annual streamflow volume. Concerning the importance of snow in mountainous Eastern Türkiye and the limited availability of data, incorporating different data assimilation techniques with various snow data sources is crucial in the region’s streamflow predictions. Considering these, the companion paper discusses the conceptual hydrological model (HBV) with variational and sequential data assimilation techniques using satellite snow products as a case study in the Upper Euphrates Basin (Karasu), Türkiye and in the Canales Basin, Spain [28]. However, the uncertainties associated with model parametrization and their consideration in data assimilation techniques are still research questions for forecasting systems.

This study is also conducted in mountainous Eastern Türkiye for streamflow predictions over the region concerning the importance of snowmelt and the limited data availability. The study aims to clarify the effects of different model pool instances (mpi) in probabilistic variational data assimilation techniques in forecasting streamflow and snow cover area. The study is also valuable by accounting for snow data in assimilation configuration for a snow-dominated region. The objectives of the study are to reveal the value of MP-VarDA, investigate the effect of different model pool instances for capturing the model uncertainty, and finally compare the probabilistic MP-VarDA performances with deterministic VarDA.

2. Materials and Methods

A flowchart is presented in Figure 1 to demonstrate the methodology of the study. The daily rainfall-runoff relationship is established under the HBV model. The model has 21 parameters to mimic the streamflow observed at the outlet of the basin. The details of the hydrological model are given under Section 2.3. The structural uncertainty is represented by multiple model instances. To that end, the Monte Carlo simulation creates multiple model parameter sets (so-called mother model pool) around the previously calibrated and validated single model parameter set of the HBV model. Since it is not practical to use all parameterization, AD method was utilized to reduce the number of the model parameter instances. The reduced parameters are mainly split into two categories: (i) single instance model parameters and (ii) multiple instance model pool parameters.

The uncertainties of the initial conditions were tackled with the variational data assimilation technique. The benchmark simulation (noDA configuration) enables the detection of any gain obtained from the data assimilation implementation. The configuration of the assimilation accounts for perturbations over state forcings and observations. Conventional deterministic VarDA is also implemented using a single instance model parameter set. A multiple instance model pool is composed of various sets depending on the number of the instances such as 3, 5, etc. The HBV model is run within a hindcast period, and the comparisons of both techniques, together with the benchmark case, are demonstrated. The forecasts are produced as a set of ensembles for the MP-VarDA method. This is similar to data assimilation based hydrological ensemble forecasting done by numerical weather forecasts, but here forecast members are generated using multi-parametrization and different probabilistic initial conditions, which is different than producing ensemble members using ensemble forcing data into a single hydrologic model and the same initial conditions.

The HBV model produces both streamflow and SWE forecast outputs. In this study, HBV-based SWE values were converted to SCA. The data assimilation implementation also accounted for the assimilation of both observations (streamflow (Q) and SCA). Therefore, the outputs (HBV-Q and HBV-SCA) are also compared with the observation (observed streamflow and satellite SCA) using verification metrics. The study was conducted during the 2009–2012 Water Years (WYs), which are the 12 months between 1 October and 30 September in Türkiye.

2.1. Study Area and Hydro-Meteorological Data

The developed methodology was tested in the Karasu River catchment, a major branch of the Euphrates River in Türkiye. The study area and station network are shown in Figure 2. The basin with a drainage area of 10,275 km² has an elevation range from 1125 m to 3500 m and a hypsometric mean elevation of 1983 m. The main land cover types are pasture, cultivated, and bare land. Snowmelt streamflow in the mountainous eastern part of Türkiye is of great importance. It constitutes approximately two-thirds of the volume of the total yearly streamflow during the spring and early summer months. The streamflow data were collected at the outlet of the basin (2119 station). Karasu Basin is divided into 10 elevation zones (bands) between 1125–3500 m. A total of 18 climatologic and automated weather operating stations (AWOS), ranging in elevation between 981 and 2937 m, are located in and around the basin (Figure 2). The required input data for the HBV model are daily total Precipitation (P), daily average Temperature (T), and daily Potential Evaporation (PE). In this study, input data are provided as zone-based areal values calculated by the Detrended Kriging distribution method [39].

The box and whisker plots of the hydro-climatic data (total annual P, average annual T and the mean annual streamflow) are given in Figure 3. In summary, the overall basin statistics are observed for the mean total P, and the mean annual temperature and the mean annual streamflow are 638 mm/year, 5.1 °C, 86 m³/s, respectively.

2.2. Snow Cover Area (SCA) Data

In snowmelt streamflow modeling, it is essential to either use snow data as input to the model or validate the model output with observations. In this study, snow data were calculated from Moderate Resolution Imaging Spectroradiometer (MODIS) observations. The MODIS snow-mapping algorithm is fully automated and is based on the Normalized Difference Snow Index (NDSI) in a set of thresholds [40]. However, optical satellites like MODIS are suffering from cloud coverage, especially during the snow season, resulting in missing areal SCA data for cloudy days. In this study, cloud-free SCA data are provided by certain filters such as combination, temporal, spatial, and elevation [41]. Therefore, filtering enables continuous cloud-free observations of MODIS SCA data over the Karasu catchment.

Observed daily SCA data were converted to daily time series for each elevational zone for the study period. HBV model produces SWE, converts it to SCA, and utilizes the observed SCA information for the data assimilation part. Figure 4 depicts the variation of SCA of the whole catchment and streamflow at the outlet of the basin for the 2009–2012 WY period. During the accumulation period, the streamflow values are low while the major portion comes during and after the start of the melting season. Besides, remote sensing snow products (MODIS, IMS, MSG-SEVIRI) provide above 90% accuracy with ground measurements in the region [41,42,43,44].

2.3. HBV Hydrological Model

HBV model is a bucket-type, semi-distributed conceptual model to employ the hydrological processes by dividing a hydrological basin into sub-basins and/or elevation ranges [45]. The model generally consists of three main routines as: snow, soil moisture and runoff response. HBV simulates the rainfall-runoff relationship using the degree-day method in mountainous regions. Our dedicated implementation of the HBV model is documented in Schwanenberg and Bernhard [46] and refers to Bergström [47] and Lindström et al. [48]. For this study, the variational data assimilation (VarDA) approach is based on the Moving Horizon Estimation (MHE) formulation defined in [29]. Therein, the hydrological model can be described in a time discretization according to:

x^{k} = f (x^{k - 1}, u^{k}, d^{k}, p)

(1)

y^{k} = g (x^{k}, v^{k}, d^{k}, p)

(2)

where

x

,

y

, and

d

are the state, output, and external forcing vectors, respectively,

u

,

v

are noise terms,

p

is the model parameters (see Abbreviation for the definition of each parameter) vector,

f ()

and

g ()

are functions representing arbitrary linear or non-linear components of the model, and

k

is the time step index.

The model takes daily total P, daily average T and PE as main inputs and produces modelled outflow. In the HBV model application structure, the basin is divided into ten elevation zones with equal areas. The accumulated streamflow of these elevation zones contributes to modeled outflow, which is compared to the measured streamflow at the outlet streamflow gaging station of the basin (2119 station). The model was chosen to operate on an open area land use along with daily time steps. The model also produces SWE as the model state. A simple bilinear function converts the SWE computed by the model to SCA. Modelled SCA is also compared with observed SCA provided by MODIS snow product.

The model has 21 parameters, including the Muskingum routing parameter, and was calibrated using the interior point algorithm Ipopt [49]. The calibration was done for the water years 2002–2008, resulting in a Nash–Sutcliffe Efficiency (NSE) performance of 0.84. The calibrated parameters were validated for 2009–2012, which resulted in a NSE of 0.70.

2.4. Deterministic and Multi-Parametric Variational Data Assimilation

A deterministic VarDA approach was conducted for a forecast time

T^{k} = 0

over an assimilation period

k = [- N + 1, 0]

of

N \geq 1

time steps by the optimization of an objective function

J (u, v)

according to Equation (3) and was subject to Equations (4) and (5):

\begin{matrix} m i n \\ u, v \end{matrix} \sum_{k = - N + 1}^{0} (w_{x} ‖ {\hat{x}}^{k} - x^{k} (u) ‖ + w_{y} ‖ {\hat{y}}^{k} - y^{k} (u, v) ‖ + w_{u} ‖ u^{k} ‖ + w_{v} ‖ v^{k} ‖)

(3)

u_{L} \leq u^{k} \leq u_{U}

(4)

v_{L} \leq v^{k} \leq v_{U}

(5)

where

\hat{x}

and

\hat{y}

are observations of the state and the dependent variable vectors, respectively,

w_{x}

,

w_{y}

,

w_{v}

are weighting coefficients to define the trade-off between different penalties,

‖.‖

is a suitable norm penalizing the deviation between observed and simulated quantities as well as the amount of noise introduced by the data assimilation procedure. Note that the optimization setup modifies the noise variables

u^{k}

and

v^{k}

to find the minimum value of the objective function. Furthermore, the noise terms get bounded by inequality constraints which consider constant lower and upper bounds during the complete assimilation period.

Multi-parametric variational data assimilation, MP-VarDA, is an extended version of deterministic Moving Horizon Estimation-based VarDA described above in Equations (3)–(5). The main difference is that MP-VarDA assimilates information into a pool of

M

number of model instances, according to Equation (6), and subject to Equations (7) and (8). In this study, the probability

p_{m}

for each model is defined as equally likely. The amount of noise introduced to the system is equally distributed among all

M

models.

\begin{matrix} m i n \\ u, v \end{matrix} \sum_{m = 1}^{M} (p_{m} \sum_{k = - N + 1}^{0} (w_{x} ‖ {\hat{x}}^{k} - x^{k, m} (u) ‖ + w_{y} ‖ {\hat{y}}^{k} - y^{k, m} (u, v) ‖ + w_{u} ‖ u^{k} ‖ + w_{v} ‖ v^{k} ‖))

(6)

u_{L} \leq u^{k} \leq u_{U}

(7)

v_{L} \leq v^{k} \leq v_{U}

(8)

The Monte Carlo simulation technique was used to create multiple model parameters based on a previously calibrated hydrological model. The idea is that very different parameter sets can yield almost equally good fits between simulated and observed streamflow. The simulations are bounded by a multiplicative factor concerning the calibrated set as well as a physical limit such that:

{- p}_{i, c a l} * F_{v a r} \leq p_{i, m} \leq p_{i, c a l} * F_{v a r}

(9)

p_{i, L} \leq p_{i, m} \leq p_{i, U}

(10)

where

p_{i, c a l}

is the parameter

i

of the calibrated set,

F_{v a r}

is a variation factor, and

p_{i, m}

is the modified parameter bounded to the lower and upper bounds

p_{i, L}

and

p_{i, U}

respectively. The parameters

p_{i, m}

were generated using the Monte Carlo method.

The multiple model parameters include a large set, but it is not practical to implement all of them in MP-VarDA due to its computational expense. Therefore, we aimed at producing a representative set of models (model pool) from the Monte Carlo simulation. Here, we use the AD method [29] to reduce the number of parameter sets while still keeping the maximum amount of parametric uncertainty in the remaining sets. Model pool(s) were established from the same multiple parameter set (Monte Carlo simulation runs only once in the entire study), but each pool has varying model instances. The study analyzes the performance of different model pools in MP-VarDA. The model pool with more instances always includes a model pool with fewer instances. For example, a model pool having 5 instances includes model pools 3 and 1. This guarantees the inclusion of the same model uncertainty captured in different larger model pools. On the other hand, the model pool having only one model instance is considered to be representative of VarDA to compare with MP-VarDA results and is similarly created from multiple parameter sets.

Since the variational methods use optimization algorithms to minimize an objective function, the trade-off is done between the amount of noise introduced into the model and the distance between simulated and observed variables. Unlike a single VarDA, MP-VarDA optimizes the function using multiple model instances. Streamflow and snow (satellite SCA) observation data were assimilated using noise terms of states and forcings such as precipitation and temperature, as indicated in Table 1. The other states (SM, UZ, and LZ) of the HBV model are also accounted for in the objective function. The probabilistic initial values are produced by MP-VarDA, and ensemble forecasts are produced with probabilistic initial states and multi-model structure. The range of noise terms and weight of the observations are given in Table 1. This configuration is compared with other alternatives and the optimum weights were assessed based on forecast gains in Alvarado-Montero et al. [28]. The setup assimilates streamflow and snow data; thus, it claims to improve both forecast outputs. Moreover, we kept the assimilation window in both applications up to 6 months to capture a larger time lag between forcings and observations.

2.5. Aggregated Distance (AD) Method

The produced multiple parameter sets by Monte Carlo simulation should be reduced to representative members. There are different reduction approaches (fast forward, random selection, tree-based, etc.) in the literature. The selected algorithm plays a key role since it should capture a significant part of parametric uncertainty. AD is a robust method because it is a distance-based method, so the reduction is subjected to eliminating the parameter set m with the minimum aggregated distances to the rest of the parameter sets. The probability of parameter set m is then reassigned to its closest set. This procedure is repeated until the chosen number of parameter sets R is reached after N − R iterations, where N is the initial number of parameter sets. R is also selected by the user. In this study, the effects of selection R are also assessed, and results are compared. The details of the AD method are described in Alvarado-Montero [50].

2.6. Monte Carlo Configuration and Model Pool Generation

The VarDA configuration is based on a set of calibrated HBV hydrological model parameters (single model pool). The multiple model parameters p_i,m are generated using the Monte Carlo method based on normal distribution perturbation around the previously calibrated parameters. To that end, F_var (Equation (9)) is set to 30% and the number of the simulations are selected as 1000 under the Monte Carlo method. The parameter sets (mpi1000) produced by the Monte Carlo method were subjected to the performance threshold (with model performance above 98% of the calibrated model). Among them, only 800 sets (mpi800) satisfies this threshold. It is not practical to account 800 mpi in the MP-VarDA method, thus the parameters (mpi800) are reduced to mpi1, mpi3, mpi5, and mpi10 using the AD reduction technique. After reduction, each mpi is implemented in a hindcasting (closed-loop) mode and represents an experiment, as given in Table 2. The benchmark NoDA and deterministic VarDA stand for mpi1, whereas the others (mpi3, mpi5, mpi10) represent probabilistic data assimilation under MP-VarDA configuration.

2.7. Performance Metrics

The study tests the close-loop performances of the forecast system. While the deterministic method (VarDA) based forecasts are single trajectories, the probabilistic data assimilation method (MP-VarDA)-based forecasts have multiple members (ensemble). The number of the ensemble is equal to the number of mpi, i.e., probDA-mpi3 produces three different forecasts in each time step up to the lead time. Therefore, the performance metrics in this study are commonly used statistics in ensemble forecast systems and rely on the Ensemble Verification System (EVS) 5.4 developed by the National Oceanic and Atmospheric Administration, NOAA [51]. The exceedance probabilities are first calculated for Pr = 0.25, Pr = 0.50, and Pr = 0.75 of the observed data set. Following that, these probabilities were set as the thresholds, and forecasted streamflow and SCA were separately assessed over these thresholds.

The Brier Score (BS) in Equation (11) measures the average square error of a probability forecast. It is equivalent to the mean square error of a deterministic forecast, but the forecasts, and hence error units, are given in probabilities. For an event that involves not exceeding some threshold,

t

, the BS is computed from the forecast probability,

F_{Y} (t)

, and the corresponding observed outcome,

x

, whose cumulative probability is 1 if t is exceeded by the observation and 0 otherwise, as defined by the step function,

1 \{.\}

.

B S (t) = \frac{1}{n} \sum_{k = 1}^{n} {(F_{Y_{k}} (t) - 1 \{t \geq x_{k}\})}^{2}

(11)

The Brier Skill Score (BSS) in Equation (12) measures the performance of one forecasting system relative to another in terms of the Brier Score (BS). The BSS comprises a ratio of the BS for the forecasting system to be evaluated (the “main forecasting system”), BS_MAIN, over the BS for the reference forecasting system, BS_REF. BSS closer to 1 is preferred.

B S S = 1 - \frac{{B S}_{M A I N}}{{B S}_{R E F}}

(12)

On the other hand, data assimilation accuracy was also attained using mean Continuous Ranked Probability Score (m-CRPS), so Continuous Ranked Probability Score (CRPS) was calculated according to Equation (13). The m-CRPS metric is equivalent to the mean absolute error when using deterministic forecasts. It provides information on the accuracy of the forecast.

{C R P S}_{L} = \frac{1}{n} \sum_{k = 1}^{n} [\int_{- i n f}^{i n f} {(F_{t} (y_{k, L}) - Γ (y_{k, L} \geq {\hat{y}}_{k}))}^{2} d y]

(13)

where:

y_{k, L}

represents the value of the forecast

k - L

with a leadtime L,

k

is the indicator of the forecast,

n

is the number of ensembles,

{\hat{y}}_{k}

is the mean of the ensemble forecast,

F ()

is the cumulative distribution function, and

Γ

is a function that assumes probability 1 for values higher or equal to the observation and 0 otherwise.

3. Results

3.1. Monte-Carlo Simulation and Model Pool Instance Results

Only around 800 of the total 1000 model instances that had different parametrization were selected, considering performance threshold (with model performance above 98% of the calibrated model). This ensures the reliability of the model sets used in data assimilation application. Figure 5 demonstrates the distribution of parameters for each mpi (with box and whisker plots) produced by the Monte Carlo method (both for mpi800 and AD-based reduced mpi). For example, the spread of mpi800 has the highest variation due to its capability of multi-modeling ability. However, the idea here is to demonstrate the variation of the deterministic (mpi1) and reduced probabilistic sets (mpi3, mpi5, and mpi10) in comparison with mother pool (mpi800). The upper and lower y-axis limits of the graph are adjusted according to the physical limits of the parameters. Table 3 shows the ranges of the mother pool set categorized into three groups depending on the visual parameter comparison in Figure 5. Accordingly, only 4 parameters have low-range and 3 parameters are within mid-range, while the remaining total 14 parameters are found to provide a good fit, with the observed streamflow having a wide range.

The Monte Carlo method can capture a more extensive distribution by having larger parameter sets. Since mpi1 has a single member parameter set, it provides a single value in the graph (Figure 5). Thus, the parameters of mpi1 do not always match the mean of the mother pool (mpi800). However, it can be stated that AD-based model pools show promising distribution, even with fewer model instances: correction factor for evaporation (ECORR), correction factor for rainfall (RFCF), temperature interval with a mixture of snow/rain (TTI), degree day factor (CFMAX). Additionally, increasing the number of the model pool can properly represent a larger spread; see temperature limit for snow (rain or snow) (TT), recession coefficient (ALPHA), and field capacity (FC).

Even the small number of AD-based parametrization can rapidly converge to the mother model pool ranges. For example, this is observed for the parameters of K1, CFMAX, or BETA, where the 1 mpi value is close to the minimum or maximum value of the mother pool, while 3 mpi can rapidly capture the mother pool range. Outliers are only detected in Muskingum routing parameters (k_CON and eps_CON) for the 10 mpi.

3.2. Data Assimilation Hindcast Results

The hindcast period starts from 1 October 2008 and ends on 30 September 2012, and has daily time steps. The forecast lead time is set to 10 days (0 to 240 h with 24 h increments). The initial conditions were improved by assimilating both streamflow and snow data under deterministic and probabilistic variational DA. Improved states were utilized as the best initial condition to generate perfect forecasts based on observed forcing data. This also eliminates forecast uncertainty, which is out of the scope of this study. Generated model pool parameter sets (mpi1, mpi3, mpi5, and mpi10) were implemented into data assimilation through the HBV hydrological model in a hindcasting mode. The configuration intended to produce both streamflow and SCA forecasts. Since VarDA is deterministic, it produces a single forecast for each time step up to the lead time, whereas MP-VarDA produces an ensemble set of forecasts considering multiple initial conditions and model parametrization.

Figure 6 demonstrates the multi-model ensemble forecasts of the probDA-mpi10 experiment as an example case from the 2011 WY by mpi10 parameterization and MP-VarDA application coupled in HBV. The dash lines indicate the streamflow and SCA observations. The orange and blue lines are produced streamflow and SCA forecasts that have 10 members for up to 10-day lead time in daily time step, respectively.

The streamflow thresholds (depending on the exceedance probabilities; Pr = 0.25, Pr = 0.50, Pr = 0.75) from the observations are calculated as 39.5 m³/s, 49.4 m³/s, and 100.4 m³/s, respectively. The SCA thresholds (Pr = 0.25, Pr = 0.50, Pr = 0.75) from satellite observations are calculated as 0.01%, 6.23%, and 69.47%. Thresholds are lower limits for the assessments. Therefore, the forecast members above Pr = 0.25, Pr = 0.50, and Pr = 0.75 for streamflows are deemed as “above low flow”, “above median flow”, and “high flow”, respectively. On the other hand, Pr = 0.25 for SCA encompasses any snow occurrences by excluding summer months, while Pr = 0.50 threshold stands for the accumulation and melting periods, and finally Pr = 0.75 only evaluates the high and full snowy seasons.

Figure 7 and Figure 8 compare the forecast performance using m-CRPS and BSS metrics for VarDA and MP-VarDA experiments, respectively. The performances of the noDA-mpi1 experiment are also shown in the related figures. Since m-CRPS has the same unit as the observations, it provides useful information about the assessment. The m-CRPS of the deterministic experiments are identical to the mean absolute error. The ranges for the streamflow data have wider bounds, and thus the graphs are given in different y-axis scales for m-CRPS, while SCA is kept the same. The m-CRPS of a perfect forecast would be zero, while the BSS for a seamless set of forecasts would be one. Since BSS is always based on a threshold value, the presentations are only available for exceedance probability thresholds, while m-CRPS can be calculated for both all data and forecasts above these thresholds.

The noDA-mpi1 experiment always has the highest error and lowest forecast skill, except for the BSS of the forecasts above Pr = 0.50 in the probDA-10mpi experiment. The data assimilation application seems to improve the performance of hydrological forecasting for both streamflow and snow components. There is a neat order in the data assimilation experiments for the streamflow, which means that m-CRPS values decrease by increasing the number of the model pool in MP-VarDA. According to the snow data graphs, deterministic VarDA gives either less or similar m-CRPS values in comparison to probabilistic MP-VarDA. The order of the error lines is not as representative as in the case of streamflow graphs. Nevertheless, a notable difference can only be stated in the high SCA forecast (>Pr = 0.75), where probDA-mpi5 gives the lowest m-CRPS. On the other hand, BSS skill performance is decreasing over time. The forecast skill of the noDA-mpi1 in streamflow shows that all data have a negative value, indicating the very poor performance of the system without data assimilation implementation. The highest skills are detected for the forecasts above Pr = 0.75 for both streamflow and SCA so that the forecast skills of the high-flows/SCA are better than other forecast groups (>Pr = 0.25 and >Pr = 0.50). The worst skill is detected for the forecasts above Pr = 0.50 in SCA. The lower skills are expected for the forecast above Pr = 0.25.

4. Discussion

The results are analyzed with percent gains of the data assimilation methods to find the most effective method in the hydrological forecasting system. Figure 9 compares m-CRPS based error gain (%) for VarDA and MP-VarDA experiments with respect to noDA-mpi1 benchmark simulation. Higher error gains are detected in the MP-VarDA against deterministic VarDA, especially for streamflow forecasts. The error gains intuitively decrease with respect to lead time, but still stay within a considerable level (34% and 50% for detDA-mpi1 and probDA-mpi10 in all data analyses, respectively). Increasing the number of mpi clearly increases the gains. This situation was observed for all data, as well as in different forecast groups above the thresholds. Therefore, it can be stated that MP-VarDA outperforms VarDA for streamflow forecasts in the case of assimilating streamflow and SCA. There is a limit where increasing model pool instances does not contribute further. For example, probDA-mpi5 and probDA-mpi10 seem to give similar results. Compared to streamflow, lower m-CRPS error gains in SCA were detected for both methods. Additionally, VarDA gives a higher gain for all data and the forecasts greater than Pr = 0.25 and Pr = 0.50, while MP-VarDA slightly increases the gains of the forecasts which are greater than Pr = 0.75, especially for probDA-mpi5. This is also detected for longer lead times. However, it is not straightforward to classify which method or mpi gives more accurate results in a general sense.

Figure 10 compares the percent forecast BSS-gain of VarDA and MP-VarDA experiments with respect to noDA-mpi1 benchmark simulation. Increasing the mpi in MP-VarDA also enhances forecast skills gains. The gains in streamflow increase above 100% for Pr = 0.25 because of the negative (very poor) BSS performance of the noDA-mpi1 simulation. The BSS gains look similar for the streamflow forecasts above the other thresholds (Pr = 0.50 and Pr = 0.75). The main improvement by MP-VarDA is detected especially for the SCA forecasts above low and high thresholds (Pr = 0.25 and Pr = 0.75). This is also due to the uncertainty capture capacity of the probabilistic case where a larger spread is accounted for in comparison to a single deterministic forecast.

Even though m-CRPS error gains are not increasing as aforementioned, the BSS forecast skill gains seem to be the same or slightly increased in MP-VarDA compared to VarDA. Similarly, the highest BSS gains are detected for the SCA forecasts above Pr = 0.25 following Pr = 0.75. The lowest gain was observed for the SCA forecasts above Pr = 0.50.

As a summary, the mean gain of the whole lead time period (0 to 240 h) was calculated together with their standard deviations for m-CRPS and BSS in Table 4 and Table 5, respectively. The mean of the gains for m-CRPS and BSS increase while standard deviations decrease in streamflow from deterministic to probabilistic data assimilation implementation. The highest streamflow m-CRPS mean gain from all experiments within all data and the forecasts above the three thresholds are detected as 67–69% for MP-VarDA, having 5 or 10 mpi. The highest streamflow BSS mean gain of all experiments for all data and within the three thresholds are detected as 63–81% for MP-VarDA, having 5 or 10 mpi. In this study, we kept the variations (F_var) in the Monte Carlo method at 30%, which gives the highest performance according to Alvarado et al. [29]. The improvements in streamflow forecasts are similar in comparison to their results. The main difference is that the number of mpi is assessed for the first time in this study, and it is found that increasing the number of mpi significantly improves the model performances until a threshold mpi, especially for streamflow metrics. Thus, increasing the model pool seems to converge to a limit where additional gains do not make a significant contribution.

The SCA gains are not appreciated as much as the streamflow, but the models seem to be more robust since they account for the snow component, which is the main driver of the hydrological process. The highest SCA m-CRPS mean gain from all experiments considering all data and the forecasts above the first two thresholds (Pr = 0.25 and Pr = 0.50) are detected as 31–36% from the deterministic VarDA method (1 mpi). However, the maximum SCA m-CRPS gain is calculated as 68% for the forecasts above the upper threshold (Pr = 0.75) in one of the probabilistic experiments (probDA-mpi5). BSS gains of SCA have a different (i.e., not regular) order compared to streamflow. The highest SCA BSS mean gains of all experiments are 12%, 4%, and 47% for probDA-mpi5, VarDA, and probDA-mpi3, respectively. Therefore, the improvement in streamflow seems to decrease SCA performances in MP-VarDA implementation.

Another reason for the variations in SCA performances might be attributed to the assimilation of snow states where SCA is indirectly calculated in the hydrological model. On the other hand, the reduction in AD might affect the performance of SCA in data assimilation application. For example, probDA-mpi5 shows the lowest m-CRPS SCA gain for all thresholds except for Pr = 0.75 (stands for the high SCA values) and this gain is the highest among the others. This could be related to the representation of CFMAX parameter distribution (refer to Figure 5) of mpi5 among the other model pools and the lower capability of initial state updating of the forecasting during more complex accumulation and melting seasons (refer to Figure 6 which appears to present lower representation of accumulation or melting seasons with overestimations). We should also keep in mind that CFMAX is a sensitive parameter, which means that a little variation of this parameter could result in large changes especially in SWE, and thus in SCA.

Since a low CFMAX parameter (degree-day factor) causes higher SCA preservation in the model itself (due to slower melting), the overestimation of SCA can result in lower m-CRPS gains in the forecasts above Pr = 0.25 and Pr = 50, but a higher gain for the forecasts above Pr = 0.75 threshold. Additionally, streamflow is dominated by 21 parameters, while only a few of them are directly associated with snow components. This makes the estimation of SWE and indirectly SCA estimation more difficult. On the other hand, BSS gain variation in SCA indicates the complexity of ensemble spread for SCA in the probabilistic method, thus yielding variable best forecast skills. It is also important to keep in mind that SCA still includes uncertainties associated with the temporal and spatial representation of the snow storage.

5. Conclusions

The data assimilation application is a useful technique for a reliable hydrological forecasting system. Integration of data assimilation is especially more challenging in snow-dominated regions due to the complexity of snow processes and limited observations. This study analyzes the forecast performance of multi-parametric probabilistic data assimilation (MP-VarDA) coupled in HBV hydrological model. Streamflow and SCA forecasts up to 10 days are generated using a benchmark simulation without data assimilation application, a deterministic model based on VarDA method, and several probabilistic models based on MP-VarDA method. Since MP-VarDA requires multi-parameterization, Monte Carlo simulation was applied to derive multiple sets of model parameters; however, it is not practical to account for all of them in a real-time forecasting system. In order to make a fair judgment, here we used the same mother model pool to determine both deterministic and probabilistic parameters. To capture model uncertainty without increasing the number of model parameter sets, AD method shows robust representativeness by keeping the maximum amount of parametric uncertainty in the remaining sets. The utilization of AD method in MP-VarDA also enables us to integrate probabilistic methods in the real-time forecasting system more efficiently since probabilistic data assimilation has a higher computational cost relative to deterministic counterpart. The study also reveals the effects of different model instances integrated with a probabilistic assimilation approach. The data assimilation method can incorporate the model pool and improve the hydrological model’s initial states.

In conclusion, we have summarized certain take-home messages and future recommendations below:

The results show the supremacy of MP-VarDA against deterministic VarDA, especially for streamflow forecasts. MP-VarDA reduces the forecast error (m-CRPS) and provides larger forecast skill (BSS) by having probabilistic initial states and ensemble forecasts based on the multi-modeling scheme. The method gives more robust results for different forecast groups above certain threshold values selected from exceedance probabilities. These results should encourage operators to adopt pools with multiple model instances to consider model uncertainty in their forecast estimation.
Considering the data limitation in the upper snow dominated basins, most of the studies in the literature rely on the conceptual semi-distributed approaches. The study was also tested with a single rainfall-runoff relationship through the HBV hydrological model. Our approach takes the advantage of novel assimilation methods and enriching the method with probabilistic states to set up a more reliable forecasting system. The method was also tested in a closed-loop mode which mimics the real-time hydrological forecasting system, and the study demonstrates for forecasters/practitioners that MP-VarDA method can better capture structural uncertainty and improves initial states even using a single hydrological modelling environment.
Assimilating snow with streamflow through hydrological modeling improves both deterministic and probabilistic data assimilation methods. However, considering the complexities of snow accumulation and melting processes, we cannot propose a single best method for SCA forecasts.
We also revealed the impact of selecting different sizes of model pools in a forecast system for a snow-dominated area. Increasing the number of the model pool has a clear impact up to a certain limit on the performance of streamflow forecasts. On the other hand, SCA forecasts are mainly dominated by fewer and more sensitive parameter sets and increments of mpi in MP-VarDA configuration might still increase the forecast performances by capturing better sampling of snow modelling representation.
Future studies should assess and compare these results under numerical weather prediction systems rather than observed meteorological inputs. Even though SCA-SWE is represented by a bilinear depletion function, the more realistic way could be established by a hysteresis. Additionally, in case of having satellite-based SWE, the assimilation can be employed directly by SWE observations so that uncertainties related to the SCA-SWE transition could be reduced.

Author Contributions

Conceptualization, G.U., A.Ş. and R.A.-M.; methodology, R.A.-M.; data assimilation hindcast simulations, G.U.; snow depletion analysis, A.A.Ş.; writing—original draft preparation, G.U.; writing—review and editing, A.Ş., R.A.-M. and A.A.Ş.; visualization, G.U.; funding acquisition, G.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Cooperation in Science and Technology (COST) Harmosnow ES1404 Action under Short Term Scientific Mission (STSM).

Data Availability Statement

Restrictions apply to the availability of these data. Data was obtained from [the State Hydraulics Works of Türkiye (DSI) as well as the Turkish State Meteorological Service (MGM)] and are available [from the authors] with the permission of [the State Hydraulics Works of Türkiye (DSI) as well as the Turkish State Meteorological Service (MGM)].

Acknowledgments

Special thanks are extended to the State Hydraulics Works of Türkiye (DSI) as well as the Turkish State Meteorological Service (MGM).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviation

Parameter Abbreviation	Unit	Parameter Name
ECORR	----	Correction factor for evaporation
RFCF	----	Correction factor for rainfall
SFCF	----	Correction factor for snowfall
TT	°C	Temperature limit for snow (rain or snow)
TTI	°C	Temperature interval with a mixture of snow/rain
CFMAX	mm/day °C	Degree day factor
CFR	----	Refreezing factor
CWH	%	Water holding capacity
TM	°C	Temperature limit for melting
BETA	----	Exponent in formula for drainage from soil
CFLUX	mm/day °C	Maximum capillary flow from upper box to soil moisture routine
ETF	----	Temperature correction factor
FC	mm	Field capacity
LP	----	Limit for potential evapotranspration
ALPHA	----	Recession coefficient in Q = (k)(UZ)^(alfa+1)
K	1/day	Recession coefficient
K1	1/day	Recession coefficient for lower box
PERC	mm/day	Percolation capacity from upper to lower box
LIC	----	Maximum interception storage
k_CON	----	Muskingum routing parameter
eps_CON	----	Muskingum routing parameter

References

Barnett, T.P.; Adam, J.C.; Lettenmaier, D.P. Potential impacts of a warming climate on water availability in snow-dominated regions. Nature 2005, 438, 303. [Google Scholar] [CrossRef] [PubMed]
Xu, J.; Anctil, F.; Boucher, M.A. Hydrological post-processing of streamflow forecasts issued from multimodel ensemble prediction systems. J. Hydrol. 2019, 578, 124002. [Google Scholar] [CrossRef]
Schneiderbauer, S.; Prokop, A. The atmospheric snow-transport model: SnowDrift3D. J. Glaciol. 2011, 57, 526–542. [Google Scholar] [CrossRef] [Green Version]
Zhao, Q.; Liu, Z.; Ye, B.; Qin, Y.; Wei, Z.; Fang, S.A. Snowmelt runoff forecasting model coupling WRF and DHSVM. Hydrol. Earth Syst. Sci. 2009, 13, 1897–1906. [Google Scholar] [CrossRef] [Green Version]
Zhang, L.; Li, X.; Zheng, D.; Zhang, K.; Ma, Q.; Zhao, Y.; Ge, Y. Merging multiple satellite-based precipitation products and gauge observations using a novel double machine learning approach. J. Hydrol. 2021, 594, 125969. [Google Scholar] [CrossRef]
Boucher, M.A.; Quilty, J.; Adamowski, J. Data assimilation for streamflow forecasting using extreme learning machines and multilayer perceptrons. Water Resour. Res. 2020, 56, e2019WR026226. [Google Scholar] [CrossRef]
Helmert, J.; Şensoy Şorman, A.; Alvarado Montero, R.; De Michele, C.; De Rosnay, P.; Dumont, M.; Finger, D.C.; Lange, M.; Picard, G.; Potopová, V.; et al. Review of snow data assimilation methods for hydrological, land surface, meteorological and climate models: Results from a cost harmosnow survey. Geosciences 2018, 8, 489. [Google Scholar] [CrossRef] [Green Version]
Adnan, M.; Nabi, G.; Poomee, M.S.; Ashraf, A. Snowmelt runoff prediction under changing climate in the Himalayan cryosphere: A case of Gilgit River Basin. Geosci. Front. 2017, 8, 941–949. [Google Scholar] [CrossRef] [Green Version]
Uysal, G.; Şensoy, A.; Şorman, A.A. Improving daily streamflow forecasts in mountainous uuphratesrates basin by multi-layer perceptron model with satellite snow products. J. Hydrol. 2016, 543, 630–650. [Google Scholar] [CrossRef]
Şensoy, A.; Uysal, G. The value of snow depletion forecasting methods towards operational snowmelt runoff estimation using MODIS and numerical weather prediction data. Water Resour. Manag. 2012, 26, 3415–3440. [Google Scholar] [CrossRef]
Şensoy, A.; Uysal, G.; Şorman, A.A. Assessment of H SAF satellite snow products in hydrological applications over the Upper Euphrates Basin. Theor. Appl. Climatol. 2023, 151, 535–551. [Google Scholar] [CrossRef]
Siemens, K.; Dibike, Y.; Shrestha, R.R.; Prowse, T. Runoff projection from an alpine watershed in western Canada: Application of a snowmelt runoff model. Water 2021, 13, 1199. [Google Scholar] [CrossRef]
Girons Lopez, M.; Vis, M.J.; Jenicek, M.; Griessinger, N.; Seibert, J. Assessing the degree of detail of temperature-based snow routines for runoff modelling in mountainous areas in central Europe. Hydrol. Earth Syst. Sci. 2020, 24, 4441–4461. [Google Scholar] [CrossRef]
Steele, C.; Dialesandro, J.; James, D.; Elias, E.; Rango, A.; Bleiweiss, M. Evaluating MODIS snow products for modelling snowmelt runoff: Case study of the Rio Grande headwaters. Int. J. Appl. Earth Obs. Geoinf. 2017, 63, 234–243. [Google Scholar] [CrossRef]
Seibert, J.; Vis, M.J. Teaching hydrological modeling with a user-friendly catchment-runoff-model software package. Hydrol. Earth Syst. Sci. 2012, 16, 3315–3325. [Google Scholar] [CrossRef] [Green Version]
Pu, Z.; Xu, L.; Salomonson, V.V. MODIS/Terra observed seasonal variations of snow cover over the Tibetan Plateau. Geophys. Res. Lett. 2007, 34, L06706. [Google Scholar] [CrossRef] [Green Version]
Piazzi, G.; Tanis, C.M.; Kuter, S.; Simsek, B.; Puca, S.; Toniazzo, A.; Takala, M.; Akyürek, Z.; Gabellani, S.; Arslan, A.N. Cross-country assessment of H-SAF snow products by Sentinel-2 imagery validated against in-situ observations and webcam photography. Geosciences 2019, 9, 129. [Google Scholar] [CrossRef] [Green Version]
Andreadis, K.M.; Lettenmaier, D.P. Assimilating remotely sensed snow observations into a macroscale hydrology model. Adv. Water Resour. 2006, 29, 872–886. [Google Scholar] [CrossRef]
Roulin, E.; Vannitsem, S. Post-processing of medium-range probabilistic hydrological forecasting: Impact of forcing, initial conditions and model errors. Hydrol. Process. 2015, 29, 1434–1449. [Google Scholar] [CrossRef]
Salamon, P.; Feyen, L. Disentangling uncertainties in distributed hydrological modeling using multiplicative error models and sequential data assimilation. Water Resour. Res. 2010, 46, W12501. [Google Scholar] [CrossRef] [Green Version]
Kavetski, D.; Kuczera, G.; Franks, S.W. Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory. Water Resour. Res. 2006, 42, W03407. [Google Scholar] [CrossRef]
Şorman, A.A.; Uysal, G.; Şensoy, A. Probabilistic snow cover and ensemble streamflow estimations in the Upper Euphrates Basin. J. Hydrol. Hydromech. 2019, 67, 82–92. [Google Scholar] [CrossRef] [Green Version]
Rafieeinasab, A.; Seo, D.J.; Lee, H.; Kim, S. Comparative evaluation of maximum likelihood ensemble filter and ensemble Kalman filter for real-time assimilation of streamflow data into operational hydrologic models. J. Hydrol. 2014, 519, 2663–2675. [Google Scholar] [CrossRef]
Lee, H.; Seo, D.J.; Liu, Y.; Koren, V.; McKee, P.; Corby, R. Variational assimilation of streamflow into operational distributed hydrologic models: Effect of spatiotemporal scale of adjustment. Hydrol. Earth Syst. Sci. 2012, 16, 2233–2251. [Google Scholar] [CrossRef] [Green Version]
Lee, H.; Seo, D.J.; Koren, V. Assimilation of streamflow and in situ soil moisture data into operational distributed hydrologic models: Effects of uncertainties in the data and initial model soil moisture states. Adv. Water Resour. 2011, 34, 1597–1615. [Google Scholar] [CrossRef]
Aubert, D.; Loumagne, C.; Oudin, L. Sequential assimilation of soil moisture and streamflow data in a conceptual rainfall–runoff model. J. Hydrol. 2003, 280, 145–161. [Google Scholar] [CrossRef]
Parajka, J.; Blöschl, G. Spatio-temporal combination of MODIS images–potential for snow cover mapping. Water Resour. Res. 2008, 44, W03406. [Google Scholar] [CrossRef]
Alvarado-Montero, R.; Uysal, G.; Collados-Lara, A.J.; Şorman, A.A.; Pulido-Velázquez, D.; Şensoy, A. Comparison of sequential and variational assimilation methods to improve hydrological predictions in snow dominated mountainous catchments. J. Hydrol. 2022, 612 Pt A, 127981. [Google Scholar] [CrossRef]
Alvarado-Montero, R.; Schwanenberg, D.; Krahe, P.; Helmke, P.; Klein, B. Multi-parametric variational data assimilation for hydrological forecasting. Adv. Water Resour. 2017, 110, 182–192. [Google Scholar] [CrossRef]
Saloranta, T.M. Operational snow mapping with simplified data assimilation using the seNorge snow model. J. Hydrol. 2016, 538, 314–325. [Google Scholar] [CrossRef]
Griessinger, N.; Seibert, J.; Magnusson, J.; Jonas, T. Assessing the benefit of snow data assimilation for runoff modeling in Alpine catchments. Hydrol. Earth Syst. Sci. 2016, 20, 3895–3905. [Google Scholar] [CrossRef] [Green Version]
Huang, C.; Newman, A.J.; Clark, M.P.; Wood, A.W.; Zheng, X. Evaluation of snow data assimilation using the ensemble Kalman filter for seasonal streamflow prediction in the western United States. Hydrol. Earth Syst. Sci. 2017, 21, 635–650. [Google Scholar] [CrossRef]
Seibert, J. Estimation of parameter uncertainty in the HBV model: Paper presented at the Nordic Hydrological Conference (Akureyri, Iceland—August 1996). Hydrol. Res. 1997, 28, 247–262. [Google Scholar] [CrossRef]
Thiboult, A.; Anctil, F. Assessment of a multimodel ensemble against an operational hydrological forecasting system. Can. Water Resour. J. 2015, 40, 272–284. [Google Scholar] [CrossRef]
Velázquez, J.A.; Anctil, F.; Ramos, M.H.; Perrin, C. Can a multi-model approach improve hydrological ensemble forecasting? A study on 29 French catchments using 16 hydrological model structures. Adv. Geosci. 2011, 29, 33–42. [Google Scholar] [CrossRef] [Green Version]
Uhlenbrook, S.; Seibert, J.A.N.; Leibundgut, C.; Rodhe, A. Prediction uncertainty of conceptual rainfall-runoff models caused by problems in identifying model parameters and structure. Hydrol. Sci. J. 1999, 44, 779–797. [Google Scholar] [CrossRef]
Reichle, R.H.; McLaughlin, D.B.; Entekhabi, D. Hydrologic data assimilation with the ensemble Kalman filter. Mon. Weather Rev. 2002, 130, 103–114. [Google Scholar] [CrossRef]
Alvarado-Montero, R.; Schwanenberg, D.; Krahe, P.; Lisniak, D.; Sensoy, A.; Sorman, A.A.; Akkol, B. Moving horizon estimation for assimilating H-SAF remote sensing data into the HBV hydrological model. Adv. Water Resour. 2016, 92, 248–257. [Google Scholar] [CrossRef]
Garen, D.C.; Marks, D. Spatially distributed energy balance snowmelt modelling in a mountainous river basin: Estimation of meteorological inputs and verification of model results. J. Hydrol. 2005, 315, 126–153. [Google Scholar] [CrossRef]
Hall, D.K.; Riggs, G.A.; Salomonson, V.V.; DiGirolamo, N.E.; Bayr, K.J. MODIS snow-cover products. Remote Sens. Environ. 2002, 83, 181–194. [Google Scholar] [CrossRef] [Green Version]
Sorman, A.A.; Yamankurt, E. Modified satellite products on snow covered area in upper Euphrates basin, Turkey. In Proceedings of the EGU General Assembly 2011, Vienna, Austria, 3–8 April 2011; Volume 13. [Google Scholar]
Surer, S.; Akyurek, Z. Evaluating the utility of the EUMETSAT HSAF snow recognition product over mountainous areas of eastern Turkey. Hydrol. Sci. J. 2012, 57, 1684–1694. [Google Scholar] [CrossRef]
Çoşkun, C. Comparative Analysis of Various Satellite Products through Hydrological Modeling. Master’s Thesis, Anadolu University, Eskişehir, Turkey, 2016. [Google Scholar]
Açıkyol, S. Validation of Satellite Snow Products, Application in Hydrological Modeling and Investigation of Climate Change Impacts. Master’s Thesis, Eskişehir Technical University, Eskişehir, Turkey, 2022. [Google Scholar]
Seibert, J.; Bergström, S. A retrospective on hydrological modelling based on half a century with the HBV model. Hydrol. Earth Syst. Sci. Discuss. 2022, 26, 1371–1388. [Google Scholar] [CrossRef]
Schwanenberg, D.; Bernhard, B. RTC-Tools Reference Manual. 2013. Available online: http://oss.deltares.nl/documents/102774/467082/rtc-tools_UserManual.pdf (accessed on 1 April 2016).
Bergström, S. The HBV model. In Computer Models of Watershed Hydrology; Singh, V.P., Ed.; Water Resources Publications: Highlands Ranch, CO, USA, 1995; pp. 443–476. [Google Scholar]
Lindström, G.; Johansson, B.; Persson, M.; Gardelin, M.; Bergström, S. Development and test of the distributed HBV-96 hydrological model. J. Hydrol. 1997, 201, 272–288. [Google Scholar] [CrossRef]
Wächter, A.; Biegler, L.T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57. [Google Scholar] [CrossRef]
Alvarado-Montero, R. Variational Data Assimilation for Operational Flood Forecasting. Ph.D. Thesis, Universität Duisburg-Essen, Duisburg/Essen, Germany, 2021. [Google Scholar]
Brown, J.D.; Demarge, J.; Seo, D.J.; Liu, Y. The Ensemble Verification System (EVS): A software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environ. Modell. Softw. 2010, 25, 854–872. [Google Scholar] [CrossRef]

Figure 1. Flowchart of the study.

Figure 2. Location of the study area and station network distribution.

Figure 3. Box and whisker plots of hydro-meteorological data set (2002–2012 WY period) (a) total annual precipitation (b) mean annual temperature and (c) mean annual discharge.

Figure 4. Karasu SCA and streamflow variation in the 2009–2012 WY period.

Figure 5. HBV model parameter distributions both for Monte Carlo simulation (800 mpi, 30% variation) and various model instances using AD reduction method (red + sign in the figure shows the outliers, and the full names of the model parameter are given in the Abbreviation).

Figure 6. An example of one-year (2011) hindcasting simulation for MP-VarDA (10 model instances).

Figure 7. Mean continuous ranked probability score (m-CRPS) performances of streamflow and SCA for data assimilation hindcasting experiments.

Figure 8. Brier Skill Score (BSS) performances of streamflow and SCA for data assimilation hindcasting experiments.

Figure 9. m-CRPS gain (%) in streamflow and SCA.

Figure 10. Percentage BSS gain (%) in streamflow and SCA.

Table 1. Data assimilation configuration for deterministic VarDA and probabilistic MP-VarDA.

				Objective Function Term
Type	Variable	Weights	Range of Noise Terms	Observation	Noise
Model outputs	Streamflow (Q)	10		$w_{∆ Q} {. ({\hat{Q}}^{k} - Q^{k, m})}^{2}$
Model outputs	Snow Cover Area (SCA)	1		$w_{∆ S C A} {. ({\hat{S C A}}^{k} - {S C A}^{k, m})}^{2}$
Model inputs	Precipitation (%)	1	0.7–1.3		$w_{P} . {(P_{k} (1 - P_{k, f}))}^{2}$
Model inputs	Temperature (°C)	1	−2.0–2.0		$w_{T} {. (T_{k, u p})}^{2}$
Model states	Soil Moisture (mm)	1	−1.0–1.0		$w_{S M} {. ({S M}_{k, u p})}^{2}$
	Upper Zone (mm)	1	−1.0–1.0		$w_{U Z} {. ({U Z}_{k, u p})}^{2}$
	Lower Zone (mm)	1	−1.0–1.0		$w_{L Z} {. ({L Z}_{k, u p})}^{2}$

m

is the model index,

k

is the time step index,

w_{P}

,

w_{T}

,

w_{S M}

,

w_{U Z}

,

w_{L Z}

denote weighting factors, ^ denotes observation variables,

∆ S C A

,

∆ Q

correspond to agreement terms between observed and simulated variables,

P_{k, f}

is a multiplicative update factor for precipitation,

T_{u p}

is the deviation for updating temperature, and

{S M}_{u p}

,

{U Z}_{u p}

, and

{L Z}_{u p}

are deviations for updating the model states of soil moisture, upper zone and lower zone, respectively.

Table 2. Set up of models.

Data Assimilation Technique	Model Pool Name	Model Name	Number of Model Pool Instance
-	mpi800	-	800
-	mpi1	noDA-mpi1	1
VarDA	mpi1	detDA-mpi1	1
MP-VarDA	mpi3	probDA-mpi3	3
	mpi5	probDA-mpi5	5
	mpi10	probDA-mpi10	10

Table 3. Ranges of the HBV model parameters * from the Monte Carlo simulation (800 mpi).

Low-Range	Mid-Range	Wide-Range
ECORR RFCF SFCF K_CON	CFMAX BETA K	TT TTI CFR CWH TM CFLUX ETF FC LP ALPHA K1 PERC LIC Eps_CON

* The model parameter abbreviations are explained in the Abbreviation.

Table 4. Summary of the m-CRPS gains.

DA Technique	Model Name	$Streamflow Gains (μ \pm σ, %)$				$SCA Gains (μ \pm σ, %)$
DA Technique	Model Name	All	>Pr = 0.25	>Pr = 0.50	>Pr = 0.75	All	>Pr = 0.25	>Pr = 0.50	>Pr = 0.75
VarDA	detDA-mpi1	57 ± 17	58 ± 18	58 ± 19	60 ± 19	31 ± 6	30 ± 6	36 ± 6	57 ± 6
MP-VarDA	probDA-mpi3	61 ± 14	62 ± 15	62 ± 16	64 ± 17	27 ± 4	27 ± 4	32 ± 5	58 ± 4
	probDA-mpi5	66 ± 12	67 ± 13	67 ± 13	69 ± 14	10 ± 9	9 ± 9	19 ± 9	68 ± 4
	probDA-mpi10	67 ± 11	68 ± 12	68 ± 12	69 ± 13	21 ± 5	21 ± 4	25 ± 5	59 ± 5

Table 5. Summary of the BSS gains.

DA Technique	Model Name	$Streamflow Gains (μ \pm σ, %)$				$SCA Gains (μ \pm σ, %)$
DA Technique	Model Name	All	>Pr = 0.25	>Pr = 0.50	>Pr = 0.75	All	>Pr = 0.25	>Pr = 0.50	>Pr = 0.75
VarDA	detDA-mpi1	52 ± 9	47 ± 20	72 ± 11	52 ± 9	8 ± 2	4 ± 2	44 ± 6	8 ± 2
MP-VarDA	probDA-mpi3	60 ± 6	53 ± 8	77 ± 7	60 ± 6	10 ± 1	0 ± 2	47 ± 4	10 ± 1
	probDA-mpi5	66 ± 5	63 ± 8	81 ± 6	66 ± 5	12 ± 1	−18 ± 1	40 ± 6	12 ± 1
	probDA-mpi10	68 ± 4	62 ± 8	80 ± 6	68 ± 4	9 ± 1	3 ± 3	31 ± 5	9 ± 1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Uysal, G.; Alvarado-Montero, R.; Şensoy, A.; Şorman, A.A. Comparison of Deterministic and Probabilistic Variational Data Assimilation Methods Using Snow and Streamflow Data Coupled in HBV Model for Upper Euphrates Basin. Geosciences 2023, 13, 89. https://doi.org/10.3390/geosciences13030089

AMA Style

Uysal G, Alvarado-Montero R, Şensoy A, Şorman AA. Comparison of Deterministic and Probabilistic Variational Data Assimilation Methods Using Snow and Streamflow Data Coupled in HBV Model for Upper Euphrates Basin. Geosciences. 2023; 13(3):89. https://doi.org/10.3390/geosciences13030089

Chicago/Turabian Style

Uysal, Gökçen, Rodolfo Alvarado-Montero, Aynur Şensoy, and Ali Arda Şorman. 2023. "Comparison of Deterministic and Probabilistic Variational Data Assimilation Methods Using Snow and Streamflow Data Coupled in HBV Model for Upper Euphrates Basin" Geosciences 13, no. 3: 89. https://doi.org/10.3390/geosciences13030089

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Comparison of Deterministic and Probabilistic Variational Data Assimilation Methods Using Snow and Streamflow Data Coupled in HBV Model for Upper Euphrates Basin

Abstract

1. Introduction

2. Materials and Methods

2.1. Study Area and Hydro-Meteorological Data

2.2. Snow Cover Area (SCA) Data

2.3. HBV Hydrological Model

2.4. Deterministic and Multi-Parametric Variational Data Assimilation

2.5. Aggregated Distance (AD) Method

2.6. Monte Carlo Configuration and Model Pool Generation

2.7. Performance Metrics

3. Results

3.1. Monte-Carlo Simulation and Model Pool Instance Results

3.2. Data Assimilation Hindcast Results

4. Discussion

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviation

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI