Application of the Multimodel Ensemble Kalman Filter Method in Groundwater System

Xue, Liang

doi:10.3390/w7020528


Liang Xue

Department of Oil-gas Field Development Engineering, College of Petroleum Engineering, China University of Petroleum; No. 18 Fuxue Road, Changping District, Beijing 102249, China

Water 2015, 7(2), 528-545; https://doi.org/10.3390/w7020528

Submission received: 7 November 2014 / Revised: 9 January 2015 / Accepted: 26 January 2015 / Published: 4 February 2015


Abstract


With the development of in-situ monitoring techniques, the ensemble Kalman filter (EnKF) has become a popular data assimilation method because it can jointly update model parameters and state variables sequentially and assess the uncertainty associated with estimation and prediction. To take conceptual model uncertainty into account during data assimilation, a novel multimodel ensemble Kalman filter method has been proposed that combines the standard EnKF with the Bayesian model averaging framework. In this paper, the method is applied to analyze a dataset obtained from the Hailiutu River Basin in northwest China. Multiple conceptual models are created by considering two important factors that control groundwater dynamics in semi-arid areas: the zonation pattern of the hydraulic conductivity field and the relationship between evapotranspiration and groundwater level. The results show that the posterior weights of the postulated models are dynamically adjusted according to the mismatch between the measurements and the ensemble predictions, and that the multimodel ensemble estimate and its uncertainty can be quantified.

Keywords:

groundwater modeling; data assimilation; Bayesian model averaging; ensemble Kalman filter

1. Introduction

Groundwater resources are crucial to the development of human society and the sustainability of the environment. The rational management and effective protection of groundwater require accurate characterization of hydrogeological conditions. Owing to the complexity of groundwater systems, various uncertainties exist that limit our ability to manage groundwater resources wisely [1,2]. The uncertainty associated with a groundwater system originates mainly from the spatial heterogeneity of hydrogeological properties. This heterogeneity can be described quantitatively using geostatistical methods, which can condition the spatial distribution of hydrogeological properties on direct measurements [3]. To further improve the characterization of hydrogeological conditions, inverse modeling methods have been introduced to condition the characterization on available measurements of state variables, such as hydraulic head, flow rate, and solute concentration, and to reduce the associated uncertainty [4,5,6,7,8,9].

There has been a growing tendency in the past decade to take conceptual hydrological model uncertainty into account. The existence of conceptual uncertainty has been demonstrated in geoscience [10]. If conceptual uncertainty is neglected, two types of errors may occur: first, the prediction can be statistically biased because an invalid conceptual model is adopted; and second, the uncertainty and risk associated with the prediction can be underestimated because the conceptual model space is insufficiently sampled [11]. The Generalized Likelihood Uncertainty Estimation (GLUE) method pioneered approaches that account for conceptual model uncertainty explicitly [12,13,14]. A more statistically coherent multimodel analysis method is Bayesian Model Averaging (BMA) [15,16,17,18]. The GLUE and BMA methods were combined in [19,20]. The Maximum Likelihood version of BMA (MLBMA) was proposed to make it more convenient to incorporate groundwater model calibration methods based on maximum likelihood estimation into the multimodel analysis framework [11,21,22,23,24].

With the development of in-situ monitoring technology in groundwater management, the ensemble Kalman filter (EnKF) has gained significant attention because it can update model parameters and states simultaneously in a sequential manner [25,26,27]. Recently, we developed a multimodel data assimilation method by embedding the traditional EnKF into the BMA framework [28]. The multimodel EnKF retains all of the advantages of the traditional EnKF and can also take conceptual model uncertainty into consideration during data assimilation. Its methodology was described in detail, and its accuracy quantitatively verified using statistical criteria, in [28]. The present work focuses on demonstrating the multimodel EnKF analysis procedure in a more practical case study, in which the meteorological, topographic, and geological data are obtained from the Hailiutu River area in the Ordos Basin, China. Because detailed hydrogeological characterization data and dynamic hydraulic head observations are limited, we adopt a strategy similar to [29]: synthetic hydraulic head observations are computed from a fully heterogeneous reference field generated conditional on the available direct hydraulic conductivity measurements. This study is therefore essentially synthetic. However, this strategy has three benefits: (1) the proposed method can easily be applied to other practical cases by following the example in this research; (2) the available field conditions, such as meteorological, topographic, and geological conditions, are reflected to the maximum extent; and (3) the performance of the proposed method in estimating the hydraulic conductivity field can be evaluated by comparison against the generated reference field.

The rest of this paper is organized as follows. In Section 2, the multimodel EnKF method is briefly presented, and the study area, the formulation of the multiple models, and the implementation of the method are described in detail. In Section 3, the analysis results are discussed. The main conclusions are summarized in Section 4.

2. Methodology

2.1. Theoretical Background

The general multimodel ensemble Kalman filter method is detailed in [28]. Here, its implementation is briefly introduced in a form specialized to the groundwater modeling problem, in which dynamic hydraulic head observations are assimilated to estimate the hydraulic conductivity field.

Suppose that a set of N_k conceptual models,

\mathcal{M} = \{M_{1}, \dots, M_{k}, \dots, M_{N_{k}}\},

is postulated to simulate the groundwater system. For a given model M_k, the data assimilation process consists of two steps: the forecast step and the update step.

For convenience of presentation, consider N_e realizations of the state vector s, each of which includes the log hydraulic conductivity, the hydraulic head, and the predicted values at the measurement locations:

\{s_{1}, s_{2}, \dots, s_{N_{e}}\} = \left\{ \begin{bmatrix} Y_{1} \\ h_{1} \\ m_{1} \end{bmatrix}, \begin{bmatrix} Y_{2} \\ h_{2} \\ m_{2} \end{bmatrix}, \dots, \begin{bmatrix} Y_{N_{e}} \\ h_{N_{e}} \\ m_{N_{e}} \end{bmatrix} \right\}

(1)

where Y = ln K is the natural logarithm of the hydraulic conductivity K; h is the hydraulic head; and m is the vector of predicted values at the measurement locations.

Usually, the hydraulic conductivity and the hydraulic head in the state vector are updated simultaneously in the EnKF method. However, the updated hydraulic head and hydraulic conductivity may not be consistent with each other when the groundwater model is strongly nonlinear. To avoid this issue, only the hydraulic conductivity is updated at each assimilation step, and the corresponding hydraulic head is predicted by:

h_{i, t}^{f} = M_{k} (Y_{i, t - 1}^{u}) + ξ_{i, t} = M_{k} (Y_{i, t}^{f}) + ξ_{i, t}

(2)

where ξ_t is the model prediction error, assumed to satisfy E(ξ_t) = 0 and E(ξ_t ξ_t^T) = Q_t; the subscript i is the realization index in the EnKF ensemble, i = 1, 2, …, N_e; the subscript t is the assimilation step index; and the superscripts f and u denote forecast and update, respectively. Equation (2) indicates that the hydraulic head of the ith realization at time t is forecast from the updated log hydraulic conductivity of the ith realization at time t − 1 through the groundwater simulation model M_k(·), and that the updated log hydraulic conductivity at time t − 1 serves as the log hydraulic conductivity at time t in the forecast step. Together with the forecast hydraulic head at time t and the predicted values at the measurement locations, it forms the state vector s_{i,t}^f.

The observation values and the true state vector at time t can be related through:

h_{i,t}^{\mathrm{obs}} = H_{t} s_{t}^{\mathrm{true}} + ε_{t} + ζ_{i,t}

(3)

where the superscript obs denotes an observed value and the superscript true denotes the true (usually unknown) state; ε_t is the Gaussian-distributed measurement error with E(ε_t) = 0 and E(ε_t ε_t^T) = C_{h,t}; ζ_{i,t} is the added perturbation error [30], usually assumed to have the same distribution as the measurement error, i.e., ζ_{i,t} ~ N(0, C_{h,t}); T is the transpose operator; and H is a matrix operator containing only 0 and 1 elements, which can be written as:

H = [0 I]

(4)

where I is the identity matrix.

The ensemble mean and covariance in the forecast step are:

{\bar{s}}_{t}^{f} = \frac{1}{N_{e}} \sum_{i = 1}^{N_{e}} s_{i, t}^{f}

(5)

C_{s, t}^{f} = \frac{1}{N_{e} - 1} \sum_{i = 1}^{N_{e}} {[s_{i, t}^{f} - {\bar{s}}_{t}^{f}] {[s_{i, t}^{f} - {\bar{s}}_{t}^{f}]}^{T}}

(6)

The Kalman gain at time t can be written as:

K_{t} = C_{s, t}^{f} {H_{t}}^{T} {(H_{t} C_{s, t}^{f} {H_{t}}^{T} + C_{h, t})}^{- 1}

(7)

The updated ensemble member of the state vector is:

s_{i,t}^{u} = s_{i,t}^{f} + K_{t} [h_{i,t}^{\mathrm{obs}} - H_{t} s_{i,t}^{f}]

(8)
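The analysis step in Equations (5)-(8) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `enkf_update` and the one-realization-per-column layout are hypothetical, and the forecast ensemble is assumed to be already stacked as in Equation (1). Rather than forming the full covariance C_{s,t}^f (impractical for 20,840 active cells), it computes C_{s,t}^f H_t^T and H_t C_{s,t}^f H_t^T from the ensemble anomalies, which is algebraically equivalent to Equations (6) and (7).

```python
import numpy as np


def enkf_update(S_f, H, h_obs, C_h, rng):
    """One EnKF analysis step with perturbed observations, Eqs. (5)-(8).

    S_f   : (n_state, N_e) forecast ensemble, one realization per column
    H     : (n_obs, n_state) 0/1 selection matrix, H = [0 I]
    h_obs : (n_obs,) observed heads at time t
    C_h   : (n_obs, n_obs) measurement-error covariance C_{h,t}
    rng   : numpy.random.Generator used to draw the perturbations zeta_i
    """
    n_obs, N_e = H.shape[0], S_f.shape[1]
    A = S_f - S_f.mean(axis=1, keepdims=True)   # anomalies about the ensemble mean, Eq. (5)
    HA = H @ A
    CHt = A @ HA.T / (N_e - 1)                  # C_s^f H^T without forming C_s^f, Eq. (6)
    HCHt = HA @ HA.T / (N_e - 1)                # H C_s^f H^T
    # Perturbed observations h_obs + zeta_i with zeta_i ~ N(0, C_h), Eq. (3)
    zeta = rng.multivariate_normal(np.zeros(n_obs), C_h, size=N_e).T
    innovations = h_obs[:, None] + zeta - H @ S_f
    # K_t applied to the innovations via a linear solve instead of an explicit inverse, Eqs. (7)-(8)
    return S_f + CHt @ np.linalg.solve(HCHt + C_h, innovations)


# Toy usage (hypothetical sizes): 3 log-K cells + 2 predicted heads per member, 200 members
rng = np.random.default_rng(0)
n_Y, n_obs, N_e = 3, 2, 200
S_f = rng.normal(size=(n_Y + n_obs, N_e))
H = np.hstack([np.zeros((n_obs, n_Y)), np.eye(n_obs)])   # H = [0 I], Eq. (4)
S_u = enkf_update(S_f, H, np.array([0.5, -0.5]), 1e-6 * np.eye(n_obs), rng)
Y_u = S_u[:n_Y]   # keep only the log-K rows; heads are re-simulated with Eq. (2)
```

Keeping only the updated log-K rows mirrors the choice described for Equation (2): the heads for the next step come from a fresh forward run rather than from the linear update.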

The ensemble mean and covariance of the updated state vector, conditional on all of the available observation data up to time t and the given model, are:

E(s_{t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) = \frac{1}{N_{e}} \sum_{i=1}^{N_{e}} s_{i,t}^{u}

(9)

\mathrm{Cov}(s_{t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) = \frac{1}{N_{e} - 1} \sum_{i=1}^{N_{e}} [s_{i,t}^{u} - E(s_{t}^{u})] [s_{i,t}^{u} - E(s_{t}^{u})]^{T}

(10)

where

h_{1:t}^{\mathrm{obs}} = [h_{1}^{\mathrm{obs}}, h_{2}^{\mathrm{obs}}, \dots, h_{t}^{\mathrm{obs}}]^{T}

is the observed hydraulic head data up to time t.

The Bayesian model averaging framework provides a coherent way to handle situations in which multiple conceptual models are postulated. Under BMA, the posterior distribution of a predicted quantity Δ, conditional on all of the available data, is:

p(Δ | h_{1:t}^{\mathrm{obs}}) = \sum_{k=1}^{N_{k}} p(Δ | h_{1:t}^{\mathrm{obs}}, M_{k}) \, p(M_{k} | h_{1:t}^{\mathrm{obs}})

(11)

This equation indicates that the posterior distribution of the prediction, averaged over the model space, is the sum of the posterior distributions under the individual models, weighted by the posterior model probabilities. The key issue is therefore to evaluate the posterior model weights.

The posterior model weight can be computed through Bayes’ theorem as:

p(M_{k} | h_{1:t}^{\mathrm{obs}}) = \frac{p(h_{1:t}^{\mathrm{obs}} | M_{k}) \, p(M_{k})}{\sum_{l=1}^{N_{k}} p(h_{1:t}^{\mathrm{obs}} | M_{l}) \, p(M_{l})}

(12)

The marginal likelihood can be evaluated through the Monte Carlo integration method as:

\begin{array}{l} p(h_{1:t}^{\mathrm{obs}} | M_{k}) = \int p(h_{1:t}^{\mathrm{obs}} | Y_{k,t}^{u}, M_{k}) \, p(Y_{k,t}^{u} | M_{k}) \, dY_{k,t}^{u} \\ \approx \frac{1}{N_{e}} \sum_{i=1}^{N_{e}} p(h_{i,1:t}^{\mathrm{obs}} | Y_{i,k,t}^{u}, M_{k}) \end{array}

(13)

Several types of likelihood functions can be found in [19]. Here, we use the form:

p(h_{i,1:t}^{\mathrm{obs}} | Y_{i,k,t}^{u}, M_{k}) = \left[(h_{i,k,1:t}^{u} - h_{i,1:t}^{\mathrm{obs}})^{T} \, C_{h_{1:t}^{\mathrm{obs}}}^{-1} \, (h_{i,k,1:t}^{u} - h_{i,1:t}^{\mathrm{obs}})\right]^{-E}

(14)

where E is a user-defined parameter (we choose E = 1 in this work).
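Equations (12)-(14) combine into a short weight update, sketched below under stated assumptions: the function names, the (N_e, n) array layout of simulated and perturbed observed heads, and the use of NumPy are illustrative rather than taken from the original implementation. The sketch evaluates the inverse-power likelihood of Equation (14) for every realization, averages these values as the Monte Carlo estimate of Equation (13), and normalizes with the prior model probabilities as in Equation (12). Note that Equation (14) is undefined for a realization whose misfit is exactly zero.

```python
import numpy as np


def member_likelihood(h_sim, h_obs, C_inv, E=1.0):
    """Eq. (14): inverse-power likelihood of one realization."""
    r = h_sim - h_obs                       # misfit over all assimilated times 1:t
    return float(r @ C_inv @ r) ** (-E)     # weighted squared misfit raised to -E


def posterior_model_weights(h_sim_by_model, h_obs_by_member, C_inv, prior, E=1.0):
    """Eqs. (12)-(13) for N_k models.

    h_sim_by_model  : list of (N_e, n) arrays, simulated heads h^u_{i,k,1:t} per model
    h_obs_by_member : (N_e, n) array, perturbed observations h^obs_{i,1:t}
    C_inv           : (n, n) inverse observation-error covariance
    prior           : (N_k,) prior model probabilities p(M_k)
    """
    marginal = np.array([
        np.mean([member_likelihood(h_k[i], h_obs_by_member[i], C_inv, E)
                 for i in range(h_k.shape[0])])   # Monte Carlo average, Eq. (13)
        for h_k in h_sim_by_model
    ])
    w = marginal * np.asarray(prior, dtype=float)  # numerators of Eq. (12)
    return w / w.sum()


# Toy usage (hypothetical data): two models, two members, two observations
h_obs = np.zeros((2, 2))
close = np.array([[0.1, 0.1], [0.1, -0.1]])   # weighted misfit 0.02 per member
far = np.array([[1.0, 1.0], [1.0, -1.0]])     # weighted misfit 2.0 per member
w = posterior_model_weights([close, far], h_obs, np.eye(2), prior=[0.5, 0.5])
```

With E = 1, a model whose members fit 100 times better receives roughly 100 times the weight, so here the first model takes about 99% of the posterior mass.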

The posterior mean and covariance of the estimated hydraulic conductivity, after model averaging, are, respectively:

E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}) = E_{M_{k}} E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) = \sum_{k=1}^{N_{k}} E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) \, p(M_{k} | h_{1:t}^{\mathrm{obs}})

(15)

\begin{array}{l} \mathrm{Cov}(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}) \\ = E_{M_{k}} \mathrm{Cov}(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) + \mathrm{Cov}_{M_{k}} E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) \\ = \sum_{k=1}^{N_{k}} \mathrm{Cov}(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) \, p(M_{k} | h_{1:t}^{\mathrm{obs}}) \\ + \sum_{k=1}^{N_{k}} [E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) - E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}})] \\ \cdot [E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}}, M_{k}) - E(Y_{k,t}^{u} | h_{1:t}^{\mathrm{obs}})]^{T} \, p(M_{k} | h_{1:t}^{\mathrm{obs}}) \end{array}

(16)

The posterior mean and covariance of the predicted hydraulic head after model averaging can be obtained simply by replacing Y_{k,t}^{u} with h_{k,t}^{u} in Equations (15) and (16).
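For the diagonal of Equation (16), that is, the cell-wise variances that are usually mapped, the model-averaged moments reduce to a few array operations. The following minimal sketch assumes that the updated log-K ensemble of each model is stored as an (N_e, n_cells) array; the function name and array layout are hypothetical. The per-model mean and variance follow Equations (9) and (10) (hence `ddof=1`), and the result is split into the within-model and between-model terms of Equation (16).

```python
import numpy as np


def bma_moments(ensembles, weights):
    """Model-averaged mean (Eq. 15) and diagonal of the covariance (Eq. 16).

    ensembles : (N_k, N_e, n_cells) updated log-K realizations for each model
    weights   : (N_k,) posterior model weights p(M_k | h_obs)
    Returns (mean, within, between); the total variance is within + between.
    """
    X = np.asarray(ensembles, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    m_k = X.mean(axis=1)                            # per-model ensemble mean, Eq. (9)
    v_k = X.var(axis=1, ddof=1)                     # per-model variance, diagonal of Eq. (10)
    mean = (w * m_k).sum(axis=0)                    # Eq. (15)
    within = (w * v_k).sum(axis=0)                  # E_M Var(Y | h_obs, M_k)
    between = (w * (m_k - mean) ** 2).sum(axis=0)   # Var_M E(Y | h_obs, M_k)
    return mean, within, between


# Toy usage (hypothetical values): two models, two members, one cell
mean, within, between = bma_moments([[[0.0], [2.0]], [[4.0], [4.0]]], [0.5, 0.5])
```

Here the two models agree poorly on the mean (1.0 versus 4.0), so the between-model term (2.25) exceeds the within-model term (1.0), which is the situation the total-variance decomposition is designed to expose.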

2.2. Study Area

The study site, the Hailiutu River Basin, is located in a semi-arid region of northwest China, as shown in Figure 1, and covers an area of about 2600 km² between 38°06' N and 38°50' N and between 108°37' E and 109°15' E. Groundwater level fluctuations are driven mainly by seasonally varying recharge and evapotranspiration. Based on geological investigations, two major layers are identified in the aquifer system: the overlying unconfined layer consists of Quaternary Shala-Usun fine sand with a thickness of 0.5 to 217 m, and the underlying confined layer consists of Cretaceous Luohe sandstone with a thickness of 20 to 637 m (see the elevation of the studied aquifer system in Figure 2). No regional aquitard has been found between these two layers. The Hailiutu River flows mainly through the upper layer and cuts into the lower layer near the end of the river system.

Figure 1. Location of the Hailiutu River Basin.

Figure 2. The elevation (in meters) contour of the studied aquifer.

2.3. Model Setup

The groundwater simulation software MODFLOW is used to build the numerical model of the study site. The study area is discretized into a 200 × 140 grid with two layers, and each grid block represents a 500 m × 500 m area. The boundary of the study area is delineated along the watershed divide and treated as a no-flow boundary. Cells outside the boundary are inactive, as shown by the gray area in Figure 3; the number of active cells is 20,840. The upper layer is an unconfined aquifer with a specific yield of 0.1, and the bottom layer is a confined aquifer with a specific storage of 1 × 10⁻⁵ m⁻¹. The discretized grid for each layer is shown in Figure 3, where the yellow line marks the location of the river. The riverbed conductance is taken to be 300 m²/d for the part of the river flowing through the upper layer and 150 m²/d for the part flowing through the bottom layer. The initial condition of the groundwater model is the simulation result at the end of 2006, obtained from a previous base flow analysis model. The 3-year simulation period (using field measurements of precipitation and pan evaporation from the beginning of 2007 to the end of 2009) is divided into 36 monthly stress periods with a time step of 1 day. The stress periods differ in recharge and evapotranspiration. The precipitation (measured by a rain gauge) and evaporation (measured by an evaporation pan with a diameter of 20 cm) rates during the simulation period are depicted in Figure 4. The pan evaporation is converted to the potential (maximum) evapotranspiration by multiplying it by a conversion constant, which is set to 0.3 based on a previous study. The recharge coefficient of the area is set to 0.32, and the evapotranspiration extinction depth is 4.6 m.
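The monthly MODFLOW inputs could be derived from the gauge data roughly as sketched below. Two assumptions are not spelled out in the text and are labeled here: recharge is taken as the recharge coefficient times the precipitation rate, and the gauge records are monthly totals in millimetres. The function name and the example totals are hypothetical.

```python
import numpy as np

RECHARGE_COEFF = 0.32   # recharge coefficient of the study area (Section 2.3)
PAN_TO_PET = 0.3        # pan-evaporation conversion constant (Section 2.3)


def stress_period_fluxes(precip_mm, pan_evap_mm, days):
    """Return (recharge, maximum ET) rates in m/d for each monthly stress period.

    Assumes monthly totals in mm and recharge = RECHARGE_COEFF * precipitation rate.
    """
    days = np.asarray(days, dtype=float)
    p_rate = np.asarray(precip_mm, dtype=float) / 1000.0 / days     # mm/month -> m/d
    e_rate = np.asarray(pan_evap_mm, dtype=float) / 1000.0 / days   # mm/month -> m/d
    return RECHARGE_COEFF * p_rate, PAN_TO_PET * e_rate


# Hypothetical totals for two months (31 and 28 days)
recharge, et_max = stress_period_fluxes([62.0, 0.0], [124.0, 56.0], [31, 28])
```

Each returned pair would populate one stress period of the recharge and evapotranspiration packages; only these two fluxes change between stress periods.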

Figure 3. The discretized study area: (a) the upper layer; (b) the bottom layer. The yellow line represents the location of the river.

Figure 4. Precipitation (measured by rain gauge) and evaporation (measured by evaporation pan with a diameter of 20 cm) rates for 36 stress periods.

2.4. Synthetic Data Generation

Owing to the lack of sufficient and reliable dynamic head observations, a strategy similar to [29] is adopted, in which simulated head observations are used for the analysis. These observations are computed from a synthetically generated, high-resolution reference log hydraulic conductivity field.

To capture the geological structure of the field when generating the reference log hydraulic conductivity field with a geostatistical method, a proper variogram model must be selected and its parameters calibrated against field measurements of log hydraulic conductivity. The hydraulic conductivity measurements are listed in Table 1; 10 of them are available for the upper layer. For each candidate variogram model, the adjoint state maximum likelihood cross validation (ASMLCV) method proposed by [31] is employed to estimate the variogram parameters of the upper layer of the aquifer. ASMLCV minimizes the cross-validation errors between the measured values and the kriging estimates, expressed in the form of a negative log-likelihood:

\mathrm{NLL} = M \ln 2π + \sum_{i=1}^{M} \ln σ_{i}^{2} + \sum_{i=1}^{M} \frac{e_{i}^{2}}{σ_{i}^{2}}

(17)

where M is the number of calibration data; e_i is the cross-validation error, i.e., the difference between the ith measurement and the value kriged from the remaining measurements; and σ_i² is the kriging variance. Three common variogram models, the exponential, Gaussian, and spherical models, are calibrated against the hydraulic conductivity measurements in the upper layer. The model that best describes the geostatistical properties of the upper layer is selected based on the Kashyap information criterion (KIC) [22]:

\mathrm{KIC} = \mathrm{NLL} - 2 \ln p(\hat{θ}) + N_{k} \ln(M / 2π) + \ln|\bar{F}|

(18)

where p(\hat{θ}) is the prior probability of the parameters \hat{θ} (here, the sill and the integral scale or range of the variogram model); N_k is the number of parameters (here, 2); and |\bar{F}| is the determinant of the normalized Fisher information matrix. The smaller the KIC value, the higher the model ranks. The estimated parameters and the NLL and KIC values are listed in Table 2. The exponential model is selected because it has the lowest KIC value. Only 3 measurements are available for the bottom layer because the wells were not drilled deep enough; the exponential variogram parameters of the bottom layer are therefore inferred from a similar field in the Ordos Basin near the study area, with the sill and integral scale set to 0.15 and 20,000 m, respectively.
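The cross-validation likelihood of Equation (17) can be sketched with leave-one-out ordinary kriging. This is an illustrative implementation, not the ASMLCV code of [31]: it only evaluates NLL for given variogram parameters (the adjoint-state optimization over those parameters is not shown), and it assumes an exponential covariance C(d) = sill · exp(−d/λ), with λ the integral scale, and an unknown constant mean. Function names are hypothetical, and the value it returns for the Table 1 data need not reproduce Table 2, because the details of the original computation are not reported.

```python
import numpy as np


def exp_cov(d, sill, scale):
    """Exponential covariance C(d) = sill * exp(-d / scale)."""
    return sill * np.exp(-d / scale)


def loo_kriging_nll(xy, z, sill, scale):
    """Eq. (17): NLL of leave-one-out ordinary-kriging cross-validation errors."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    M = len(z)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    nll = M * np.log(2.0 * np.pi)
    for i in range(M):
        keep = np.arange(M) != i                     # drop the ith measurement
        n = M - 1
        # Ordinary kriging system [[C, 1], [1^T, 0]] [w; mu] = [c0; 1]
        A = np.ones((n + 1, n + 1))
        A[:n, :n] = exp_cov(dist[np.ix_(keep, keep)], sill, scale)
        A[n, n] = 0.0
        c0 = exp_cov(dist[keep, i], sill, scale)
        sol = np.linalg.solve(A, np.append(c0, 1.0))
        w, mu = sol[:n], sol[n]
        e_i = z[i] - w @ z[keep]                     # cross-validation error e_i
        var_i = sill - w @ c0 - mu                   # kriging variance sigma_i^2
        nll += np.log(var_i) + e_i ** 2 / var_i
    return float(nll)


# Upper-layer wells from Table 1: (x, y) and K in m/d; Y = ln K
xy = np.array([[336389.658, 4260247.280], [332073.543, 4258013.852],
               [336869.313, 4255386.935], [333485.000, 4251550.000],
               [331812.363, 4243332.846], [337568.173, 4246870.834],
               [334105.124, 4239863.620], [335747.570, 4226004.390],
               [324070.370, 4240827.500], [342183.141, 4252717.435]])
K = np.array([12.89, 21.47, 9.09, 17.92, 14.65, 17.12, 18.32, 5.56, 10.15, 15.64])
nll_exp = loo_kriging_nll(xy, np.log(K), sill=0.200, scale=10016.06)
```

Ranking the exponential, Gaussian, and spherical models would additionally require the prior and Fisher-information terms of Equation (18), which depend on choices not given in the text.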

The reference log conductivity field, conditioned on all of the hydraulic conductivity data listed in Table 1, is generated using GSLIB [32]. To obtain the corresponding dynamic groundwater measurements, 25 synthetically designed head observation locations are used in each layer. The generated log hydraulic conductivity field, the head observation locations, and the initial head distribution of each layer are shown in Figure 5.

Table 1. The hydraulic conductivity measurements in the study area.

Well No.	x	y	Layer	K (m/d)
BH1	336,389.658	4,260,247.280	Upper	12.89
BH2	332,073.543	4,258,013.852	Upper	21.47
BH3	336,869.313	4,255,386.935	Upper	9.09
BH4	335,747.570	4,251,537.000	Bottom	0.39
BH5	333,485.000	4,251,550.000	Upper	17.92
BH7	331,812.363	4,243,332.846	Upper	14.65
BH8	337,568.173	4,246,870.834	Upper	17.12
BH9	323,502.996	4,239,096.890	Bottom	0.077
BH10	334,105.124	4,239,863.620	Upper	18.32
BH14	335,747.570	4,226,004.390	Upper	5.56
BH15	339,219.970	422,110.540	Bottom	0.066
BH16	324,070.370	4,240,827.500	Upper	10.15
BH17	342,183.141	4,252,717.435	Upper	15.64

Table 2. The parameter, NLL and KIC values, and model rank for the upper layer.

Model	Sill	Integral Scale/Range	NLL	KIC	Rank
Exponential	0.200	10,016.06	9.413	2.491	1
Gaussian	0.161	10,975.12	9.184	4.951	2
Spherical	0.211	27,543.90	9.281	6.071	3

Figure 5. The synthetically generated high-resolution log hydraulic conductivity (filled contours), initial hydraulic head (contour lines), and 25 synthetically designed head observation locations (black dots). (a) Layer 1; (b) Layer 2.

2.5. Multiple Conceptual Models

To construct multiple conceptual models of this hydrogeological system, two sources of uncertainty are considered. Geological model uncertainty arising from different interpretations of geological and geophysical data is a well-known type of model uncertainty, and various zonations can be obtained by varying the number and distribution of hydraulic conductivity zones [33,34]. Specifically, as alternatives to the high-resolution reference field, the hydraulic conductivity field of the two-layer aquifer system is partitioned into either 6 or 8 zones. Evapotranspiration is also crucial to groundwater management in semi-arid areas, but the relationship between evapotranspiration rate and groundwater level is uncertain. Since MODFLOW-2000, the evapotranspiration segments (ETS) package has allowed evapotranspiration to be simulated with a user-defined piecewise-linear relation [35], in which each intermediate segment endpoint is specified by PXDP, the proportion of the extinction depth, and PETM, the proportion of the maximum evapotranspiration rate. A two-segment relation (with PXDP = 0.4 and PETM = 0.3 at the intermediate endpoint) is used as the reference relation to generate the head observations. As alternatives, a single-segment (linear, with no intermediate endpoint) relation and a three-segment relation (with PXDP = 0.2 and PETM = 0.4 at the first intermediate endpoint and PXDP = 0.5 and PETM = 0.1 at the second) are postulated. The relationships between evapotranspiration rate and hydraulic head are depicted in Figure 6. Combining the two zonations (6 and 8 zones) with the two segmented evapotranspiration relations (1 and 3 segments) yields 4 alternative postulated models. Note that the reference two-segment relation is intentionally excluded from the postulated model set, reflecting the fact that the reference model is essentially unknown in practice.
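The three evapotranspiration relations and the resulting model set can be sketched as follows. This is an approximation written in the spirit of the ETS package, not MODFLOW code: the function name and the dictionary of (PXDP, PETM) endpoints are illustrative, and the relation is assumed to start at the maximum rate at the ET surface, decrease piecewise-linearly to zero at the extinction depth, and stay at the maximum rate when the head is above the surface.

```python
import itertools

import numpy as np

# Intermediate segment endpoints as (PXDP, PETM) pairs (Section 2.5)
RELATIONS = {
    "S1": [],                         # single linear segment (alternative)
    "S2": [(0.4, 0.3)],               # two segments (reference, excluded from the model set)
    "S3": [(0.2, 0.4), (0.5, 0.1)],   # three segments (alternative)
}


def et_rate(head, surface, et_max, ext_depth, endpoints):
    """Piecewise-linear ET rate versus depth of the water table below the ET surface."""
    pxdp = np.array([0.0] + [p for p, _ in endpoints] + [1.0])  # fraction of extinction depth
    petm = np.array([1.0] + [q for _, q in endpoints] + [0.0])  # fraction of maximum ET rate
    frac = (surface - np.asarray(head, dtype=float)) / ext_depth
    # np.interp holds the end values outside [0, 1]: full rate above the surface,
    # zero below the extinction depth
    return et_max * np.interp(frac, pxdp, petm)


# Four postulated models: two zonations x two ET relations
MODELS = ["".join(c) for c in itertools.product(["Z6", "Z8"], ["S1", "S3"])]
```

For example, with the 4.6 m extinction depth, a water table 1.84 m (0.4 × 4.6 m) below the ET surface gives 30% of the maximum rate under the reference relation S2.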

Figure 6. The reference relationship (black line) and two postulated relationships (red and blue lines) used in the ETS package.

The multimodel EnKF analysis procedure is the same as that presented in Section 2.1. The key implementation parameters are listed in Table 3.

Table 3. Summary of the implementation parameters.

Parameter	Value	Unit
Discretization
Row	200	-
Column	140	-
Layer	2	-
Active cells per layer	10,424	-
Grid spacing	500	m
Number of stress periods	36	-
Number of time steps	28–31	d
Flow (based on real field data)
Horizontal hydraulic conductivity anisotropy ratio	1	-
Vertical hydraulic conductivity anisotropy ratio	0.1 (layer 1)	-
Vertical hydraulic conductivity anisotropy ratio	0.01 (layer 2)	-
Specific yield for layer 1	0.1	-
Specific storage for layer 2	1 × 10⁻⁵	m⁻¹
Riverbed conductance	300 (layer 1)	m²/d
Riverbed conductance	150 (layer 2)	m²/d
Maximum ET rate	0–0.004	m/d
ET extinction depth	4.6	m
Recharge flux	0–0.0018	m/d
Synthetic true hydraulic conductivity field
Variogram model	Exponential	-
Mean	0	-
Variance	0.2 (layer 1)	-
Variance	0.15 (layer 2)	-
Integral scale	10,000 (layer 1)	m
Integral scale	20,000 (layer 2)	m
Measurement
Number of head measurements	50	-
Standard deviation	10% of drawdown	m
The proposed multi-model EnKF
Number of ensemble members	100	-
Number of assimilation steps	36	-
Number of postulated models	4	-
Label of the postulated models (“Z” and “S” represent zone and segment, respectively)	Z6S1	-
	Z6S3	-
	Z8S1	-
	Z8S3	-

3. Results and Discussion

In this study, the assimilation steps coincide with the stress periods, i.e., the hydraulic head measurements are assimilated to update the log hydraulic conductivity field at the end of each stress period. Figure 7 shows the evolution of the posterior model weights over the assimilation steps. In the first few steps, because little head observation information has been assimilated and both postulated zonation models are highly simplified relative to the generated heterogeneous reference field, the posterior weights are nearly uniform, i.e., each is close to 25%. As more head observations become available, the posterior weight of model Z8S3 keeps increasing, whereas that of model Z6S1 shows the opposite tendency. The posterior weight of model Z6S3 increases during the first 15 steps and decreases thereafter, while that of model Z8S1 oscillates without a clear trend. After about 30 assimilation steps, the weights stabilize, and the most complex postulated model, Z8S3, is ranked as the best model for describing the reference system, with a posterior model weight of 79.87%.

Figure 7. Posterior model weights changing with time for individual postulated models.

Figure 8a,b and Figure 9a,b depict the multimodel ensemble mean for each layer before and after 36 steps of data assimilation. Comparison with the generated reference log hydraulic conductivity field in Figure 5 shows that the ensemble mean after 36 assimilation steps captures the general pattern of the log hydraulic conductivity distribution. The corresponding multimodel estimation uncertainties before and after 36 assimilation steps for both layers are depicted in Figure 8c,d and Figure 9c,d. As expected, data assimilation substantially reduces the estimation uncertainty. This multimodel uncertainty estimate is more complete than one based on any individual model because it accounts for conceptual model uncertainty, which can be an important source of uncertainty. This type of uncertainty can propagate through the simulation of the groundwater system into predictions of groundwater dynamics and significantly affect subsequent decision-making or risk assessment based on those predictions. Conceptual model uncertainty has been ignored in most previous data assimilation studies because suitable algorithms to account for it were lacking. Here, with the proposed multimodel EnKF method, the overall uncertainty, including conceptual model uncertainty, can be evaluated at each data assimilation step.

Figure 8. The multi-model ensemble mean and variance of log hydraulic conductivity for each layer before data assimilation.

Figure 9. The multi-model ensemble mean and variance of log hydraulic conductivity for each layer after 36 steps of data assimilation.

To further investigate the change in uncertainty at each assimilation step, the spatially averaged uncertainty (SDV), i.e., the square root of the variance averaged over all grid nodes, is analyzed. It is computed as:

\mathrm{SDV}(t) = \sqrt{\frac{1}{N_{j}} \sum_{j=1}^{N_{j}} \mathrm{Var}(Y_{j,k,t}^{u} | h_{1:t}^{\mathrm{obs}})}

(19)

where N_j is the number of grid nodes and j is the node index. Equation (16) shows that the total variance, Var(Y_{k,t}^{u} | h_{1:t}^{obs}), is composed of two parts: the within-model variance, E_{M_k} Var(Y_{k,t}^{u} | h_{1:t}^{obs}, M_k), and the between-model variance, Var_{M_k} E(Y_{k,t}^{u} | h_{1:t}^{obs}, M_k). The evolution of the spatially averaged variances, including the total, within-model, and between-model variances, is plotted in Figure 10. Calibrating a fully heterogeneous model against the monitored hydraulic heads would require estimating 20,840 log hydraulic conductivity values (one per active cell), which may introduce significant estimation uncertainty. With the zonation patterns used here, only 6 log hydraulic conductivity values need to be estimated for models Z6S1 and Z6S3, and 8 for models Z8S1 and Z8S3. Because zonation drastically reduces the number of parameters to be estimated, the within-model uncertainty decreases very rapidly as head observations are assimilated, whereas the between-model uncertainty increases in the first steps and then decreases once a certain amount of head data has been assimilated. The total uncertainty is dominated by the within-model uncertainty at the beginning and by the between-model uncertainty at the end. This highlights the importance of accounting for model uncertainty; otherwise, the risk induced by the uncertainty will be underestimated.
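Equation (19) is a single reduction over the grid nodes. The sketch below (hypothetical function name and per-node values, NumPy assumed) applies it to the total variance and to its within-model and between-model parts, which is how the three curves of Figure 10 are defined. Because of the square root, the SDV of the total variance is smaller than the sum of the SDVs of its two parts.

```python
import numpy as np


def sdv(node_variances):
    """Eq. (19): square root of the variance averaged over the N_j grid nodes."""
    return float(np.sqrt(np.mean(np.asarray(node_variances, dtype=float))))


# Hypothetical per-node variances (N_j = 2) at one assimilation step
within = np.array([1.0, 1.0])
between = np.array([3.0, 3.0])
total = within + between            # diagonal decomposition of Eq. (16)
curves = {name: sdv(v) for name, v in
          [("total", total), ("within", within), ("between", between)]}
```

Evaluating `curves` at every assimilation step yields one point per curve of the spatially averaged total, within-model, and between-model uncertainty.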

Figure 10. Evolution over time of the spatial averages of the total multi-model ensemble variance, the within-model variance, and the between-model variance.

4. Conclusions

In this work, the proposed multimodel ensemble Kalman filter method is applied to a three-dimensional case study to estimate the hydraulic conductivity field by assimilating dynamic hydraulic head observations. The main contribution of the multimodel EnKF method is that it takes conceptual model uncertainty into account while retaining all of the capabilities of the widely used single-model EnKF. The accuracy and advantages of the method were investigated in detail in previous work by the same author using a two-dimensional synthetic example [28]. The aim of this paper is to show how to implement the method step by step in a more practical case. Although part of the dataset is synthetically generated, this does not affect the demonstration of the method in a practical case study; on the contrary, it allows the performance of the method to be assessed explicitly while incorporating all of the collected field information.

In this study, multiple conceptual models are constructed by considering two common but crucial sources of uncertainty that influence groundwater dynamics in semi-arid areas such as the Hailiutu River Basin: the zonation pattern of the hydraulic conductivity field and the relationship between evapotranspiration and groundwater level. The posterior model weights are adjusted dynamically at each assimilation step according to the mismatch between the predicted values and the actual observations; the mismatch is quantified by the likelihood function, and the posterior model weights are obtained through Bayes’ theorem. The multimodel ensemble mean is a weighted average of the ensemble means of the individual models, and the multimodel ensemble uncertainty consists of within-model and between-model uncertainty. All of these terms can be quantified at each assimilation step. The results show that the posterior model weights change dynamically and tend to stabilize once sufficient observations have been assimilated. As more information is assimilated, the within-model uncertainty keeps decreasing, the between-model uncertainty tends to increase at the beginning and stabilize in the late stage, and the overall uncertainty tends to decrease at the beginning and stabilize in the late stage.

Acknowledgments

This work is partially funded by the National Natural Science Foundation of China (Grant No. 41402199), the Science Foundation of China University of Petroleum, Beijing (Grant No. 2462014YJRC038), and the Platform Construction Project for Researches on the Relationship between Water and Ecology in the Ordos Plateau (Grant No. 201311076).

Author Contributions

As the sole author of this manuscript, Liang Xue developed the method, applied it to the case study, analyzed the simulation results, and wrote the paper.

Conflicts of Interest

The author declares no conflict of interest.

References

Hill, M.C.; Tiedeman, C.R. Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty; John Wiley & Sons: New York, NY, USA, 2007; p. 480.
Carrera, J. An overview of uncertainties in modelling groundwater solute transport. J. Contam. Hydrol. 1993, 13, 23–48.
Kitanidis, P.K. Introduction to Geostatistics: Applications in Hydrogeology; Cambridge University Press: Cambridge, UK, 1997.
Yeh, W.W.-G. Review of parameter identification procedures in groundwater hydrology: The inverse problem. Water Resour. Res. 1986, 22, 95–108.
Ginn, T.R.; Cushman, J.H. Inverse methods for subsurface flow: A critical review of stochastic techniques. Stoch. Hydrol. Hydraul. 1990, 4, 1–26.
Gómez-Hernández, J.J.; Franssen, H.-J.H.; Sahuquillo, A. Stochastic conditional inverse modeling of subsurface mass transport: A brief review and the self-calibrating method. Stoch. Environ. Res. Risk Assess. 2003, 17, 319–328.
Hendricks Franssen, H.J.; Alcolea, A.; Riva, M.; Bakr, M.; van der Wiel, N.; Stauffer, F.; Guadagnini, A. A comparison of seven methods for the inverse modelling of groundwater flow. Application to the characterisation of well catchments. Adv. Water Resour. 2009, 32, 851–872.
Riva, M.; Guadagnini, A.; Neuman, S.P.; Janetti, E.B.; Malama, B. Inverse analysis of stochastic moment equations for transient flow in randomly heterogeneous media. Adv. Water Resour. 2009, 32, 1495–1507.
Carrera, J.; Alcolea, A.; Medina, A.; Hidalgo, J.; Slooten, L.J. Inverse problem in hydrogeology. Hydrogeol. J. 2005, 13, 206–222.
Bond, C.E.; Gibbs, A.D.; Shipton, Z.K.; Jones, S. What do you think this is? “Conceptual uncertainty” in geoscience interpretation. GSA Today 2007, 17, 4.
Neuman, S.P. Maximum likelihood Bayesian averaging of uncertain model predictions. Stoch. Environ. Res. Risk Assess. 2003, 17, 291–305.
Beven, K.; Binley, A. The future of distributed models: Model calibration and uncertainty prediction. Hydrol. Process. 1992, 6, 279–298.
Feyen, L.; Beven, K.J.; de Smedt, F.; Freer, J. Stochastic capture zone delineation within the generalized likelihood uncertainty estimation methodology: Conditioning on head observations. Water Resour. Res. 2001, 37, 625–638.
Freer, J.; Beven, K.; Ambroise, B. Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach. Water Resour. Res. 1996, 32, 2161–2173.
Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian Model Averaging: A Tutorial. Stat. Sci. 1999, 14, 382–401.
Duan, Q.; Ajami, N.K.; Gao, X.; Sorooshian, S. Multi-model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour. 2007, 30, 1371–1386.
Rings, J.; Vrugt, J.A.; Schoups, G.; Huisman, J.A.; Vereecken, H. Bayesian model averaging using particle filtering and Gaussian mixture modeling: Theory, concepts, and simulation experiments. Water Resour. Res. 2012, 48, W05520.
Tsai, F.T.-C. Bayesian model averaging assessment on groundwater management under model structure uncertainty. Stoch. Environ. Res. Risk Assess. 2010, 24, 845–861.
Rojas, R.; Feyen, L.; Dassargues, A. Conceptual model uncertainty in groundwater modeling: Combining generalized likelihood uncertainty estimation and Bayesian model averaging. Water Resour. Res. 2008, 44, W12418.
Rojas, R.; Feyen, L.; Batelaan, O.; Dassargues, A. On the value of conditioning data to reduce conceptual model uncertainty in groundwater modeling. Water Resour. Res. 2010, 46, W08520.
Ye, M.; Neuman, S.P.; Meyer, P.D. Maximum likelihood Bayesian averaging of spatial variability models in unsaturated fractured tuff. Water Resour. Res. 2004, 40, W05113.
Ye, M.; Meyer, P.D.; Neuman, S.P. On model selection criteria in multimodel analysis. Water Resour. Res. 2008, 44.
Neuman, S.P.; Xue, L.; Ye, M.; Lu, D. Bayesian analysis of data-worth considering model and parameter uncertainties. Adv. Water Resour. 2012, 36, 75–85.
Xue, L.; Zhang, D.; Guadagnini, A.; Neuman, S.P. Multimodel Bayesian analysis of groundwater data worth. Water Resour. Res. 2014, 50, 8481–8496.
Chen, Y.; Zhang, D. Data assimilation for transient flow in geologic formations via ensemble Kalman filter. Adv. Water Resour. 2006, 29, 1107–1122.
Hendricks Franssen, H.J.; Kinzelbach, W. Real-time groundwater flow modeling with the Ensemble Kalman Filter: Joint estimation of states and parameters and the filter inbreeding problem. Water Resour. Res. 2008, 44.
Xie, X.; Zhang, D. Data assimilation for distributed hydrological catchment modeling via ensemble Kalman filter. Adv. Water Resour. 2010, 33, 678–690.
Xue, L.; Zhang, D. A multimodel data assimilation framework via the ensemble Kalman filter. Water Resour. Res. 2014, 50, 4197–4219.
Kurtz, W.; Hendricks Franssen, H.-J.; Brunner, P.; Vereecken, H. Is high-resolution inverse characterization of heterogeneous river bed hydraulic conductivities needed and possible? Hydrol. Earth Syst. Sci. 2013, 17, 3795–3813.
Burgers, G.; van Leeuwen, P.; Evensen, G. Analysis scheme in the ensemble Kalman filter. Mon. Weather Rev. 1998, 126, 1719–1724.
Samper, F.J.; Neuman, S.P. Estimation of spatial covariance structures by adjoint state maximum likelihood cross validation: 1. Theory. Water Resour. Res. 1989, 25, 351–362.
Deutsch, C.V.; Journel, A.G. GSLIB: Geostatistical Software Library and User’s Guide, 2nd ed.; Oxford University Press: New York, NY, USA, 1998.
Ye, M.; Pohlmann, K.F.; Chapman, J.B.; Pohll, G.M.; Reeves, D.M. A model-averaging method for assessing groundwater conceptual model uncertainty. Ground Water 2010, 48, 716–728.
Poeter, E.; Anderson, D. Multimodel ranking and inference in ground water modeling. Ground Water 2005, 43, 597–605.
Banta, E.R. MODFLOW-2000: The US Geological Survey Modular Ground-Water Model--Documentation of Packages for Simulating Evapotranspiration with a Segmented Function (ETS1) and Drains with Return Flow (DRT1), USGS Open-File Report 00–466; US Geological Survey: Reston, VA, USA, 2000.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Xue, L. Application of the Multimodel Ensemble Kalman Filter Method in Groundwater System. Water 2015, 7, 528-545. https://doi.org/10.3390/w7020528
