1. Introduction
Distributed state estimation has recently attracted considerable attention in the field of target tracking in sensor networks [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. As a traditional method, the centralized scheme needs to process the local measurements from all sensors simultaneously in a fusion center at each time instant [3,15]. This scheme guarantees the optimality of the estimates, but heavy communication and a powerful fusion center are required to maintain its operation, which may give rise to problems when the network size increases or the communication resources are restricted.
Unlike the centralized scheme, the distributed mechanism tries to recover the centralized performance via local communication between neighboring nodes. Specifically, each node in the network exchanges information only with its immediate neighbors to achieve a performance comparable to its centralized counterpart, which reduces the communication cost and makes the network robust against possible failures of some nodes [8]. The consensus filter, which computes the average of the quantities of interest in a distributed manner, has gained immense popularity for distributed state estimation [4,5,6,7,9,10,11,12,13,14,16,17,18,19,20,21,22]. Recently, in [23,24], the multiscale consensus scheme, in which the local estimated states asymptotically achieve prescribed ratios in terms of multiple scales, has been discussed and analyzed. The well-known Kalman consensus filter (KCF) [4,5,6,9,14] combines the local Kalman filter with the average consensus protocol to update the posterior state. In the update stage, each node exploits the measurement innovations as well as the prior estimates from its inclusive neighbors (the node itself and its immediate neighbors) to correct its prior estimate. However, the prior estimates from its immediate neighbors are assigned the same weights. This may ensure consensus on the estimates from different nodes after a period of time, but the estimation accuracy is not guaranteed. It is very likely that a target is observed neither by a certain node nor by any of its immediate neighbors. That is, there is no measurement in the inclusive neighborhood of the node, and the node is naive about the target’s state. Similar to [16,22], such a node is referred to as a naive node. Since a naive node contains less information about the target, it usually produces an inaccurate estimate. If a naive node is given a weight identical to that of the informed nodes, the final estimate will be severely contaminated, which may even cause the final estimates to diverge [9,13]. In addition, the cross-covariances among different nodes are ignored in the derivation of KCF to limit computational and bandwidth requirements, so the covariance of each node is updated without regard to its neighbors’ prior covariances during the consensus step. Given no naive nodes in the network, KCF is able to provide satisfactory results. However, due to limited sensing abilities or constrained communication resources, a network often contains naive nodes; in sparse sensor networks, this phenomenon is even more pronounced. In such a case, KCF yields poor estimates [3]. To solve this problem, before updating the posterior estimate, the generalized Kalman consensus filter (GKCF) performs consensus on the prior information vectors and information matrices within the inclusive neighborhood of each node [4,16,19]. As analyzed in [4], this procedure greatly improves the estimation performance in the presence of naive nodes. GKCF updates the current state based on consensus on prior estimates, but the current measurements are not considered for naive nodes. Each naive node only has access to the measurements of the previous time instant embedded in the prior estimates, so there is a delay for naive nodes to access current measurements. By contrast, the consensus on measurements algorithm (CM) performs consensus on measurements [5,25,26,27] and can achieve the centralized performance with infinitely many consensus iterations. However, its stability is not guaranteed unless the number of consensus iterations is large enough [26]. The consensus on information algorithm (CI) performs consensus on both prior estimates and measurements [26,27,28] and can be viewed as a generalization of the covariance intersection fusion rule to multiple iterations [29]. CI guarantees stability for any number of consensus iterations, but its estimation confidence can be degraded because a conservative rule is adopted by assuming that the correlation between the estimates from different nodes is completely unknown [26,28,30].
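To make the covariance intersection (CI) rule concrete, the following sketch (illustrative only; the function and variable names are ours, not from the cited works) fuses two consistent estimates whose cross-correlation is unknown by convexly combining their information contributions:

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, omega):
    """Fuse two estimates (x1, P1) and (x2, P2) with unknown cross-covariance.

    omega in [0, 1] convexly weights the information (inverse-covariance)
    contributions; the fused pair is consistent whenever both inputs are.
    """
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    Y = omega * I1 + (1.0 - omega) * I2       # fused information matrix
    P = np.linalg.inv(Y)                      # fused covariance
    x = P @ (omega * I1 @ x1 + (1.0 - omega) * I2 @ x2)
    return x, P
```

Because the weights are convex, the fused information matrix never exceeds what the more informative input alone would justify, which is exactly why the text describes the rule as conservative.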
With more consensus iterations carried out, the estimates from different nodes reach a reasonable consensus. Each node then holds almost completely redundant prior information, and hence the prior estimation errors of different nodes are highly correlated. In this situation, algorithms such as KCF, GKCF, and CM, which do not take the cross-covariance into account, are sub-optimal [16,17]. Note that the redundant information exists only in the prior estimates, which come from the converged results of the previous time instant. Using this property, the information weighted consensus filter (ICF) [18,20,21] divides the prior information of each node by ${N}_{s}$, where ${N}_{s}$ is the total number of nodes in the network. If each node can interact with its neighbors infinitely many times, ICF achieves the same optimal estimation performance as the centralized Kalman consensus filter (CKF). In addition, ICF performs better than KCF, GKCF, CI, and CM under the same number of consensus iterations, which has been validated in [16,22,26]. As pointed out in [26,30,31], the correction step of multiplying by ${N}_{s}$ may cause an overestimation of the measurement innovation for some nodes, which is often the case in sparse sensor networks. As a consequence, the estimates of some nodes may become so optimistic that estimation consistency is lost, which should be avoided in recursive estimation. To address this problem, the HCMCI algorithm, which combines the positive features of both CM and CI, has been proposed. It should be noted that HCMCI represents a family of distributed algorithms depending on the selection of scalar weights; both CI and ICF are special cases of HCMCI. To preserve the consistency of the local filters as well as improve the estimation performance, a so-called normalization factor is introduced. If the network topology is fixed, the normalization factor can be computed offline to save bandwidth. In [32], a novel distributed set-theoretic information flooding (DSIF) protocol is proposed. The DSIF protocol avoids the reuse of information and offers the highest convergence efficiency for network consensus, but it suffers from growing node-storage requirements, more communication iterations, and a higher communication load.
However, the algorithms discussed above require sufficiently many consensus iterations to achieve the expected estimation performance. In practical applications, only a limited number of consensus iterations is allowed, and thus the performance of the aforementioned algorithms deteriorates. In addition, their estimation performance depends closely on the selection of the consensus weights. Inappropriate consensus weights may cause the algorithms to diverge or to require more iterations to achieve consensus on the local estimates [9]. A common choice is to set the weights to a constant value as discussed in [6], which is an intuitive way to maintain the stability of the error dynamics. However, the constant value requires knowledge of the maximum degree across the entire sensor network. Even if the maximum degree is available, it remains a problem how to determine a proper constant weight that achieves the best performance while preserving consistency. In addition, the initial consensus terms in ICF require knowledge of the total number of nodes in the network. Global parameters, such as the maximum degree or the total number of nodes, may vary over time when the communication topology changes, new nodes join, or existing nodes fail to communicate with others. Without accurate knowledge of these global parameters, each node would either overestimate or underestimate the state of interest.
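One fully distributed alternative that needs only local degrees (no maximum degree or total node count) is the Metropolis weighting rule; a minimal sketch, with the adjacency matrix as an assumed input:

```python
import numpy as np

def metropolis_weights(adj):
    """Build consensus weights from local degrees only.

    adj is a symmetric 0/1 adjacency matrix (no self-loops). Each node needs
    just its own degree and its neighbors' degrees, so no global knowledge
    such as the maximum degree or the number of nodes is required.
    """
    adj = np.asarray(adj)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # self-weight absorbs the remainder
    return W
```

The resulting matrix is symmetric and doubly stochastic on a connected undirected graph, so repeated averaging with it converges to the network-wide mean.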
To deal with the problems analyzed above, a novel distributed hybrid information weighted consensus filter (DHIWCF) is proposed in this paper. Firstly, different from the previous work [4,5,6,16,18,22], each node assigns consensus weights to its neighbors based on their local degrees, which is fully distributed and requires no knowledge of any global parameters. Secondly, the prior estimate information and the measurement information at the current time instant within the inclusive neighborhood are combined to form the local generalized prior estimate equation and the local generalized measurement equation, respectively. Then, a distributed local MAP estimator is derived with some reasonable approximations of the error covariance matrices, which achieves higher accuracy than the approaches introduced in [4,5,6,11,16,18,19,25,26,27,28]. Finally, the average consensus protocol with the aforementioned consensus weights is incorporated into the estimation framework, yielding the proposed DHIWCF. In addition, a theoretical analysis of the consistency of the local estimates and of the stability and convergence of the estimator is performed. Comparative experiments on three different target tracking scenarios validate the effectiveness and feasibility of the proposed DHIWCF. Even with a single consensus iteration, DHIWCF is still able to achieve acceptable estimation performance.
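To illustrate how a consensus stage of this kind operates, the following sketch (a hypothetical helper of our own, not the exact DHIWCF recursion) runs $L$ iterations of weighted averaging on the nodes' information vectors and matrices:

```python
import numpy as np

def consensus_iterations(ys, Ys, W, L):
    """L rounds of weighted averaging of information pairs across the network.

    ys / Ys are lists of per-node information vectors / matrices, and W[i, j]
    is the consensus weight node i assigns node j (zero for non-neighbors),
    so each round uses only neighbor-to-neighbor communication.
    """
    n = len(ys)
    for _ in range(L):
        ys = [sum(W[i, j] * ys[j] for j in range(n)) for i in range(n)]
        Ys = [sum(W[i, j] * Ys[j] for j in range(n)) for i in range(n)]
    return ys, Ys
```

With row-stochastic weights, each iteration contracts the spread of the local information pairs toward their network average, which is the mechanism the consensus-based filters above rely on.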
The remainder of this paper is organized as follows. Section 2 formulates the problem of distributed state estimation in sensor networks. The distributed local MAP estimator is derived in Section 3. Section 4 presents the distributed hybrid information weighted consensus filter. The theoretical analysis of the consistency of the estimates and the stability and convergence of the estimator is provided in Section 5. The experimental results and analysis are presented in Section 6. Concluding remarks are given in Section 7.
Notation: ${\mathbb{R}}^{n}$ denotes the n-dimensional Euclidean space. $\Vert \text{}\cdot \text{}\Vert $ is the Euclidean norm in ${\mathbb{R}}^{n}$. For arbitrary matrix $\mathit{A}$, ${\mathit{A}}^{-1}$ and ${\mathit{A}}^{\mathrm{T}}$ are respectively its inverse and transpose. $\mathit{A}>0$ means $\mathit{A}$ is positive definite, and $\mathrm{tr}\left\{\mathit{A}\right\}$ is the shorthand for the trace of $\mathit{A}$. $\mathrm{diag}\left({\mathit{B}}_{1},{\mathit{B}}_{2},\dots ,{\mathit{B}}_{n}\right)$ denotes a block diagonal matrix with its main diagonal block being ${\mathit{B}}_{1},{\mathit{B}}_{2},\dots ,{\mathit{B}}_{n}$. ${\mathit{I}}_{n}$ represents the $n\times n$ identity matrix. For a set $\mathit{C}$, $\left|\mathit{C}\right|$ means the cardinality of $\mathit{C}$. $\mathbb{E}\{\cdot \}$ is the expectation operator.
3. Distributed Local MAP Estimation
This section starts with the centralized MAP estimator. Then we formulate the local generalized prior estimate equation based on the prior estimates from the inclusive neighbors and the local generalized measurement equation based on the current measurements from the inclusive neighbors. By maximizing the local posterior probability, the local MAP estimator is derived. To implement the estimation steps in a distributed manner, an approximation of the error cross-covariance is required. Two special cases, in which the prior errors from neighboring nodes are uncorrelated or completely identical, are considered here. The practical importance of such an approximation can be seen from the numerical examples in Section 6, which indicate that the proposed DHIWCF is effective even if the assumed cases are not fulfilled.
3.1. Global MAP Estimator
Assume ${\mathit{z}}_{k}={\left[{\mathit{z}}_{1,\text{}k}^{\mathrm{T}},{\mathit{z}}_{2,\text{}k}^{\mathrm{T}},\dots ,{\mathit{z}}_{{N}_{s},\text{}k}^{\mathrm{T}}\right]}^{\mathrm{T}}$ represents the collective measurements of the entire sensor network at time instant $k$. The stacked measurement matrix of all the nodes is denoted as ${\mathit{H}}_{k}={\left[{\mathit{H}}_{1,\text{}k}^{\mathrm{T}},{\mathit{H}}_{2,\text{}k}^{\mathrm{T}},\dots ,{\mathit{H}}_{{N}_{s},\text{}k}^{\mathrm{T}}\right]}^{\mathrm{T}}$. The stacked measurement noise is ${\mathit{v}}_{k}={\left[{\mathit{v}}_{1,\text{}k}^{\mathrm{T}},{\mathit{v}}_{2,\text{}k}^{\mathrm{T}},\dots ,{\mathit{v}}_{{N}_{s},\text{}k}^{\mathrm{T}}\right]}^{\mathrm{T}}$ with block diagonal covariance matrix ${\mathit{R}}_{k}=\mathrm{blkdiag}\left({\mathit{R}}_{1,k},{\mathit{R}}_{2,k},\dots ,{\mathit{R}}_{{N}_{s},k}\right)$. Then the global measurement model can be formulated as
Suppose the centralized prior estimate is ${\widehat{\mathit{x}}}_{k|k-1}^{c}$. The corresponding prior estimation error is ${\mathit{e}}_{k|k-1}^{c}={\widehat{\mathit{x}}}_{k|k-1}^{c}-{\mathit{x}}_{k}$ with covariance matrix ${\mathit{P}}_{k|k-1}^{c}=\mathbb{E}\left\{{\mathit{e}}_{k|k-1}^{c}{\left({\mathit{e}}_{k|k-1}^{c}\right)}^{\mathrm{T}}\right\}$. Letting ${\widehat{\mathit{x}}}_{k|k}^{\mathrm{MAP}}$ be the maximum a posteriori (MAP) estimate, we have
where $p\left({\mathit{z}}_{k}|{\mathit{Z}}_{k-1}\right)={\displaystyle \int p\left({\mathit{z}}_{k}|{\mathit{x}}_{k}\right)p\left({\mathit{x}}_{k}|{\mathit{Z}}_{k-1}\right)\mathrm{d}{\mathit{x}}_{k}}$ is a normalization constant. Since the process noise and measurement noise are both Gaussian, the conditional PDFs $p\left({\mathit{z}}_{k}|{\mathit{x}}_{k}\right)$ and $p\left({\mathit{x}}_{k}|{\mathit{Z}}_{k-1}\right)$ are also Gaussian. The explicit forms of the prior PDF $p\left({\mathit{x}}_{k}|{\mathit{Z}}_{k-1}\right)$ and the likelihood PDF $p\left({\mathit{z}}_{k}|{\mathit{x}}_{k}\right)$ are formed as
where ${\Vert \mathit{x}\Vert}_{\mathit{A}}^{2}={\mathit{x}}^{\mathrm{T}}\mathit{A}\mathit{x}$. Based on the product of Gaussians in the numerator, the criterion in (7) can be reformulated as the minimization of the following cost function.
Here, the cost function in (11) is strictly convex in ${\mathit{x}}_{k}$, and hence the optimal ${\widehat{\mathit{x}}}_{k|k}^{\mathrm{MAP}}$ is available.
The corresponding posterior error covariance is
The equivalent information form of the estimate in (12) and (13) can be rewritten as
3.2. Local MAP Estimation
Assume that each node, for instance, node $i$, is able to receive its neighbors’ prior local estimates ${\widehat{\mathit{x}}}_{j,k|k-1}$ and the corresponding covariances ${\mathit{P}}_{j,k|k-1}^{}$, as well as its neighbors’ local measurements ${\mathit{z}}_{j,k}$ and the corresponding noise covariances ${\mathit{R}}_{j,k}$, by local communication. The local generalized prior estimate, denoted by ${{\widehat{\mathit{x}}}^{\prime}}_{i,k|k-1}^{}$, is defined as the stacked vector ${{\widehat{\mathit{x}}}^{\prime}}_{i,k|k-1}^{}={\left[{\widehat{\mathit{x}}}_{i,k|k-1}^{\mathrm{T}},{\widehat{\mathit{x}}}_{{j}_{1},k|k-1}^{\mathrm{T}},\dots ,{\widehat{\mathit{x}}}_{{j}_{{d}_{i}},k|k-1}^{\mathrm{T}}\right]}^{\mathrm{T}}$, where ${j}_{h}\text{}\in {\mathcal{N}}_{i}\left(h=1,2,\dots ,{d}_{i}\right)$ denotes the index of node $i$’s neighbors. Let ${\mathit{\eta}}_{i,k|k-1}^{}={\widehat{\mathit{x}}}_{i,k|k-1}^{}-{\mathit{x}}_{k}^{}$ be the prior error of node $i$. The local collective prior error of node $i$ with respect to its inclusive neighbors can be formulated as ${{\mathit{\eta}}^{\prime}}_{i,k|k-1}^{}={\left[{\mathit{\eta}}_{i,k|k-1}^{\mathrm{T}},{\mathit{\eta}}_{{j}_{1},k|k-1}^{\mathrm{T}},\dots ,{\mathit{\eta}}_{{j}_{{d}_{i}},k|k-1}^{\mathrm{T}}\right]}^{\mathrm{T}}$. The local generalized prior estimate can then be expressed by
where ${\mathbf{\mathscr{H}}}_{\mathit{I}}={[{\mathbf{I}}_{p},{\mathbf{I}}_{p},\dots ,{\mathbf{I}}_{p}]}^{\mathrm{T}}$ is the matrix stacked by ${d}_{i}+1$ identity matrices and ${\mathit{x}}_{k}^{}$ is the true state at time instant $k$. The local collective prior error covariance of node $i$ can be written as
Here, the block matrix ${{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\in {\mathbb{R}}^{\left(1+{d}_{i}\right){n}_{\mathit{x}}\times \left(1+{d}_{i}\right){n}_{\mathit{x}}}$.
Similarly, the local generalized measurement of node $i$ with regard to its inclusive neighbors can be formulated as
Here, ${{\mathit{z}}^{\prime}}_{i,\text{}k}={\left[{\mathit{z}}_{i,k}^{\mathrm{T}},{\mathit{z}}_{{j}_{1},k}^{\mathrm{T}},\dots ,{\mathit{z}}_{{j}_{{d}_{i}},k}^{\mathrm{T}}\right]}^{\mathrm{T}}$ is the local generalized measurement. ${{\mathit{H}}^{\prime}}_{i,k}={\left[{\mathit{H}}_{i,\text{}k}^{\mathrm{T}},{\mathit{H}}_{{j}_{1},\text{}k}^{\mathrm{T}},\dots ,{\mathit{H}}_{{j}_{{d}_{i}},\text{}k}^{\mathrm{T}}\right]}^{\mathrm{T}}$ is the local generalized measurement matrix. ${{\mathit{v}}^{\prime}}_{i,k}={\left[{\mathit{v}}_{i,\text{}k}^{\mathrm{T}},{\mathit{v}}_{{j}_{1},\text{}k}^{\mathrm{T}},\dots ,{\mathit{v}}_{{j}_{{d}_{i}},\text{}k}^{\mathrm{T}}\right]}^{\mathrm{T}}$ denotes the local generalized measurement noise with covariance matrix ${{\mathit{R}}^{\prime}}_{i,k}=\mathrm{blkdiag}\left({\mathit{R}}_{i,k},{\mathit{R}}_{{j}_{1},k},\dots ,{\mathit{R}}_{{j}_{{d}_{i}},k}\right)$.
Combining (17) and (19) together, one has
where the error covariance
Here, the operator $\mathrm{blkdiag}\left(\text{}\cdot \text{}\right)$ denotes the block diagonal matrix.
According to the derivation of the global maximum a posteriori estimator described in Section 3.1, the updated local information matrix can be computed by
Similarly, the updated local information vector is
Here, ${\left[{\left({{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\right)}^{-1}\right]}_{r,s}$ denotes the $\left(r,s\right)$-th block matrix of ${\left({{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\right)}^{-1}$. Similarly, ${\left[{{\widehat{\mathit{x}}}^{\prime}}_{i,k|k-1}^{}\right]}_{s}$ denotes the $s$-th block vector of ${{\widehat{\mathit{x}}}^{\prime}}_{i,k|k-1}^{}$.
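The stacked construction above can be transcribed numerically. The sketch below (our own illustration; it assumes the full joint prior covariance is available, which Section 3.3 replaces with distributed approximations) forms the local MAP update from the stacked prior and measurement models:

```python
import numpy as np

def local_map_update(x_stack, P_stack, z_stack, H_stack, R_stack):
    """Local MAP update from the stacked prior/measurement model.

    x_stack: stacked prior estimates of the inclusive neighborhood,
    P_stack: their joint prior covariance (assumed fully known here),
    z_stack / H_stack / R_stack: stacked measurements, measurement
    matrices, and noise covariances. Returns the updated information
    vector and information matrix.
    """
    nx = H_stack.shape[1]
    m = x_stack.size // nx                    # 1 + d_i stacked blocks
    Hi = np.vstack([np.eye(nx)] * m)          # stacked identity matrices
    Pinv = np.linalg.inv(P_stack)
    Rinv = np.linalg.inv(R_stack)
    Y = Hi.T @ Pinv @ Hi + H_stack.T @ Rinv @ H_stack
    y = Hi.T @ Pinv @ x_stack + H_stack.T @ Rinv @ z_stack
    return y, Y
```

The two summands mirror the structure of the update: the first aggregates prior information through the stacked identities, the second aggregates the measurement information of the inclusive neighborhood.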
3.3. Approximation of ${\left({{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\right)}^{-1}$
It is shown in (22) and (23) that the key to acquiring the local posterior estimate is to compute the inverse of the local collective prior error covariance, that is, ${\left({{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\right)}^{-1}$. However, as shown in (18), the computation of ${\left({{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\right)}^{-1}$ requires knowledge of the cross-covariances between the neighbors of node $i$. As shown in [6], to compute the cross-covariance matrix ${\mathit{P}}_{i{j}_{h},k|k-1}^{}$, the information of the neighbors of node ${j}_{h}$ is also required. Therefore, it is not practical to directly compute ${\left({{\mathit{P}}^{\prime}}_{i,k|k-1}^{}\right)}^{-1}$, since large amounts of communication among neighboring nodes would be required, which may place a tremendous computation and communication burden on the networked system. Although some work has been done in [35,36] to incorporate cross-covariance information into the estimation framework, no technique for computing the required terms is offered, and predefined values are used instead [4].
Therefore, an approximation of ${{\mathit{P}}^{\prime}}_{i,k|k-1}^{}$ in a distributed manner is necessary. In the following derivation, two special cases are discussed. The first case is that the prior estimates from different nodes are completely uncorrelated with each other. This is true at the beginning of the estimation procedure, when the prior information is initialized with random quantities. The second case is that of converged priors, which is critical because, with sufficient consensus iterations, the prior estimates of all nodes converge to an identical value.
3.3.1. Case 1: Uncorrelated Priors
In this case, the prior errors from different nodes are assumed to be uncorrelated with each other, i.e., $\mathbb{E}\left\{{\mathit{\eta}}_{i,k|k-1}^{}{\mathit{\eta}}_{{j}_{h},k|k-1}^{\mathrm{T}}\right\}=\mathbf{0}$. Hence, ${{\mathit{P}}^{\prime}}_{i,k|k-1}^{}$ in (18) turns into a block diagonal matrix ${{\mathit{P}}^{\prime}}_{i,k|k-1}^{}=\mathrm{blkdiag}\left({\mathit{P}}_{i,k|k-1}^{},{\mathit{P}}_{{j}_{1},k|k-1}^{},\dots ,{\mathit{P}}_{{j}_{{d}_{i}},k|k-1}^{}\right)$. The local posterior estimate in (22) and (23) can then be approximated as
Note that after enough consensus iterations, the prior estimate of each node in the network asymptotically converges to the centralized result, i.e., ${\mathit{Y}}_{i,k|k-1}^{}={\mathit{Y}}_{c,k|k-1}^{}$ and ${\widehat{\mathit{y}}}_{i,k|k-1}^{}={\widehat{\mathit{y}}}_{c,k|k-1}^{}$. In such a case, the local prior information matrix in (25) turns into ${\sum}_{j\in {\mathcal{J}}_{i}}{\mathit{Y}}_{j,k|k-1}^{}=\left(1+{d}_{i}\right){\mathit{Y}}_{c,k|k-1}^{}$. However, after convergence, the total amount of prior information in the network is ${\mathit{Y}}_{c,k|k-1}^{}$. That is, the local prior information matrix in the inclusive neighborhood is overestimated by a factor of $\left(1+{d}_{i}\right)$. Therefore, the approximation of ${{\mathit{P}}^{\prime}}_{i,k|k-1}^{}$ should be modified by multiplying by the factor $\left(1+{d}_{i}\right)$, that is, ${{\mathit{P}}^{\prime}}_{i,k|k-1}^{}=\left(1+{d}_{i}\right)\mathrm{blkdiag}\left({\mathit{P}}_{i,k|k-1}^{},{\mathit{P}}_{{j}_{1},k|k-1}^{},\dots ,{\mathit{P}}_{{j}_{{d}_{i}},k|k-1}^{}\right)$, to avoid an underestimation of the prior covariance. Hence, the results in (24) and (25) should be modified as
3.3.2. Case 2: Converged Priors
When the prior estimate of each node converges to the centralized result, one has
Note that for converged priors, ${\mathit{Y}}_{j,k|k-1}^{}={\mathit{Y}}_{c,k|k-1}^{},\text{}j\in {\mathcal{J}}_{i}$. Substituting this fact into (28) yields
Therefore, the estimated results in (22) and (23) can be transformed into a weighted summation of the prior information and the current measurement innovations, which has the same form as the results shown in (26) and (27).
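Numerically, the modified update amounts to down-weighting the summed prior information of the inclusive neighborhood by $1/(1+d_i)$ while adding every neighbor's measurement contribution at full weight. A sketch with illustrative names (not the authors' exact implementation):

```python
import numpy as np

def approx_local_update(Y_priors, y_priors, Hs, Rs, zs):
    """Approximate local update under the modified block-diagonal prior.

    The summed prior information of the inclusive neighborhood is divided by
    (1 + d_i) to compensate for redundancy after consensus, while each
    neighbor's measurement information is added at full weight.
    """
    scale = 1.0 / len(Y_priors)               # 1 / (1 + d_i)
    Y = scale * sum(Y_priors)
    y = scale * sum(y_priors)
    for H, R, z in zip(Hs, Rs, zs):
        Rinv = np.linalg.inv(R)
        Y = Y + H.T @ Rinv @ H
        y = y + H.T @ Rinv @ z
    return y, Y
```

For converged priors the scaled sum collapses to a single copy of the centralized prior information, which is exactly the behavior the $(1+d_i)$ correction is designed to recover.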
Remark 4. It should be noted that the assumed cases are not always satisfied in realistic applications, but the approximation is still of great significance for distributed filtering algorithms. Its effectiveness and feasibility are evaluated by the numerical examples in Section 6.
5. Performance Analysis
5.1. Consistency of Estimates
One of the most fundamental and important properties of a recursive filtering algorithm is that the estimated error statistics should be consistent with the true estimation errors. The approximated error covariance of an inconsistent filtering algorithm is too small, i.e., optimistic: it does not truly indicate the uncertainty of the estimate and may result in divergence, since subsequent measurements are then prone to be neglected [28].
Definition 2 [28,30,37,38]. Consider a random vector $\mathit{x}$. Let $\widehat{\mathit{x}}$ and $\mathit{P}$ be, respectively, the estimate of $\mathit{x}$ and the estimate of the corresponding error covariance. Then the pair $\left(\widehat{\mathit{x}},\mathit{P}\right)$ is said to be consistent if $\mathit{P}\ge \mathbb{E}\left\{\left(\widehat{\mathit{x}}-\mathit{x}\right){\left(\widehat{\mathit{x}}-\mathit{x}\right)}^{\mathrm{T}}\right\}$.
It is shown in (38) that consistency requires the true error covariance to be upper bounded (in the positive definite sense) by the approximated error covariance $\mathit{P}$. In the distributed estimation paradigm, due to the unaware reuse of redundant data in the consensus iterations and the possible correlation between measurements from different nodes, the filter may suffer from inconsistency and divergence. In such a case, the preservation of consistency is even more important.
For convenience, consider the information pair $(\widehat{\mathit{y}},\mathit{Y})$, where $\widehat{\mathit{y}}={\mathit{P}}^{-1}\widehat{\mathit{x}}$ and $\mathit{Y}={\mathit{P}}^{-1}$. The consistency defined by (38) can be rewritten as
Assumption 3. The initialized estimate of each node, represented by $({\widehat{\mathit{x}}}_{i,0|0}^{},{\mathit{P}}_{i,0|0}^{}),i\in \mathcal{S}$, is consistent. Equivalently, inequality ${\mathit{P}}_{i,0|0}^{}\ge \mathbb{E}\left\{\left({\widehat{\mathit{x}}}_{i,0|0}^{}-{\mathit{x}}_{0}\right){\left({\widehat{\mathit{x}}}_{i,0|0}^{}-{\mathit{x}}_{0}\right)}^{\mathrm{T}}\right\}$ holds for $i\in \mathcal{S}$.
Remark 5. In general, Assumption 3 can be easily satisfied. The initial information on the state vector can be acquired in an off-line fashion before the fusion process. In the worst case where no prior information is available, each node can simply set the initialized information matrix as ${\mathit{P}}_{i,0|0}^{-1}=\mathbf{0}$, which implies infinite estimate uncertainty in each node at the beginning so that Assumption 3 is fulfilled.
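In practice, consistency in the sense of Definition 2 can be checked empirically by testing whether the difference between the reported covariance and the sample error covariance is positive semidefinite over Monte-Carlo error samples. A small diagnostic sketch of our own (not part of the algorithm itself):

```python
import numpy as np

def is_consistent(P, errors):
    """Check P >= empirical error covariance in the PSD sense.

    errors: array of shape (num_samples, n) of estimation-error samples.
    Returns True if the smallest eigenvalue of P minus the sample
    second-moment matrix is non-negative up to a small tolerance.
    """
    E = np.asarray(errors)
    C = (E.T @ E) / E.shape[0]   # second moment about zero (errors ~ zero-mean)
    return np.linalg.eigvalsh(P - C).min() >= -1e-9
```

A filter whose reported $\mathit{P}$ fails this check over many runs is optimistic in exactly the sense warned about above.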
Assumption 4. The system matrix ${\mathit{F}}_{k}$ is invertible.
Lemma 1 [28]. Under Assumption 4, if two positive semidefinite matrices ${\mathit{Y}}_{1}$ and ${\mathit{Y}}_{2}$ satisfy ${\mathit{Y}}_{1}\le {\mathit{Y}}_{2}$, then $0\le {\Psi}_{k}\left({\mathit{Y}}_{1}\right)\le {\Psi}_{k}\left({\mathit{Y}}_{2}\right)$. In other words, the function ${\Psi}_{k}\left(\text{}\cdot \text{}\right)$ is monotonically nondecreasing for any $k\ge 0$.
Theorem 1. Let Assumptions 1, 2, and 3 hold. Then, for each time instant $k$ and each node $i\in \mathcal{S}$, the information pair $({\widehat{\mathit{y}}}_{i,k|k},{\mathit{Y}}_{i,k|k})$ of the DHIWCF is consistent in that ${\left(\mathbb{E}\left\{\left({\widehat{\mathit{x}}}_{i,k|k}-{\mathit{x}}_{k}\right){\left({\widehat{\mathit{x}}}_{i,k|k}-{\mathit{x}}_{k}\right)}^{\mathrm{T}}\right\}\right)}^{-1}\ge {\mathit{Y}}_{i,k|k}$, with ${\widehat{\mathit{x}}}_{i,k|k}^{}={\mathit{Y}}_{i,k|k}^{-1}{\widehat{\mathit{y}}}_{i,\text{}k|k}^{}$.
Proof. An inductive method is utilized to prove this theorem. It is supposed that, at time instant $k-1$, the consistency condition holds for any $i\in \mathcal{S}$. For brevity, the predicted information matrix in (31) can be rewritten as
On the basis of Lemma 1, it is immediate to see
According to (26) and (27), the local estimation error is
According to the consistency property of covariance intersection [29,38], it holds that
Then, exploiting (47) and (43) in (45), the following result is obtained.
Since the information pair $({\widehat{\mathit{x}}}_{i,k|k}^{l+1},{\mathit{Y}}_{i,k|k}^{l+1})$ is computed from the previous information pair $({\widehat{\mathit{x}}}_{i,k|k}^{l},{\mathit{Y}}_{i,k|k}^{l})$ by (3), and the covariance intersection involved in (3) preserves the consistency of estimates [29,37,38,39], it can be concluded that ${\left(\mathbb{E}\left\{\left({\widehat{\mathit{x}}}_{i,k|k}^{l}-{\mathit{x}}_{k}\right){\left({\widehat{\mathit{x}}}_{i,k|k}^{l}-{\mathit{x}}_{k}\right)}^{\mathrm{T}}\right\}\right)}^{-1}\ge {\mathit{Y}}_{i,k|k}^{l}$ implies ${\left(\mathbb{E}\left\{\left({\widehat{\mathit{x}}}_{i,k|k}^{l+1}-{\mathit{x}}_{k}\right){\left({\widehat{\mathit{x}}}_{i,k|k}^{l+1}-{\mathit{x}}_{k}\right)}^{\mathrm{T}}\right\}\right)}^{-1}\ge {\mathit{Y}}_{i,k|k}^{l+1}$ for any $l=1,\dots ,L$. In other words, if the estimate obtained with $l$ consensus iterations is consistent, the estimate obtained with $l+1$ consensus iterations is also consistent. Therefore, it is straightforward to conclude that (40) holds with ${\widehat{\mathit{x}}}_{i,k|k}={\widehat{\mathit{x}}}_{i,k|k}^{L}$ and ${\mathit{Y}}_{i,k|k}={\mathit{Y}}_{i,k|k}^{L}$. The proof is concluded since the initial estimate ${\widehat{\mathit{x}}}_{i,0|0},\forall i\in \mathcal{S}$, is consistent. □
5.2. Boundedness of Error Covariances
According to the consistency of the proposed DHIWCF established in Theorem 1, to prove the boundedness of the error covariance $\mathbb{E}\left\{\left({\widehat{\mathit{x}}}_{i,k|k}^{}-{\mathit{x}}_{k}\right){\left({\widehat{\mathit{x}}}_{i,k|k}^{}-{\mathit{x}}_{k}\right)}^{\mathrm{T}}\right\}$ it is sufficient to prove that ${\mathit{Y}}_{i,k|k}^{}$ is lower bounded by a certain positive definite matrix (or, equivalently, that ${\mathit{P}}_{i,k|k}^{}={\mathit{Y}}_{i,k|k}^{-1}$ is upper bounded by some constant matrix). To derive the bounds for the information matrix ${\mathit{Y}}_{i,k|k}^{}$, the following assumptions are required.
Assumption 5. The system is collectively observable. That is, the pair $\left({\mathit{F}}_{k},{\mathit{H}}_{k}\right)$ is observable where ${\mathit{H}}_{k}=\mathrm{col}\left({\mathit{H}}_{i,\text{}k},\text{}i\in \mathcal{S}\right)$.
Let $\mathit{\Pi}$ be the consensus matrix, whose elements are the consensus weights ${\pi}_{i,j}^{}$ for any $i,j\in \mathcal{S}$. Further, let ${\pi}_{i,j}^{L}$ be the $\left(i,j\right)$-th element of ${\mathit{\Pi}}^{L}$, which is the $L$-th power of $\mathit{\Pi}$.
Assumption 6. The consensus matrix $\mathit{\Pi}$ is row stochastic and primitive.
Assumption 7. There exist real numbers $\underset{\_}{f},\overline{f},\underset{\_}{h},\overline{h}\ne 0$ and positive real numbers $\overline{p}>\underset{\_}{p}>0$, $\overline{q}>\underset{\_}{q}>0$, such that the following bounds are fulfilled for each $k\ge 0,\text{}i\in \mathcal{S}$.
Lemma 2 [28]. Under Assumptions 4 and 5, and the proposed DHIWCF algorithm, if there exists a positive semidefinite matrix $\stackrel{\u2323}{\mathit{Y}}$ such that ${\mathit{Y}}_{i,k|k}^{}\le \stackrel{\u2323}{\mathit{Y}},\text{}\forall k\ge 0,\text{}i\in \mathcal{S}$, then there always exists a strictly positive constant $0<\alpha <1$ such that
By virtue of Lemma 2, Theorem 2, which depicts the boundedness of the error covariances, is presented below.
Theorem 2. Let Assumptions 4–7 hold. Then there exist positive definite matrices $\underset{\_}{\mathit{\Omega}}$ and $\overline{\mathit{\Omega}}$ such that $\underset{\_}{\mathit{\Omega}}\le {\mathit{Y}}_{i,\text{}k|k}^{}\le \overline{\mathit{\Omega}}$, where ${\mathit{Y}}_{i,\text{}k|k}^{}$ is the information matrix given by the proposed DHIWCF.
Proof. For simplicity, the proof is presented for the case $L=1$; the generalization to $L>1$ can be derived in a similar way. According to the proposed DHIWCF, the information matrix for node $i$ at time instant $k$ can be written as
In view of Assumptions 6 and 7 and the fact that ${\mathit{Y}}_{r,k|k-1}^{}\le {\mathit{Q}}_{k-1}^{-1}$ by (31), one can get
Hence, the upper bound is achieved. Next, a lower bound will be guaranteed under Assumption 5.
According to Lemma 2, Assumption 7, (31), and (53), it follows from (52) that
where $\alpha $ is a positive scalar with $0<\alpha <1$. By recursively exploiting (52) and (54) a certain number of times (denoted by $\overline{k}$), there is
where ${\pi}_{i,j}^{\tau}$ is the $\left(i,j\right)$-th element of ${\mathit{\Pi}}^{\tau}$ and $\mathit{\Xi}$ is a matrix with elements
Note that the matrix $\mathit{\Xi}$ is constructed based on the network topology and is naturally stochastic. According to [40,41], as long as the undirected network is connected, $\mathit{\Xi}$, like $\mathit{\Pi}$, is primitive. Therefore, there exist strictly positive integers $m$ and $n$ such that all the elements of ${\mathit{\Pi}}^{s}$ and ${\mathit{\Xi}}^{t}$ are positive for $s\ge m,\text{}t\ge n$. Let us define
It should be noted that, under Assumption 5, ${\mathit{\Omega}}_{1}$ is positive definite for $\overline{k}\ge \mathrm{max}\left(m,n+1\right)$. Therefore, for $k\ge \overline{k}$, ${\mathit{Y}}_{i,k|k}^{}\ge {\mathit{\Omega}}_{1}>0$. Since $\overline{k}$ is finite, for $0\le k\le \overline{k}-1$, there exists a constant positive definite matrix ${\mathit{\Omega}}_{2}$ such that ${\mathit{Y}}_{i,k|k}^{}\ge {\mathit{\Omega}}_{2}>0$. Hence, there exists a positive definite matrix $\underset{\_}{\mathit{\Omega}}$ such that $0<\underset{\_}{\mathit{\Omega}}\le {\mathit{Y}}_{i,\text{}k|k}^{}$. The proof is now complete. □
Remark 6. The result in Theorem 2 depends only on collective observability. This is distinct from algorithms that require some form of local observability or detectability [5,6,8,11,25], which places strong demands on the sensing ability of individual sensors and restricts the scope of application.
5.3. Convergence of Estimation Errors
In line with the boundedness of ${\mathit{Y}}_{i,k|k}$ proven in Theorem 2, the convergence of the local estimation errors obtained by the proposed DHIWCF is analyzed in this section. To facilitate the analysis, the following preliminary lemmas are required.
Lemma 3 [26,28,31]. Given an integer $N\ge 2$, $N$ positive definite matrices ${\mathit{M}}_{1},\dots ,{\mathit{M}}_{N}$ and $N$ vectors ${\mathit{v}}_{1},\dots ,{\mathit{v}}_{N}$, the following inequality holds: ${\left({\sum}_{i=1}^{N}{\mathit{v}}_{i}\right)}^{\mathrm{T}}{\left({\sum}_{i=1}^{N}{\mathit{M}}_{i}\right)}^{-1}\left({\sum}_{i=1}^{N}{\mathit{v}}_{i}\right)\le {\sum}_{i=1}^{N}{\mathit{v}}_{i}^{\mathrm{T}}{\mathit{M}}_{i}^{-1}{\mathit{v}}_{i}$.
Lemma 4 [26,28]. Under Assumptions 4 and 5 and the proposed DHIWCF algorithm, if there exists a positive semidefinite matrix $\tilde{\mathit{Y}}$ such that ${\mathit{Y}}_{i,k|k}\ge \tilde{\mathit{Y}},\text{ }\forall k\ge 0,\text{ }i\in \mathcal{S}$, then there always exists a strictly positive scalar $0<\beta <1$ such that
For the sake of simplicity, let us denote the prediction and estimation errors at node $i$ by ${\tilde{\mathit{x}}}_{i,k|k-1}={\widehat{\mathit{x}}}_{i,k|k-1}-{\mathit{x}}_{k}$ and ${\tilde{\mathit{x}}}_{i,k|k}={\widehat{\mathit{x}}}_{i,k|k}-{\mathit{x}}_{k}$, respectively. The corresponding collective forms are ${\tilde{\mathit{x}}}_{k|k-1}=\mathrm{col}\left({\tilde{\mathit{x}}}_{1,k|k-1},\dots ,{\tilde{\mathit{x}}}_{{N}_{s},k|k-1}\right)$ and ${\tilde{\mathit{x}}}_{k|k}=\mathrm{col}\left({\tilde{\mathit{x}}}_{1,k|k},\dots ,{\tilde{\mathit{x}}}_{{N}_{s},k|k}\right)$.
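The inequality of Lemma 3, which appears in covariance-intersection-style analyses in the cited works, can be spot-checked numerically. The sketch below is illustrative only, with hypothetical values and $N=2$, $2\times 2$ matrices: it compares $(\mathit{v}_1+\mathit{v}_2)^{\mathrm{T}}(\mathit{M}_1+\mathit{M}_2)^{-1}(\mathit{v}_1+\mathit{v}_2)$ against $\mathit{v}_1^{\mathrm{T}}\mathit{M}_1^{-1}\mathit{v}_1+\mathit{v}_2^{\mathrm{T}}\mathit{M}_2^{-1}\mathit{v}_2$.

```python
# Illustrative spot-check (hypothetical data, N = 2) of the Lemma 3-type
# inequality: (v1+v2)^T (M1+M2)^{-1} (v1+v2) <= v1^T M1^{-1} v1 + v2^T M2^{-1} v2,
# for positive definite M1, M2.

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad(v, M):
    """Quadratic form v^T M v for a 2-vector v."""
    Mv = [M[0][0] * v[0] + M[0][1] * v[1],
          M[1][0] * v[0] + M[1][1] * v[1]]
    return v[0] * Mv[0] + v[1] * Mv[1]

M1 = [[2.0, 0.5], [0.5, 1.0]]      # positive definite
M2 = [[1.0, -0.3], [-0.3, 3.0]]    # positive definite
v1, v2 = [1.0, -2.0], [0.5, 4.0]

vs = [v1[0] + v2[0], v1[1] + v2[1]]
Msum = [[M1[i][j] + M2[i][j] for j in range(2)] for i in range(2)]

lhs = quad(vs, inv2(Msum))                     # fused quadratic form
rhs = quad(v1, inv2(M1)) + quad(v2, inv2(M2))  # sum of local quadratic forms
```

On this data `lhs` is strictly smaller than `rhs`, as the inequality predicts; the gap is what the proof of Theorem 3 exploits when bounding the right-hand side of (72).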
Theorem 3. Under Assumptions 4–6, the proposed DHIWCF algorithm yields an asymptotically unbiased estimate at each node of the network, in the sense that $\underset{k\to +\infty}{\mathrm{lim}}\mathbb{E}\left\{{\widehat{\mathit{x}}}_{i,k|k}-{\mathit{x}}_{k}\right\}=\mathbf{0}$ for any $i\in \mathcal{S}$.
Proof. Under Assumptions 4–6, Theorem 2 holds. Therefore,
${\mathit{Y}}_{i,k|k}^{}$ is uniformly lower and upper bounded. Let us define the following candidate Lyapunov function
By virtue of Lemma 4, it can be concluded that there exists a positive real number $0<\tilde{\beta}<1$ such that
Since $\mathbb{E}\left\{{\tilde{\mathit{x}}}_{i,k+1|k}^{}\right\}=\mathbb{E}\left\{{\mathit{F}}_{k}{\widehat{\mathit{x}}}_{i,k|k}^{}-\left({\mathit{F}}_{k}{\mathit{x}}_{k}+{\mathit{w}}_{k}\right)\right\}={\mathit{F}}_{k}\mathbb{E}\left\{{\tilde{\mathit{x}}}_{i,k|k}^{}\right\}$, one can obtain
Here, pre-multiplying (65) by ${\mathit{Y}}_{i,k|k}^{-1}$ and post-multiplying it by ${\mathit{x}}_{k}$ yields
In a similar way, pre-multiplying (66) by ${\mathit{Y}}_{i,k|k}^{-1}$ yields
According to (36), one has
Since $\mathbb{E}\left\{{\mathit{v}}_{r,k}\right\}=\mathbf{0}$, one can get
Substituting (71) into (64) yields
Applying the fact that ${\mathit{Y}}_{i,k|k}^{}\ge {\displaystyle {\sum}_{j\in \mathcal{S}}{\displaystyle {\sum}_{r\in {\mathcal{J}}_{j}}{\pi}_{i,j}^{L}{\left(1+{d}_{j}\right)}^{-1}{\mathit{Y}}_{r,k|k-1}^{}}}$ and Lemma 3 to the right-hand side of (72), one can obtain
Writing (73) for $i=1,2,\dots ,{N}_{s}$ in collective form, it turns out that
where
and $\mathit{\Xi}$ is a matrix with elements satisfying
Since the consensus matrix $\mathit{\Pi}$ and the constructed matrix $\mathit{\Xi}$ are both row stochastic, their spectral radii are both 1. As a consequence, for $0<\tilde{\beta}<1$, the elements of the vector ${V}_{k+1}\left(\mathbb{E}\left\{{\tilde{\mathit{x}}}_{k+1|k}^{}\right\}\right)$ vanish as $k$ tends to infinity, so that $\underset{k\to +\infty}{\mathrm{lim}}\mathbb{E}\left\{{\tilde{\mathit{x}}}_{k+1|k}^{}\right\}=\mathbf{0}$. In view of the relation $\mathbb{E}\left\{{\tilde{\mathit{x}}}_{i,k+1|k}^{}\right\}={\mathit{F}}_{k}\mathbb{E}\left\{{\tilde{\mathit{x}}}_{i,k|k}^{}\right\}$ and Assumption 4, it is straightforward to conclude that $\underset{k\to +\infty}{\mathrm{lim}}\mathbb{E}\left\{{\widehat{\mathit{x}}}_{i,k|k}-{\mathit{x}}_{k}\right\}=\mathbf{0}$ for any $i\in \mathcal{S}$. □
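The row-stochasticity argument at the end of the proof rests on a standard fact that is easy to verify numerically. The sketch below uses a hypothetical 3×3 row-stochastic matrix (not the paper's $\mathit{\Pi}$ or $\mathit{\Xi}$): the all-ones vector is an eigenvector for eigenvalue 1, and the infinity norm (maximum absolute row sum) of every power stays 1, which caps the spectral radius at 1.

```python
# Illustrative check (hypothetical 3x3 row-stochastic matrix) of the fact that
# a row-stochastic matrix has spectral radius 1: the all-ones vector is an
# eigenvector with eigenvalue 1, and ||P^k||_inf = 1 for all k >= 1, so no
# eigenvalue can exceed 1 in modulus.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.5, 0.25, 0.25],
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]]          # each row sums to 1

# Eigenvalue 1: P applied to the all-ones vector returns the all-ones vector.
ones_out = [sum(row) for row in P]

# ||P^20||_inf (max absolute row sum) remains exactly 1, bounding rho(P) <= 1.
Pk = P
for _ in range(19):
    Pk = matmul(Pk, P)
inf_norm = max(sum(abs(x) for x in row) for row in Pk)
```

Because powers of such matrices stay bounded while $\tilde{\beta}^{k}\to 0$, the product driving ${V}_{k+1}$ in (74) contracts to zero, which is exactly the mechanism used above.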
Remark 7. The Lyapunov function defined in (61) plays a crucial role in the convergence proof of the proposed algorithm and can be readily extended to the stability analysis of Kalman-like consensus filters in other scenarios. The non-singularity requirement on ${\mathit{F}}_{k}$ in Theorem 3 arises because the Lyapunov-based proof relies on Lemma 4, whose establishment requires the invertibility of ${\mathit{F}}_{k}$.