Article

Joint Detection and Communication over Type-Sensitive Networks

Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089, USA
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(9), 1313; https://doi.org/10.3390/e25091313
Submission received: 30 May 2023 / Revised: 7 August 2023 / Accepted: 24 August 2023 / Published: 8 September 2023
(This article belongs to the Collection Feature Papers in Information Theory)

Abstract

Due to the difficulty of decentralized inference with conditionally dependent observations, and motivated by large-scale heterogeneous networks, we formulate a framework for decentralized detection with coupled observations. Each agent has a state, and the empirical distribution of all agents’ states, or the type of the network, dictates the individual agents’ behavior. In particular, agents’ observations depend on both the underlying hypothesis and the empirical distribution of the agents’ states. Hence, our framework captures a high degree of coupling, in that an individual agent’s behavior depends on both the underlying hypothesis and the behavior of all other agents in the network. Using this framework, the method of types, and a series of equicontinuity arguments, we derive the error exponent for the case in which all agents are identical and show that this error exponent depends on only a single empirical distribution. The analysis is extended to the multi-class case, and numerical results with state-dependent agent signaling and state-dependent channels highlight the utility of the proposed framework for the analysis of highly coupled environments.

1. Introduction

Decentralized detection is an important element in a wide range of modern applications, such as the Internet of Things [1], smart grids [2], cognitive radio [3], and millimeter-wave communications [4]. However, many classical results in decentralized detection assume that agents’ observations are independent, conditioned on the underlying hypothesis. This assumption fails to hold in many of these recent applications, such as human decision-making [5], sensor networks with correlated observations [6], and quorum sensing in microbial communities [7]. Unfortunately, the problem of decentralized detection with correlated observations is NP-Hard [8], and many of the classical results are not applicable in this case (for examples, see [9,10,11]). Recent work in decentralized detection has placed greater attention on the case of correlated observations [12,13,14,15]. Although recent advancements have been promising, the inherent difficulty of the problem has resulted in approximations and relaxations [13,15]. In this work, we build upon the state-dependent formulation introduced in [16] by allowing agents’ observations to depend on both the underlying hypothesis as well as the empirical distribution, or type, of their states. The notion of type has a rich history in information theory and statistics, being first introduced by Csiszar [17]. Today, the method of types has been further developed [18] and is used in a variety of fields, such as control [19], machine learning [20], statistics [21], and even DNA storage channels [22].
Conditionally correlated observations can be handled under specific signal models [15] and assumptions [12,16,23,24,25]. In particular, ref. [15] studied bandwidth-constrained detection under the Neyman–Pearson criterion and solved a relaxation of the problem. Several works [23,24,25] have studied the problem under communication constraints, with [23,24] showing that the network learns the hypothesis exponentially quickly under constrained [23] and randomized [24] communication. Moreover, ref. [25] developed a deep learning algorithm for real-time industry constraints. Other works have attempted to decouple agents’ observations via algorithms [13] and specific models [12,16,26]. In [12], a hidden variable was introduced that allows the observations of the agents to be independent, conditioned on the hidden variable, and it was proved that threshold-based decisions are optimal under certain model assumptions. Unfortunately, even if a problem of interest falls under this framework, the assumptions are rather strong and fail in a number of applications. In our prior work [16,26], we introduced a state variable for each agent and allowed the agents’ observations to be independent conditioned on both the hypothesis and the agent’s state. We proved similar results to those of [12,27] under much weaker conditions. However, the model proposed in [16,26] grants each agent its own individual state, whereas in [12] agents may share a common hidden variable.
We extend the framework of [16,26]; herein, agents’ observations depend on a common variable, namely, the type of the agents’ states. In [16], it was assumed that agents know their individual states and that the fusion center knows the states of all agents. We strongly relax this assumption: agents do not know their states, and the fusion center knows only the empirical distribution of the agents’ states. Another key difference is that in [16,26] the state variable is sufficient to decouple agents’ observations, whereas in this work all agent states are necessary to decouple the observations, allowing this formulation to handle stronger forms of coupling. The need for the empirical distribution calls for analysis techniques, via the method of types, different from those used in [12,16,26]. We further introduce a communication link between the agents and the centralized decision-maker (called the fusion center) which is not present in [12,16,26].
Many works in decentralized detection include a communication link between the agents and the fusion center; the idea itself is not new [28,29,30]. However, in prior works the statistical properties of the communication channel were assumed to be independent of the network’s behavior. A contribution of our current work is that we allow the quality of the communication channel to vary with the network’s behavior. This is again accomplished by allowing the channel to vary with the type of the agents’ states. The concept of a channel with state-dependent noise has been previously considered in information theory [31], and is in use today [32,33,34]. However, most of the aforementioned works involving the notion of state have focused primarily on communication over channels with state, and have not examined joint detection and communication. While recent works on estimation exist, they were in the context of estimating the channel state to improve communication performance [34,35,36]. Notably, signal-dependent noise [37] can be accommodated in our proposed model. In particular, these models are relevant to visible light communication [38], magnetic recording [39,40], and imaging applications [41,42].
As an example, consider the occurrence of such forms of coupling in microbial systems. Microbial communities synthesize signaling molecules [7]; when sensed in the environment, these can result in individual gene expressions that lead to new collective behaviors through a process called quorum sensing. Specifically, cell i only engages in quorum sensing when the number of autoinducer molecules it receives from the environment, A_i, exceeds a certain threshold τ_A. A common model assumes that A_i follows a Poisson distribution conditioned on the total number of synthases (synthases are enzymes within a cell that are responsible for the production of autoinducer molecules) in the community and the number of receptors in cell i, denoted R_i [43]:
$$P(A_i = k \mid R_i, S_1, \ldots, S_n) = \frac{\left( \lambda R_i \sum_{j=1}^{n} S_j \right)^{k} \exp\left( -\lambda R_i \sum_{j=1}^{n} S_j \right)}{k!},$$
where S_i is the number of synthases present in cell i and λ > 0 is a normalizing constant. Hence, we can think of the number of synthases and receptors in cell i as the state of cell i. The observation of cell i then depends on the states of all other cells through the total number of synthases across the cells. This example illustrates the need for the current approach, as the models proposed in [12,16] cannot handle this form of coupling and do not lead to tractable asymptotic results.
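A minimal simulation of the quorum-sensing model above may help make the coupling concrete. All numerical parameters below (cell count, rate constant, threshold, and the state distributions for synthases and receptors) are illustrative assumptions, not values from [43]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration only.
n_cells = 50          # number of cells in the community
lam = 0.02            # normalizing rate constant (lambda > 0)
tau_A = 5             # quorum-sensing threshold on received autoinducers

# Per-cell states: synthase and receptor counts (assumed distributions).
S = rng.integers(1, 5, size=n_cells)   # synthases S_i
R = rng.integers(1, 10, size=n_cells)  # receptors R_i

# A_i | R_i, S_1, ..., S_n  ~  Poisson(lambda * R_i * sum_j S_j)
total_synthases = S.sum()
A = rng.poisson(lam * R * total_synthases)

# Cell i engages in quorum sensing when A_i exceeds tau_A.
quorate = A > tau_A
print(f"{quorate.sum()} of {n_cells} cells are quorate")
```

Note that each cell’s observation depends on every other cell’s state only through the sum of synthases, i.e., through an aggregate of the network state, which is exactly the kind of coupling the type-based framework is built to handle.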
In this work, we derive the error exponent as the network size grows. Assuming that all priors are known, the optimal asymptotic decay rate of the probability of error is provided by the Chernoff information [27,44,45], regardless of whether conditional independence holds. Using the Chernoff information, ref. [27] proved that identical rules are asymptotically optimal for identical agents, while [45] showed that identical binary quantizers are asymptotically optimal in power-constrained networks. The works in [27,45] both relied on conditional independence. A contribution of the present study is to remove the need for conditional independence through the development of a measure that is asymptotically equivalent to the Chernoff information and tractable in our scenario. The primary argument comes from the method of types, which, combined with a series of equicontinuity arguments, shows that asymptotic performance is dominated by a single distribution. Surprisingly, this dominating distribution is generally not the true distribution of the agents’ states.
Using the network type to decouple agents’ observations can be extended beyond pure decentralized detection. For instance, consensus algorithms used in blockchain applications often need to deal with faulty or nonconforming nodes [46]. Hence, it is possible to consider whether the node is conforming or not as the state and the total percentage of conforming nodes as the network type. Then, the problems of jointly estimating the network type (the consensus problem) and detecting the underlying hypothesis (the detection problem) can be considered. Much of the structure herein applies to such problems, as observations received by agents depend on the other agents’ states. Moreover, the hypothesis and network type are correlated; when more agents are faulty or nonconforming, an attack is more likely to be present.
Our contributions in this paper are as follows:
  • We formulate a framework for distributed inference in which the agents’ observations are correlated through both the hypothesis and the empirical distribution (or type) of the network state. This formulation captures a high level of coupling between agents.
  • We consider a distributed inference problem with a communication link between the agents and the fusion center, with the additional caveat that the noise over the link depends on the type of the agents’ states. Hence, our framework captures joint sensing with correlated observations as well as joint communication with correlated noise.
  • We derive expressions for the error exponent for a single class of agents, then extend our results to the case of heterogeneous groups of identical agents. In particular, assuming that identical agents use a common rule, the optimal error exponent depends only on the ratios of the groups, not on the actual size of the groups themselves. This allows a wide range of problems to be studied in which there are multiple classes of agents that interfere with each other.
  • We present a numerical example for a three-class case to highlight the utility of the proposed expression for the error exponent. In particular, we show how this expression can be used to optimize the ratios of heterogeneous groups in the presence of cross-class interference. This example further illustrates the fact that the true distribution may not dominate the asymptotics. The effect of the channel is observed as well.

Notation

Random variables are denoted by capital letters X, and specific realizations are denoted by lowercase letters x. Random vectors are denoted by boldface capital letters X, and specific realizations by lowercase boldface letters x. Given a random vector X (realization x), X_{-k} (x_{-k}) denotes the vector X (x) with the kth element removed. Calligraphic letters X denote sets. The symbol P denotes probabilities of events, and E_X denotes expectation with respect to the random variable X.

2. Materials and Methods

The details concerning how plots are generated are provided in Section 5, along with a discussion of a specific example.

3. Problem Formulation, Definitions, and Assumptions

3.1. Problem Setup

Consider a set of n agents. The global environmental variable H is binary, H ∈ {0, 1}. Agent k (k = 1, 2, …, n) receives a signal Y_k ∈ Y, with Y being the signal space. The probability density of Y_k conditioned on H = h is denoted p_h^k(y). In addition, each agent takes a state S_k ∈ {1, 2, …, m}, where m is a finite integer. The prior for the state of agent k is p_k(s), and we define the vector p_k = [p_k(1), …, p_k(m)]^T. The collection of states S = [S_1, …, S_n]^T is called the network state, with joint density p(s). For a given network state, we denote the empirical distribution (or the type) of S as Q_S^n, that is,
$$Q_S^n(i) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{S_k = i\}, \quad i \in \{1, \ldots, m\},$$
where 1{S_k = i} is the indicator that agent k is in state i. Let Q_n denote the set of all empirical distributions corresponding to sequences of length n; then, for a given q^n ∈ Q_n, T(q^n) is the type class of q^n, i.e.,
$$T(q^n) = \left\{ \mathbf{s} \in \{1, 2, \ldots, m\}^n : Q_{\mathbf{s}}^n = q^n \right\},$$
where {1, 2, …, m}^n is the n-fold Cartesian product of {1, 2, …, m} with itself. Note that Q_S^n is a random vector with realization q^n, that is,
$$p(q^n) = P(Q_S^n = q^n) = P(T(q^n)) = \sum_{\mathbf{s} \in T(q^n)} p(\mathbf{s}).$$
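For small networks the definitions above can be evaluated directly. The following sketch (with an assumed i.i.d. state prior p, so that p(s) factorizes) computes the type of a state vector and the probability of a type class by brute-force enumeration:

```python
import numpy as np
from itertools import product

m, n = 3, 4  # small state alphabet and network size for illustration

def network_type(s, m):
    """Empirical distribution Q_s^n of a state vector s over {1, ..., m}."""
    s = np.asarray(s)
    return np.array([(s == i).mean() for i in range(1, m + 1)])

# Assumed i.i.d. state prior p over {1, 2, 3}.
p = np.array([0.5, 0.3, 0.2])

def type_probability(qn, p, n, m):
    """p(q^n) = sum of p(s) over the type class T(q^n)."""
    total = 0.0
    for s in product(range(1, m + 1), repeat=n):  # enumerate {1,...,m}^n
        if np.allclose(network_type(s, m), qn):
            total += np.prod(p[np.array(s) - 1])
    return total

qn = network_type([1, 1, 2, 3], m)        # q^n = [1/2, 1/4, 1/4]
print(type_probability(qn, p, n, m))      # multinomial mass of the type class
```

Here the type class contains 4!/(2!·1!·1!) = 12 sequences, each of probability 0.5²·0.3·0.2, so the printed value is 12 × 0.015 = 0.18, matching the multinomial formula.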
The joint probability distribution of Y_k and the network type under hypothesis H = h is given by p_h^k(y, q^n). The associated conditional density is denoted p_h^k(y | q^n). Let P_m denote the probability simplex in R^m:
$$\mathcal{P}_m = \left\{ q \in \mathbb{R}^m : q_i \ge 0,\ \sum_i q_i = 1 \right\}.$$
For q ∈ P_m, the conditional density p_h^k(y | q) is called the signal model for agent k. When we write densities conditioned on q ∈ P_m, we assume that these densities have a functional dependence on q in order to avoid issues with measurability, as certain types may never be observed regardless of the size of the network. For a simple example, consider [1/e, 1 − 1/e]^T ∈ P_2, which is never in Q_n for any n because 1/e is irrational. We define Y = [Y_1, …, Y_n]^T, while the joint density of Y and Q^n and the density of Y conditioned on Q^n = q^n under H = h are denoted p_h(y, q^n) and p_h(y | q^n), respectively. The density p_h(y | q) is called the joint signal model. For brevity, we call the conditional distribution p(H = h | q^n) the hypothesis model. It is important to note that we do not assume conditional independence of the agents’ observations, i.e., we can have p_h(y) ≠ Π_k p_h^k(y_k) for h ∈ {0, 1}; we do, however, assume that the structure described below holds.
Assumption 1.
The joint signal model obeys the following: for all y, q ∈ P_m, and h ∈ {0, 1},
$$p_h(\mathbf{y} \mid q) = \prod_{k=1}^{n} p_h^k(y_k \mid q).$$
Equation (5) states that the signal Y_k of agent k is independent of Y_{-k} conditioned on both H and Q^n.
Assumption 2.
For all y, q^n ∈ Q_n, and h ∈ {0, 1}, p_h(y, q^n) > 0; that is, the joint densities have the same support under both hypotheses.
Upon receiving observation Y_k, agent k makes a decision U_k ∈ {1, 2, …, b} according to a rule, which is a (possibly randomized) function from Y to the decision space U. We denote the possibly randomized rule used by agent k as γ_k,
$$U_k = \gamma_k(Y_k) \sim p_k(u \mid y).$$
The collection of rules γ = [ γ 1 , , γ n ] T is called a strategy. After agent k has made its decision, it sends U k to the fusion center through a noisy communication link which is allowed to depend on the type q n . Upon sending U k , the fusion center receives the message X k with
$$X_k \sim p_k(x \mid u, q^n).$$
Given q ∈ P_m, the conditional density p_k(x | u, q) is the channel model for agent k. We define X = [X_1, …, X_n]^T, and the joint conditional density p(x | u, q) is called the joint channel model.
Assumption 3.
The joint channel model obeys the following: for all x, u, and q ∈ P_m,
$$p(\mathbf{x} \mid \mathbf{u}, q) = \prod_{k=1}^{n} p_k(x_k \mid u_k, q).$$
Assumption 4.
For all x, u, and q ∈ P_m, p(x | u, q) > 0.
The fusion center does not know the network state S; however, we assume that it knows Q_S^n. This assumption is not strong, as the empirical distribution Q_S^n can be estimated via consensus methods [47]. Upon receiving the messages X and Q_S^n, the fusion center makes an inference as to which hypothesis is true, denoted by Ĥ. We seek to optimize the asymptotic decay rate of the probability of error (as defined in Equation (10)). We assume that the fusion center uses the maximum a posteriori (MAP) rule, i.e.,
$$\hat{H} = \begin{cases} 1, & (\mathbf{x}, q^n) \in A_\gamma \\ 0, & (\mathbf{x}, q^n) \in A_\gamma^c \end{cases}, \quad \text{where } A_\gamma = \left\{ (\mathbf{x}, q^n) : \frac{p_1(\mathbf{x} \mid q^n)}{p_0(\mathbf{x} \mid q^n)} \ge \frac{p(H=0 \mid q^n)}{p(H=1 \mid q^n)} \right\},$$
which minimizes the probability of error for a given strategy γ . The set A γ depends on the specific strategy γ selected; given γ , it is possible to compute the optimal inference rule as a deterministic function of γ using Equation (9). The complete problem setup is summarized in Figure 1.
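The MAP fusion rule of Equation (9) can be sketched for a toy binary message alphabet. The per-agent likelihoods below are hypothetical, and the factorization across agents uses the conditional-independence structure of Assumptions 1 and 3:

```python
import numpy as np

# Hypothetical per-agent message likelihoods p_h(x_k | q^n) for a binary
# message alphabet x in {0, 1}, at some fixed observed type q^n.
p0 = np.array([0.8, 0.2])   # p_0(x | q^n)
p1 = np.array([0.3, 0.7])   # p_1(x | q^n)

def map_fusion(x, prior_h1_given_qn=0.5):
    """Decide H = 1 iff the log-likelihood ratio of the received messages
    meets the prior-odds threshold log(p(H=0|q^n)/p(H=1|q^n))."""
    x = np.asarray(x)
    llr = np.sum(np.log(p1[x]) - np.log(p0[x]))   # factorized joint LLR
    threshold = np.log((1 - prior_h1_given_qn) / prior_h1_given_qn)
    return 1 if llr >= threshold else 0

print(map_fusion([1, 1, 0, 1]))
print(map_fusion([0, 0, 0, 0]))
```

In the full model both the likelihoods and the prior odds would be re-evaluated at the observed q^n, which is how the network type biases the fusion center’s decision.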

3.2. Definitions

We now introduce several definitions and concepts that are used throughout the paper.
Definition 1.
Let P_γ(Ĥ ≠ H) be the probability of error under strategy γ. We define the error exponent Λ (provided the limit exists) as:
$$\Lambda = \lim_{n \to \infty} \inf_{\gamma} \frac{1}{n} \log P_{\gamma}(\hat{H} \neq H).$$
The limit Λ depends on the strategy γ; thus, the strategy γ* that achieves the infimum may be such that the limit does not exist. Moreover, (10) makes no assumption as to how the statistical properties of the agents vary with n; in general, it is not possible to say anything about the existence of Λ. However, in many practical settings, such as homogeneous networks and power-constrained networks, Λ exists and has a nice closed-form solution [16,27,45]. The main result of this work is an equivalent characterization of the error exponent defined above, showing that in our scenario the limit does exist. This equivalent expression has several desirable properties, and we can optimize it directly.
Definition 2.
The Kullback–Leibler divergence between two distributions q and p is given as follows:
$$D(q \,\|\, p) = \sum_{x} q(x) \log \frac{q(x)}{p(x)}.$$
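Definition 2 is straightforward to implement; the only subtlety is the standard convention 0·log 0 = 0, handled below by masking zero-probability entries:

```python
import numpy as np

def kl_divergence(q, p):
    """D(q || p) = sum_x q(x) log(q(x)/p(x)), with the convention 0 log 0 = 0.
    Assumes supp(q) is contained in supp(p), as guaranteed here by the
    positivity assumptions (Assumptions 2 and 4)."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # → 0.0
```

This helper (natural-log convention) is reused conceptually in the later expressions, where D(q || p) acts as the exponential cost of the network type q deviating from the true state prior p.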
Here, we are interested in understanding the interactions between different classes of agents, where members of a given class are identical, defined as follows.
Definition 3.
Given a collection of n agents, these agents are identical if the following conditions hold:
1.
p_h^k(y | q) = p_h^j(y | q) for all k, j ∈ {1, 2, …, n}, h ∈ {0, 1}, y ∈ Y, and q ∈ P_m.
2.
p_k(x | u, q) = p_j(x | u, q) for all k, j ∈ {1, 2, …, n}, x ∈ X, u ∈ {1, 2, …, b}, and q ∈ P_m.
3.
The agent states S_k are i.i.d. a priori, i.e., p(s) = Π_k p_k(s_k) = Π_k p(s_k).
Condition (1) states that, conditioned on the hypothesis H and the network type Q_S^n, the probability distributions of the received signals are the same for all agents. Similarly, Condition (2) states that, conditioned on the network type Q_S^n and U_k = u for all k ∈ {1, 2, …, n}, the probability distributions of the received messages are the same for all agents.
Definition 4.
A class is a collection of agents that are all identical.

3.3. Key Assumptions

We first derive the error exponent for the single-class case in Theorem 1, which is then generalized to the case of multiple classes.
Assumption 5.
Our key assumptions for Theorem 1 are as follows:
(a) 
All agents are identical, as provided in Definition 3. Hence, we remove the notational dependence on k in the sequel.
(b) 
The hypothesis model obeys the following:
$$\lim_{n \to \infty} \frac{1}{n} \log \min_{q^n} \min\left\{ p(H = 1 \mid q^n),\; p(H = 0 \mid q^n) \right\} = 0.$$
(c) 
The signal model is continuous in q for all agents; that is, if {α_i} is a sequence in P_m such that lim_{i→∞} α_i = q, then for all y,
$$\lim_{i \to \infty} p_h(y \mid \alpha_i) = p_h(y \mid q), \quad h \in \{0, 1\}.$$
(d) 
The channel model is continuous in q for all agents; that is, if {α_i} is a sequence in P_m such that lim_{i→∞} α_i = q, then for all x and u,
$$\lim_{i \to \infty} p(x \mid u, \alpha_i) = p(x \mid u, q).$$
Remark 1.
Recall that the fusion center knows the empirical distribution q^n and that the optimal rule is given by (9). Hence, if the condition in Assumption 5.b does not hold, then the threshold p(H = 0 | q^n) / p(H = 1 | q^n) may either grow or decay exponentially quickly, biasing the fusion center to the point that the decisions u become irrelevant. In other words, if the empirical distribution of the states carries too much information about the hypothesis, then the probability of error can be driven to zero exponentially quickly by simply looking at the network type, regardless of the rules used by the agents; this leads to the need for Assumption 5.b. Assumptions 5.c and 5.d state that if two distributions in P_m are close with respect to the standard Euclidean metric, then the resulting signal and channel models are close for all y and x, respectively.

4. Main Results and Important Corollaries

We first consider the single-class result (Theorem 1). We discuss its implications and outline the needed proof techniques, then turn our attention to the multi-class case, which begins by extending Theorem 1 to Lemma 1 and then stating Theorem 2 and its implications. For the main theorems, we provide proof outlines in this section and the complete proofs in Section 6. The extension of Theorem 1 to Lemma 1 is provided in Appendix A.2.

4.1. Single-Class Results

Theorem 1.
Subject to Assumptions 5.a–5.d,
$$\Lambda = \lim_{n \to \infty} \inf_{\gamma} \min_{\lambda \in [0,1]} \max_{q \in \mathcal{P}_m} \left[ -D(q \,\|\, p) + \frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda} \right],$$
where D(q || p) is the Kullback–Leibler (KL) divergence between the distribution q ∈ P_m and the true state distribution p.
Theorem 1 provides an alternative asymptotically equivalent expression for the error exponent. In particular, Theorem 1 states that a single distribution dominates the asymptotic performance. Interestingly, the dominating distribution is in general not the true distribution of the agents’ states, despite the fact that the empirical distribution of the states converges towards the true distribution. We then extend Theorem 1 to multiple classes; if agents within a single class use a common rule, the error exponent for each class depends only on the ratios of the numbers of agents between classes.
We underscore why the Chernoff information is challenging to compute for our problem framework:
$$\Lambda = \lim_{n \to \infty} \inf_{\gamma} \min_{\lambda \in [0,1]} \frac{1}{n} \log \sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x}, q^n)^{1-\lambda}\, p_1(\mathbf{x}, q^n)^{\lambda}.$$
As n grows, so do the space of potential strategies γ, the set of possible messages x, and the set of possible types Q_n. Even for identical agents using the same rule, the complexity and coupling due to the summation over q^n remain. If the agents use the same rule, then
$$\begin{aligned}
\frac{1}{n} \log \sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x}, q^n)^{1-\lambda}\, p_1(\mathbf{x}, q^n)^{\lambda} &= \frac{1}{n} \log \sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x} \mid q^n)^{1-\lambda}\, p_1(\mathbf{x} \mid q^n)^{\lambda}\, p(q^n \mid H=0)^{1-\lambda}\, p(q^n \mid H=1)^{\lambda} \\
&= \frac{1}{n} \log \sum_{q^n} p(q^n \mid H=0)^{1-\lambda}\, p(q^n \mid H=1)^{\lambda} \sum_{\mathbf{x}} \prod_{k} p_0^k(x_k \mid q^n)^{1-\lambda}\, p_1^k(x_k \mid q^n)^{\lambda} \\
&= \frac{1}{n} \log \sum_{q^n} p(q^n \mid H=0)^{1-\lambda}\, p(q^n \mid H=1)^{\lambda} \prod_{k} \sum_{x} p_0^k(x \mid q^n)^{1-\lambda}\, p_1^k(x \mid q^n)^{\lambda} \\
&\overset{(a)}{=} \frac{1}{n} \log \sum_{q^n} p(q^n \mid H=0)^{1-\lambda}\, p(q^n \mid H=1)^{\lambda} \left( \sum_{x} p_0^1(x \mid q^n)^{1-\lambda}\, p_1^1(x \mid q^n)^{\lambda} \right)^{\! n},
\end{aligned}$$
where (a) holds because the agents are identical and use the same rule, so that all terms in the product are identical. Note that due to the summation over q^n, as previously stated, the complexity of calculating the Chernoff information grows with n, leading to the need for Theorem 1.
There are a few key remarks that must be made here about Theorem 1:
  • The maximization occurs over P_m instead of Q_n; hence, we have directly removed the dependence on q^n. Because the expression in Theorem 1 is continuous over the compact set P_m, it always achieves its maximum (rather than merely a supremum). This is due to Assumptions 5.c and 5.d.
  • Note that the second term is the classical Chernoff information corresponding to the fixed distributions p_h(x | q), h ∈ {0, 1}, and that the KL divergence term can be thought of as a bias. Hence, we only need to consider the m-dimensional probability vector that yields the worst Chernoff information biased by the KL divergence. In a certain sense, q is sufficiently close to the true state distribution p that its poor performance (under strategy γ) cannot be ignored even in asymptotically large networks. Only one distribution in P_m dominates the asymptotic performance, as expected, although it may not be the true distribution p. An instantiation of this is provided in the numerical results.
  • The maximization for q takes place over all of P m ; however, it is only necessary to search a subset of P m to find the maximum, thereby reducing the computational cost. To determine the subset of interest, observe that
    $$\begin{aligned}
    \min_{q \in \mathcal{P}_m} \left[ D(q \,\|\, p) - \frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda} \right] &\le D(p \,\|\, p) - \frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid p)^{1-\lambda}\, p_1(\mathbf{x} \mid p)^{\lambda} \\
    &= -\frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid p)^{1-\lambda}\, p_1(\mathbf{x} \mid p)^{\lambda} \\
    &\overset{(a)}{\le} -\frac{1}{n} \log \sum_{\mathbf{u}} p_0(\mathbf{u} \mid p)^{1-\lambda}\, p_1(\mathbf{u} \mid p)^{\lambda} \\
    &\overset{(b)}{\le} -\frac{1}{n} \log \sum_{\mathbf{y}} p_0(\mathbf{y} \mid p)^{1-\lambda}\, p_1(\mathbf{y} \mid p)^{\lambda},
    \end{aligned}$$
    where both ( a ) and ( b ) are due to Hölder’s inequality. Using the fact that the Chernoff information is non-negative [44], it can be seen that the distribution q * that achieves the maximum over P m must satisfy
    $$D(q^* \,\|\, p) \le -\frac{1}{n} \log \sum_{\mathbf{y}} p_0(\mathbf{y} \mid p)^{1-\lambda}\, p_1(\mathbf{y} \mid p)^{\lambda} \equiv c(\lambda, p).$$
    The right-hand side of (24) is the Chernoff information for the signal model under distribution p; hence, the maximizing q* must lie in a ball, defined by the Kullback–Leibler divergence, centered at the distribution p with radius c(λ, p), thereby reducing the search space for the optimization. In fact, the Chernoff information admits a closed-form solution for a wide range of distributions, such as members of the exponential family [48].
  • The expression in Theorem 1 admits the following property: for all q ∈ P_m and λ ∈ [0, 1],
    $$\begin{aligned}
    \frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda} &\overset{(a)}{=} \frac{1}{n} \log \sum_{\mathbf{x}} \prod_{k=1}^{n} p_0^k(x_k \mid q)^{1-\lambda} \prod_{k=1}^{n} p_1^k(x_k \mid q)^{\lambda} \\
    &= \frac{1}{n} \log \prod_{k=1}^{n} \sum_{x_k} p_0^k(x_k \mid q)^{1-\lambda}\, p_1^k(x_k \mid q)^{\lambda} = \frac{1}{n} \sum_{k=1}^{n} \log \sum_{x_k} p_0^k(x_k \mid q)^{1-\lambda}\, p_1^k(x_k \mid q)^{\lambda},
    \end{aligned}$$
    where ( a ) holds due to both Equations (5) and (8). Then, for agents using a common rule, all terms in the sum are equal; thus,
    $$\frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda} = \log \sum_{x} p_0^1(x \mid q)^{1-\lambda}\, p_1^1(x \mid q)^{\lambda},$$
    which does not depend on n, helping to simplify the analysis.
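A small numerical sketch of the Theorem 1 objective may make these remarks concrete. The single-agent message distributions below, their functional dependence on the type q, and the true state prior are all illustrative assumptions (m = 2 states, binary messages), not the paper's numerical example; the grid search instantiates the min–max structure and the fact that the dominating type need not equal the true prior:

```python
import numpy as np

def kl(q, p):
    m = q > 0
    return float(np.sum(q[m] * np.log(q[m] / p[m])))

# Hypothetical single-agent message distributions p_h(x | q): the channel
# is assumed to get noisier as q moves mass onto state 2 (illustration only).
def p_h_given_q(h, q):
    eps = 0.1 + 0.3 * q[1]                    # assumed type-dependent noise
    clean = np.array([0.9, 0.1]) if h == 0 else np.array([0.1, 0.9])
    return (1 - eps) * clean + eps * 0.5

p_true = np.array([0.7, 0.3])                 # assumed true state prior

def theorem1_objective(lam, q):
    """-D(q||p) + log sum_x p0(x|q)^(1-lam) p1(x|q)^lam  (per-agent form)."""
    p0, p1 = p_h_given_q(0, q), p_h_given_q(1, q)
    return -kl(q, p_true) + np.log(np.sum(p0 ** (1 - lam) * p1 ** lam))

# Inner max over q on a grid of P_2, then outer min over lambda.
qs = [np.array([t, 1 - t]) for t in np.linspace(0.01, 0.99, 99)]
lams = np.linspace(0, 1, 51)
Lambda = min(max(theorem1_objective(l, q) for q in qs) for l in lams)
print(Lambda)        # negative: exponent of the error probability

best_q = max(qs, key=lambda q: theorem1_objective(0.5, q))
print(best_q)        # dominating type; here it differs from p_true
```

In this toy instance the maximizer trades a small KL penalty for a noisier (harder to distinguish) channel, so the dominating distribution shifts away from the true prior, illustrating the second remark above.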
We next sketch the proof of Theorem 1. We start from the classical Chernoff information and use it to show that
$$\Lambda = \lim_{n \to \infty} \inf_{\gamma} \min_{\lambda \in [0,1]} \frac{1}{n} \log \sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x} \mid q^n)^{1-\lambda}\, p_1(\mathbf{x} \mid q^n)^{\lambda}\, p(q^n).$$
To prove the result, we wish to show that
$$\left| \frac{1}{n} \log \sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x} \mid q^n)^{1-\lambda}\, p_1(\mathbf{x} \mid q^n)^{\lambda}\, p(q^n) - \frac{1}{n} \log \max_{q \in \mathcal{P}_m} \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda}\, 2^{-n D(q \| p)} \right| < \epsilon,$$
uniformly in λ and γ; that is, we wish to show that for any ε > 0 there exists an integer n_ε, independent of λ and γ, such that (29) holds for all n ≥ n_ε. Uniform convergence in λ and γ enables the limit to be taken inside the minimum and infimum, respectively, yielding
$$\left| \inf_{\gamma} \min_{\lambda \in [0,1]} \frac{1}{n} \log \sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x} \mid q^n)^{1-\lambda}\, p_1(\mathbf{x} \mid q^n)^{\lambda}\, p(q^n) - \inf_{\gamma} \min_{\lambda \in [0,1]} \max_{q \in \mathcal{P}_m} \left[ -D(q \,\|\, p) + \frac{1}{n} \log \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda} \right] \right| \longrightarrow 0,$$
as n , which is the desired assertion. Equivalently, it can be shown that
$$(1 - \epsilon) < \left( \frac{\sum_{\mathbf{x}} \sum_{q^n} p_0(\mathbf{x} \mid q^n)^{1-\lambda}\, p_1(\mathbf{x} \mid q^n)^{\lambda}\, p(q^n)}{\max_{q \in \mathcal{P}_m} \sum_{\mathbf{x}} p_0(\mathbf{x} \mid q)^{1-\lambda}\, p_1(\mathbf{x} \mid q)^{\lambda}\, 2^{-n D(q \| p)}} \right)^{1/n} < (1 + \epsilon),$$
uniformly in λ and γ .
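The core method-of-types step in this sketch, replacing the sum over types with a single dominating type, can be checked numerically in a toy binary-state setting. Everything below is an illustrative stand-in (an assumed Bernoulli state prior and an arbitrary continuous surrogate a(t) for the inner sum over messages), chosen only to exhibit the convergence:

```python
import numpy as np
from math import lgamma, log

p_state = 0.3                      # assumed P(S_k = 2) for i.i.d. binary states
a = lambda t: 0.5 + 0.4 * t        # arbitrary continuous stand-in, values in (0, 1]

def kl_bern(t, p):
    """Binary KL divergence D([t, 1-t] || [p, 1-p]) with 0 log 0 = 0."""
    out = 0.0
    if t > 0: out += t * log(t / p)
    if t < 1: out += (1 - t) * log((1 - t) / (1 - p))
    return out

def exact(n):
    """(1/n) log sum_{q^n} p(q^n) a(q^n)^n over the n+1 binary types,
    computed in log-domain to avoid underflow."""
    logs = [lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p_state) + (n - k) * log(1 - p_state)
            + n * log(a(k / n))
            for k in range(n + 1)]
    return np.logaddexp.reduce(logs) / n

ts = np.linspace(0.0, 1.0, 2001)
limit = max(-kl_bern(t, p_state) + log(a(t)) for t in ts)  # dominating-type value

gaps = {n: abs(exact(n) - limit) for n in (10, 100, 1000)}
print(gaps)   # the gap shrinks as n grows
```

The finite-n sum and the single-type expression −D(q||p) + log a(q) agree to within a vanishing correction, which is the behavior the uniform-convergence argument above formalizes (uniformly over λ and γ in the actual proof).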

4.2. Multi-Class Results

We now discuss extending the results of the previous section to the case of multiple classes. Consider a set of n_c < ∞ classes. For a given class c ∈ {1, 2, …, n_c}, let c_k be the number of agents that belong to class c. Then, let Y_{c,k} ∈ Y and S_{c,k} ∈ {1, 2, …, m} be the signal and state, respectively, of the kth agent in class c. Without loss of generality, assume that the signal space Y and state space {1, 2, …, m} are the same for all classes. Furthermore, for a given network state S, let Q_{c,S}^n denote the type of the states of the agents belonging to class c, i.e.,
$$Q_{c,S}^n(i) = \frac{1}{c_k} \sum_{k=1}^{c_k} \mathbf{1}\{ S_{c,k} = i \}, \quad i \in \{1, \ldots, m\}.$$
For given realizations of the class types q_1^n, …, q_{n_c}^n, the signal model and state prior for class c are denoted p_c(y | q_1^n, …, q_{n_c}^n) and p_c(s), respectively, with p_c = [p_c(1), …, p_c(m)]^T. Recall that per Definition 4, all agents in a given class are identical; thus, the signal models and state priors are the same within a class. Let U_{c,k} ∈ {1, 2, …, b} be the decision made by the kth agent in class c, distributed according to p_{c,k}(u | y), with γ_k^c being the rule of the kth agent in class c (again assuming, without loss of generality, that the decision space {1, 2, …, b} is the same for all classes). The message of the kth agent in class c received by the fusion center is denoted X_{c,k} and is distributed according to p_c(x | u, q_1^n, …, q_{n_c}^n). Again, because agents in the same class are identical, the channel model is the same throughout a class. Moreover, let X_c = [X_{c,1}, …, X_{c,c_k}]^T be the vector of received messages from all agents in class c. We can then extend Assumption 5 to the case of n_c classes.
Assumption 6.
The following assumptions hold for all classes. Hence, for notational simplicity, when referring to class c we remove the k superscript.
(a) 
The hypothesis model obeys the following.
$$\lim_{n \to \infty} \frac{1}{n} \log \min_{q_1^n, \ldots, q_{n_c}^n} \min\left\{ p(H = 1 \mid q_1^n, \ldots, q_{n_c}^n),\; p(H = 0 \mid q_1^n, \ldots, q_{n_c}^n) \right\} = 0.$$
(b) 
The signal model is continuous in q_1, …, q_{n_c} for all classes; that is, if {α_{1,i}}, …, {α_{n_c,i}} are sequences in P_m such that lim_{i→∞} α_{j,i} = q_j for j = 1, 2, …, n_c, then for all y,
$$\lim_{i \to \infty} p_h^c(y \mid \alpha_{1,i}, \ldots, \alpha_{n_c,i}) = p_h^c(y \mid q_1, \ldots, q_{n_c}), \quad h \in \{0, 1\}.$$
(c) 
The channel model is continuous in q_1, …, q_{n_c} for all classes; that is, if {α_{1,i}}, …, {α_{n_c,i}} are sequences in P_m such that lim_{i→∞} α_{j,i} = q_j for j = 1, 2, …, n_c, then for all x and u,
$$\lim_{i \to \infty} p^c(x \mid u, \alpha_{1,i}, \ldots, \alpha_{n_c,i}) = p^c(x \mid u, q_1, \ldots, q_{n_c}).$$
The conditions of Assumption 6 closely resemble those of Assumption 5. Namely, Assumption 6.a retains the requirement that the network type of each class should not carry too much information about the hypothesis, while Assumptions 6.b and 6.c extend the continuity of the signal and channel models from the univariate case to the multi-dimensional case. Previously, the models were continuous in a single distribution q, whereas we now assume that they are continuous in q_1, …, q_{n_c}.
Lemma 1.
Assume that, for each class c, lim_{n→∞} c_k / n > 0; then, under Assumptions 6.a–6.c,
$$\Lambda = \lim_{n \to \infty} \inf_{\gamma} \min_{\lambda \in [0,1]} \max_{q_1, \ldots, q_{n_c}} \frac{1}{n} \sum_{c=1}^{n_c} \left[ -c_k D(q_c \,\|\, p_c) + \log \sum_{\mathbf{x}_c} p_0(\mathbf{x}_c \mid q_1, \ldots, q_{n_c})^{1-\lambda}\, p_1(\mathbf{x}_c \mid q_1, \ldots, q_{n_c})^{\lambda} \right].$$
Lemma 1 assumes that all agents within a given class c use the same rule γ^c. When referring to the rule used by all agents in class c, we use superscripts to avoid confusion with the previously defined notation, in which a subscript indicates the rule used by a specific agent. The error exponent then takes on a form that allows heterogeneous networks with a high degree of interference to be examined. The details of the extension of Theorem 1 are provided in Appendix A.2; Lemma 1 leads to the following theorem.
Theorem 2.
Let r c [ 0 , 1 ] be the fraction of agents that belong to class c { 1 , 2 , … , n c } , i.e., c k = ⌊ r c n ⌋ , with c = 1 n c r c = 1 , where ⌊ x ⌋ denotes the largest integer that is less than or equal to x. Moreover, suppose that all r c are held constant as n and that agents in the same class use a common rule. Then, under Assumptions 6.a–6.c,
Λ = inf γ 1 , ; γ n c min λ [ 0 , 1 ] max q 1 , ; q n c c = 1 n c r c D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
Because identical agents with a common rule may not be optimal, Theorem 2 provides a lower bound on the optimal error exponent. We highlight several important points of Theorem 2 below:
  • Observe that all agents are coupled through the distributions q 1 , ; q n c , and recall that for a given class c, q c depends on all agents in class c through their states S c , k . Hence, the distributions q 1 , ; q n c collectively depend on all agents in the network, meaning that the received signal, decision, and message for a given agent are dependent on all agents in the network. As a result, Theorem 2 captures a very strong form of coupling.
  • Note that the expression in Theorem 2 is not expressed as a limit, does not depend on n, and does not depend on the actual size of the classes. Hence, Theorem 2 provides an objective function that can be used to design rules γ 1 , ; γ n c  that do not depend on the size of the network.
  • Theorem 2 depends only on the ratios of the classes; that is, Theorem 2 provides an explicit objective function to find the optimal ratios for asymptotically large networks. Specifically, to find the optimal ratios we can solve
    min r 1 , ; r n c inf γ 1 , ; γ n c min λ [ 0 , 1 ] max q 1 , ; q n c c = 1 n c r c D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
    In the next section, we present a numerical example that highlights the utility of the proposed framework.

5. Numerical Example

We design an example that highlights the different forms of coupling captured by our framework. Note that the total number of agents is never specified, as only the fraction of agents in each class (the ratio) matters; however, in light of our asymptotic analysis, the network size must be sufficiently large. Consider a three-class system in which all agents take one of two states (1 or 2) with p 1 ( S = 1 ) = 0.5 and p 2 ( S = 1 ) = p 3 ( S = 1 ) = 0.9 . Under each hypothesis, all classes observe a Gaussian random variable with signal models
p h 1 ( y | q 1 , q 2 , q 3 ) = 1 2 π exp 1 2 ( y μ ( h , q 2 ) ) 2 , μ ( 0 , q 2 ) = 0 , μ ( 1 , q 2 ) = α r 1 q 2 ( 1 )
and
p h c ( y | q 1 , q 2 , q 3 ) = 1 2 π exp 1 2 y 2 , c { 2 , 3 } ,
where μ ( h , q 2 ) is the mean of the signal model when H = h { 0 , 1 } , q 2 is the empirical distribution of Class 2, α is a constant that determines the separation between the means under the two hypotheses, and r i is the ratio for Class i, i.e., the fraction of agents that belong to Class i.
Important notes about the signal models are as follows:
  • When H = 1 , the signal model for Class 1 depends only on the number of agents in Class 2 that are in State 1.
  • The signal models for Classes 2 and 3 are constant with respect to the underlying hypothesis as well as the distributions q 1 , q 2 , and q 3 ; hence, agents in Class 2 or 3 cannot distinguish between the two hypotheses.
Upon receiving the signal, each agent in Class 1 makes a binary decision according to a threshold test, i.e., u 1 , k = 1 { y 1 , k ≥ τ } . Observe that because agents in the other two classes cannot distinguish between hypotheses, their decisions do not matter. Note that the agents belonging to Class 1 use identical thresholds; while these may not be optimal, they simplify both design and analysis. Each agent then sends its decision over a binary symmetric channel with the following crossover probability:
p c ( x = 1 | u = 0 , q 1 , q 2 , q 3 ) = p c ( x = 0 | u = 1 , q 1 , q 2 , q 3 ) = max { | 1 2 r 3 q 3 ( 1 ) | , ρ } , c { 1 , 2 , 3 } , 0 < ρ < 1 2 .
The parameter ρ governs the minimum achievable crossover probability of the channel. Note that because | 1 2 r 3 q 3 ( 1 ) | 1 2 , the crossover probability can never be lower than ρ ; thus, as ρ increases the channel becomes worse. It can be seen that while Class 2 aids Class 1 in distinguishing between the two hypotheses, Class 3 controls the quality of the channel between the agents and the fusion center. Moreover, if r 2 = 0 then agents cannot distinguish the two hypotheses; thus, the error exponent is zero. Similarly, if r 3 = 0 , the crossover probability for all channels becomes 1 2 ; thus, the channel output becomes random and the error exponent becomes zero. This example underscores the impact of cross-class interference on proper optimization of the system. To determine the optimal class ratios, we can solve
( r 1 * , r 2 * , r 3 * ) = arg min r 1 , r 2 , r 3 max q 1 , q 2 , q 3 c = 1 3 r c D ( q c | | p c ) + log x = 1 2 p 0 c ( x | q 1 , q 2 ) p 1 c ( x | q 1 , q 2 ) ,
with r 1 * + r 2 * + r 3 * = 1 . For computational simplicity, we set τ = 15 and λ = 1 2 . These values can be further optimized.
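The example above can be collected into a short numeric sketch. The following Python snippet is illustrative only: it evaluates the bracketed objective at λ = 1/2 for user-supplied ratios and Bernoulli-parameterized types (here each q c is summarized by the probability that a state equals 1), rather than performing the inner maximization over types; the function name `objective` and all numeric values other than τ = 15, λ = 1/2, and the state probabilities from the example are assumptions for illustration.

```python
import math

def q_tail(z):
    # Gaussian tail probability P(Y >= z) for Y ~ N(0, 1).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def objective(r, q, alpha=100.0, rho=0.1, tau=15.0):
    """Evaluate the example's objective at lambda = 1/2 (illustrative sketch).

    r = (r1, r2, r3) are the class ratios; q = (q1, q2, q3) give the
    probability that a state equals 1 under each class's type. Only
    Class 1 contributes a non-trivial Chernoff term.
    """
    r1, r2, r3 = r
    q1, q2, q3 = q
    # Class-1 signal means under H = 0 and H = 1.
    mu = [0.0, alpha * r1 * q2]
    # Threshold test u = 1{y >= tau}: P(u = 1 | h).
    pu1 = [q_tail(tau - m) for m in mu]
    # State-dependent BSC crossover probability, floored at rho.
    eps = max(abs(0.5 - r3 * q3), rho)
    # Channel output law P(x = 1 | h).
    px1 = [(1.0 - eps) * p + eps * (1.0 - p) for p in pu1]
    # Chernoff term at lambda = 1/2: log2 of sum_x sqrt(p0(x) p1(x)).
    chernoff = math.log2(math.sqrt(px1[0] * px1[1])
                         + math.sqrt((1.0 - px1[0]) * (1.0 - px1[1])))
    # KL terms D(q_c || p_c) for Bernoulli types vs. the true state laws.
    p_true = [0.5, 0.9, 0.9]
    def kl(a, b):
        t = lambda u, v: u * math.log2(u / v) if u > 0.0 else 0.0
        return t(a, b) + t(1.0 - a, 1.0 - b)
    return sum(rc * kl(qc, pc) for rc, qc, pc in zip(r, q, p_true)) \
        + r1 * chernoff

# Evaluate at the true state distributions: the KL terms vanish and only
# the (non-positive) Chernoff term of Class 1 survives.
val = objective((0.4, 0.3, 0.3), (0.5, 0.9, 0.9))
```

Consistent with the discussion of Figure 2a, increasing ρ (a worse channel floor) moves the objective toward zero, i.e., a smaller error exponent.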
In Figure 2a, we compute the optimal error exponent as a function of the channel quality ρ for various values of α . Note that the class ratios are optimized for each data point. Recall that as ρ increases, so does the interference, causing the channel to worsen. The importance of the channel on the overall system performance can be clearly seen. As ρ increases, the minimum achievable crossover probability increases and the best-case quality of the channel decreases; hence, the optimal error exponent decreases along with the quality of the channel. In fact, when ρ = 0.4 , the optimal error exponent is 0.0136 , an entire order of magnitude less than when ρ = 0.1 . The impact of the signal mean for Class 1 is determined by α . Not surprisingly, as the mean increases, the error exponent increases as well; however, we begin to see diminishing returns as we move from α = 100 to α = 150 .
In Figure 2b, the optimal ratio between the three classes is determined as a function of channel quality when α = 150 . Figure 2b reveals the impact of cross-class interactions. Recall that each class serves a different purpose; Class 1 is the only class that can distinguish between hypotheses, Class 2 controls the sensing capabilities of Class 1, and Class 3 controls the channel quality for Class 1. Hence, the performance of the system relies on the interactions between the three classes. In particular, as ρ increases Class 3 becomes less important to the overall system, as the quality of the channel degrades. This can be seen in Figure 2b by the decreasing r 3 * and the fact that Class 1 becomes more important to the system, hence the increasing r 1 * .
Finally, we examine the optimizing distribution for computing the error exponents when α = 100 . As previously noted, the true class distributions of the states ( p c ) do not necessarily dominate asymptotic performance. This can be seen in Figure 2c, which shows that the optimal types are sometimes different from the true distributions. Recall that under S = 1 we have p 1 = 0.5 and p 2 = p 3 = 0.9 ; thus, in this three-class example, it is only when ρ = 0.06 that we see the optimizing distribution aligning with the true distribution. We underscore that the network type converges to the true state distribution. Recall that we assume the signal and channel models to be continuous; hence, as the network types converge to the true distributions, the performance of any distribution in a neighborhood around the true distributions is relatively close. It may then be beneficial to design the rule γ to optimize detection for a distribution close to the true distributions, as the performance difference is small. This trade-off is captured by our result, where the closeness to p c is captured by the KL divergence and the asymptotic detection performance is captured by the Chernoff information term. Hence, the dominating distribution is the one that offers the best trade-off.

6. Proofs

6.1. Proof of Theorem 1

Before we begin the proof, we must introduce a number of important definitions and lemmas. There are two sets of lemmas. The first set of lemmas is a series of well-known mathematical facts. Because these are not our contributions but are necessary for the proof of Theorem 1, we omit the proofs, though we provide appropriate citations as necessary. The second set of lemmas is a series of results that, while necessary, are not major contributions of this work; these proofs are provided in Appendix A.1.

6.1.1. Definitions

Definition 5.
A family of functions F defined on a common domain is equicontinuous at a point x o if for any ϵ > 0 there exists a δ > 0 (possibly a function of ϵ and x o ) such that whenever | x x o | < δ we have | f ( x ) f ( x o ) | < ϵ for all f F .
Observe that while the δ above may depend on ϵ and the specific point x o , it is not allowed to depend on the specific function f, i.e., the chosen δ must work for all functions in F . The next definition removes the dependence on x o .
Definition 6.
A family of functions F is uniformly equicontinuous if for any ϵ > 0 there exists a δ > 0 (possibly a function of ϵ) such that whenever | x y | < δ we have | f ( x ) f ( y ) | < ϵ for all f F .
The above definition states that the same δ must work for all functions f F at all points in the domain.
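As a concrete illustration of Definition 6, consider the family { x ↦ sin ( a x ) / a : a ≥ 1 }: every member is 1-Lipschitz, so the single choice δ = ϵ works for every function in the family and every pair of points. The snippet below is a finite-grid spot check of this (not a proof); the grid and tolerances are illustrative assumptions.

```python
import math

# Every member of {x -> sin(a x) / a : a >= 1} is 1-Lipschitz, since
# |d/dx sin(a x) / a| = |cos(a x)| <= 1. Hence delta = eps works for the
# entire family at every point: uniform equicontinuity.
def f(a, x):
    return math.sin(a * x) / a

eps = 1e-3
delta = eps  # one delta for the whole family, independent of x and a
violations = 0
for a in [1.0, 2.0, 10.0, 100.0]:
    for k in range(1000):
        x = -5.0 + 0.01 * k
        y = x + 0.5 * delta  # any y with |x - y| < delta
        if abs(f(a, x) - f(a, y)) >= eps:
            violations += 1
```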
Definition 7.
Given a family of Lebesgue measurable functions F with x | f ( x ) | < for all f ( x ) F , the integrals x f ( x ) are uniformly absolutely continuous if, for every ϵ > 0 , there exists a δ > 0 such that for all Lebesgue measurable sets A with ν ( A ) < δ ,
A | f ( x ) | < ϵ ,
for all f F , where ν denotes the Lebesgue measure. Of course, these definitions can be extended to any general measure space; however, we focus on the Lebesgue measure here for simplicity and to avoid endlessly defining notation. For a thorough discussion of abstract measure spaces, see [49].
Again, it is important to distinguish that the same δ must work for all functions f F for a given ϵ .
Definition 8.
Assume that we have a family of measurable functions F with x | f ( x ) | < for all f F . Moreover, define I a = [ a , a ] . Then, the integrals x f ( x ) are said to be uniformly absolutely convergent if
lim a I a | f ( x ) | = x | f ( x ) | ,
uniformly in F .
This is a powerful property, stating that for a given ϵ > 0 there is an a large enough that all functions in F satisfy
| I a | f ( x ) | x | f ( x ) | | < ϵ .

6.1.2. Key Lemmas

The following lemmas are needed to prove Theorem 1. However, because most are simply known mathematical facts (except Lemma 3, the proof of which is provided in Appendix A.1), we omit the proofs.
Lemma 2.
Let F be an equicontinuous and pointwise-bounded family of functions defined on a common domain D . If D is compact, then F is uniformly equicontinuous on D .
Observe that P m is compact due to it being closed and bounded; because all of our functions (signal models, channel models, etc.) are defined on this space, Lemma 2 allows us to simplify the proof.
Lemma 3.
Let F and G be families of equicontinuous strictly positive functions defined on a common domain D ; furthermore, assume that for each point x D we have inf f F f ( x ) > 0 , inf g G g ( x ) > 0 , sup f F f ( x ) < , and sup g G g ( x ) < . Then, the family { f ( x ) λ g ( x ) 1 λ } f , g , λ for f F , g G , and λ [ 0 , 1 ] is equicontinuous on D .
The next lemma is taken from [49], Theorem 21.
Lemma 4.
Let { f i } be a sequence of real measurable functions with x | f i ( x ) | < . Assume that the integrals x f i ( x ) are uniformly absolutely continuous and uniformly absolutely convergent. Moreover, assume that f i f almost everywhere (a.e.); then, x f ( x ) < and
lim i x | f i ( x ) f ( x ) | = 0 .
Lemma 4 provides a nice immediate result. In particular, suppose we have a function of two variables f ( x , y ) with x | f ( x , y ) | < for all y and with x | f ( x , y ) | uniformly absolutely continuous and uniformly absolutely convergent with respect to y. In this case, Lemma 4 states that the integral x f ( x , y ) is continuous in y. To see this, observe that if { y i } is a sequence with y i y , then, per the triangle inequality,
lim i | x f ( x , y i ) x f ( x , y ) | lim i x | f ( x , y i ) f ( x , y ) | = 0 .

6.2. Intermediate Lemmas

We next present several intermediate results. The proofs of all these results can be found in Appendix A.1. Moreover, recalling that we assume all agents to be identical, we consequently omit the k superscript in the following lemmas as well as in the proof.
Lemma 5.
Subject to Assumptions 5.a–5.d, the following two statements hold:
(a) 
There exists a non-negative function g ( x ) such that x g ( x ) < and, for all x , h { 0 , 1 } , γ , and q P m ,
u y p ( x | u , q ) p ( u | y ) p h ( y | q ) = p h ( x | q ) g ( x ) .
(b) 
We have
inf γ min λ [ 0 , 1 ] min q P m x p 0 ( x | q ) 1 λ p 1 ( x | q ) λ > 0 .
Lemma 6.
For all ϵ > 0 , there exists a δ > 0 (which depends only on ϵ and h) such that whenever α and β are two distributions in P m with α β 2 < δ , then y | p h ( y | α ) p h ( y | β ) | < ϵ for all h { 0 , 1 } .
Lemma 7.
For a fixed x X and h { 0 , 1 } , the family { p h ( x | q ) 2 D ( q | | p ) } γ which is indexed by γ is uniformly equicontinuous on P m .
Lemma 8.
For a fixed x X , the family { p 0 ( x | q ) 1 λ p 1 ( x | q ) λ 2 D ( q | | p ) } γ , λ which is indexed by γ and λ [ 0 , 1 ] is uniformly equicontinuous on P m .
Lemma 9.
For any ϵ > 0 , there exists a δ > 0 (which depends only on ϵ) such that whenever α and β are two distributions in P m with α β 2 < δ , then
x | p 0 ( x | α ) 1 λ p 1 ( x | α ) λ 2 D ( α | | p ) p 0 ( x | β ) 1 λ p 1 ( x | β ) λ 2 D ( β | | p ) | < ϵ ,
for all γ and λ [ 0 , 1 ] .
An immediate consequence of Lemma 9 follows.
Lemma 10.
For any ϵ > 0 , there exists a δ > 0 (which depends only on ϵ) such that, whenever α and β are two distributions in P m with α β 2 < δ , we have
| x p 0 ( x | α ) 1 λ p 1 ( x | α ) λ 2 D ( α | | p ) x p 0 ( x | β ) 1 λ p 1 ( x | β ) λ 2 D ( β | | p ) 1 | < ϵ ,
for all γ and λ [ 0 , 1 ] .
The final lemma provides us with a starting point for the proof.
Lemma 11.
Λ = lim n inf γ min λ [ 0 , 1 ] 1 n log x q n p 0 ( x | q n ) 1 λ p 1 ( x | q n ) λ p ( q n ) .
Hence, rather than starting directly with the Chernoff information, we start from the expression in Lemma 11. We are now ready to begin the proof.
Proof of Theorem 1. 
Define
q * = arg max q P m D ( q | | p ) + 1 n log x p 0 ( x | q ) 1 λ p 1 ( x | q ) λ .
and note that q * depends on n, γ , and λ ; then, for any 0 < ϵ < 1 , per Lemma 10, there exists a δ > 0 , which depends only on  ϵ , such that whenever q q * 2 < δ ,
| x p 0 k ( x | q ) 1 λ p 1 k ( x | q ) λ 2 D ( q | | p ) x p 0 k ( x | q * ) 1 λ p 1 k ( x | q * ) λ 2 D ( q * | | p ) 1 | < 1 1 ϵ ,
for all γ and λ [ 0 , 1 ] . Because the agents are identical, they differ only by the rules they use; hence, the same δ works for all agents. For this δ , define
T δ n = { q n Q n : q n q * 2 < δ } ,
that is, T δ n is the set of all types that are less than δ away from q * based on the Euclidean distance. There are two important points to make here regarding T δ n :
  • Because both Q n and q * depend on n, T δ n does as well; however, because δ depends only on ϵ , any type in T δ n satisfies Equation (52) regardless of n or q * .
  • Observe that for any q P m there exists a type q n such that q q n 2 < 1 / n . Hence, there exists an n o such that for all n n o and any q P m there is a type q n with q q n 2 < δ . That is, T δ n is non-empty for all n n o . Because n o depends only on δ and δ depends only on ϵ , n o depends only on ϵ , and the same n o  works for all agents and all  λ [ 0 , 1 ] .
The following argument holds for any n n o . We begin by observing that
x p 0 ( x | q n ) 1 λ p 1 ( x | q n ) λ x p 0 ( x | q * ) 1 λ p 1 ( x | q * ) λ 2 n D ( q * | | p ) = x k p 0 k ( x k | q n ) 1 λ k p 1 k ( x k | q n ) λ x k p 0 k ( x k | q * ) 1 λ k p 1 k ( x k | q * ) λ 2 n D ( q * | | p )
  = x k p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x k p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p )
  = k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p ) .
Then, we have the following:
q n x p 0 ( x | q n ) 1 λ p 1 ( x | q n ) λ x p 0 ( x | q * ) 1 λ p 1 ( x | q * ) λ 2 n D ( q * | | p ) p ( q n ) = q n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p ) p ( q n )
      ( a ) q n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p ) 2 n D ( q n | | p )
  = q n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ 2 D ( q n | | p ) x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p )
( b ) q n 1 ( c ) ( n + 1 ) m ,
where ( a ) holds because p ( q n ) 2 n D ( q n | | p )  [17,50], ( b ) is due to the definition of q * , and ( c ) holds because for any n the number of types is upper-bounded by ( n + 1 ) m ([50], Theorem 11.1.1). Then, taking the n-th root yields the upper bound
q n x p 0 ( x | q n ) 1 λ p 1 ( x | q n ) λ p ( q n ) x p 0 ( x | q * ) 1 λ p 1 ( x | q * ) λ 2 n D ( q * | | p ) 1 n ( n + 1 ) m n .
Observe that ( n + 1 ) m / n 1 ; thus, there exists n 1 such that for all n n 1 , ( n + 1 ) m / n 1 + ϵ . Turning our attention to the lower bound,
q n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p ) p ( q n )
q n T δ n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p ) p ( q n )
( a ) 1 ( n + 1 ) m q n T δ n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p ) 2 n D ( q n | | p )
= 1 ( n + 1 ) m q n T δ n k x p 0 k ( x k | q n ) 1 λ p 1 k ( x k | q n ) λ 2 D ( q n | | p ) x p 0 k ( x k | q * ) 1 λ p 1 k ( x k | q * ) λ 2 D ( q * | | p )
( b ) 1 ( n + 1 ) m q n T δ n ( 1 ϵ ) n ( c ) 1 ( n + 1 ) m ( 1 1 + 1 ϵ ) n
where ( a ) holds because p ( q n ) 1 ( n + 1 ) | S | 2 n D ( q n | | p )  [17,50], ( b ) is due to the definition of T δ n , and ( c ) holds because T δ n is non-empty for n n o . Taking the n-th root provides
q n x p 0 ( x | q n ) 1 λ p 1 ( x | q n ) λ p ( q n ) x p 0 ( x | q * ) 1 λ p 1 ( x | q * ) λ 2 n D ( q * | | p ) 1 n ( n + 1 ) m n ( 1 1 + 1 ϵ ) .
Observe that ( n + 1 ) m / n 1 ; thus, there exists n 2 such that for all n n 2 , ( n + 1 ) m / n ( 1 1 + 1 ϵ ) ( 1 1 + 1 ϵ ) ( 1 1 + 1 ϵ ) = 1 ϵ . Then, we can take n ϵ = max { n o , n 1 , n 2 } , meaning that for all n n ϵ we have
1 ϵ q n x p 0 ( x | q n ) 1 λ p 1 ( x | q n ) λ p ( q n ) x p 0 ( x | q * ) 1 λ p 1 ( x | q * ) λ 2 n D ( q * | | p ) 1 n 1 + ϵ .
Because none of n o , n 1 , or n 2 depend on q * , γ , or λ , it is the case that n ϵ does not depend on q * , γ , or λ ; hence, we have uniform convergence, which completes the proof. □
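The two type-class bounds invoked in steps (a) of the upper and lower bounds, p ( q n ) 2 n D ( q n | | p ) and p ( q n ) ( n + 1 ) m 2 n D ( q n | | p ) , can be spot-checked numerically for a binary alphabet ( m = 2 ). The snippet below is a sanity check under the assumption of an i.i.d. Bernoulli source; it is not part of the proof.

```python
import math

def kl2(q, p):
    # Binary KL divergence D(q || p) in bits.
    t = lambda a, b: a * math.log2(a / b) if a > 0.0 else 0.0
    return t(q, p) + t(1.0 - q, 1.0 - p)

n, p, m = 40, 0.3, 2  # n samples, Bernoulli(p) source, alphabet size m
ok = True
for k in range(n + 1):
    q = k / n  # type with k ones among n samples
    # Probability of the whole type class under the i.i.d. source.
    prob = math.comb(n, k) * p**k * (1.0 - p)**(n - k)
    upper = 2.0 ** (-n * kl2(q, p))   # upper bound used in step (a)
    lower = upper / (n + 1) ** m      # lower bound used in step (a)
    ok = ok and (lower <= prob <= upper + 1e-12)
```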

6.3. Proof of Theorem 2

Because we assume that agents of the same class use the same rule, if we focus on class c we have
p h ( x c | q 1 , ; q n c ) = k = 1 c k p c , k ( x c , k | q 1 n , ; q n c n ) ,
which is a consequence of Equations (5) and (8). Then, we have
c k D ( q c | | p c ) + log x p 0 ( x c | q 1 , ; q n c ) 1 λ p 1 ( x c | q 1 , ; q n c ) λ = k = 1 c k D ( q c | | p c ) + log x p 0 c , k ( x | q 1 , ; q n c ) 1 λ p 1 c , k ( x | q 1 , ; q n c ) λ ,
with
p h c , k ( x | q 1 , ; q n c ) = u = 1 b p c ( x | u , q 1 , ; q n c ) y p c , k ( u | y ) p h c ( y | q 1 , ; q n c ) , h { 0 , 1 } .
If all agents in Class c use rule γ c , then every term in the sum of Equation (70) is equal. Hence,
1 n c = 1 n c c k D ( q c | | p c ) + log x p 0 ( x c | q 1 , ; q n c ) 1 λ p 1 ( x c | q 1 , ; q n c ) λ = c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
We now turn our attention to the difference
c = 1 n c r c [ D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ ] c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ ,
which is equivalent to
c = 1 n c r c n r c n n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
Observe that
[ D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ ] 0 ,
for all classes, which is a consequence of the non-negativity of the KL divergence [50] and the non-positivity of the Chernoff information [44]. Combining this with the fact that r c n r c n n 0 , we see that (74) is upper-bounded by zero. For a lower bound, observe that r c n r c n n 1 n , which yields the result that Equation (74) is lower-bounded by
c = 1 n c 1 n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ
c = 1 n c 1 n max q D ( q | | p c ) + inf γ min λ [ 0 , 1 ] min q 1 , ; q n c log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
The KL divergence (for finite alphabets) is bounded, and repeating the proof of Lemma 5 for the multi-class case using Assumptions 6.b and 6.c guarantees that the logarithm terms are finite. Hence, Equation (77) goes to zero as n . Moreover, note that this lower bound is independent of the strategies γ and λ and the distributions q 1 , ; q n c . This means that Equation (74) converges uniformly in γ and λ and the distributions q 1 , ; q n c , which allows us to take the infimum, minimum, and maximum, respectively. To see this, observe that the upper bound provides
max q 1 , ; q n c c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ c = 1 n c r c D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
As this is true for all q 1 , ; q n c , we have
max q 1 , ; q n c c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ max q 1 , ; q n c c = 1 n c r c D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
The same argument can be repeated to obtain
max q 1 , ; q n c c = 1 n c r c D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ max q 1 , ; q n c c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ + c = 1 n c 1 n max q D ( q | | p c ) + inf γ min λ [ 0 , 1 ] min q 1 , ; q n c log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ .
Hence, the difference
max q 1 , ; q n c c = 1 n c r c [ D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ ] max q 1 , ; q n c c = 1 n c c k n D ( q c | | p c ) + log x p 0 c , 1 ( x | q 1 , ; q n c ) 1 λ p 1 c , 1 ( x | q 1 , ; q n c ) λ ,
goes to zero as n . Repeating the same argument with the minimum over λ followed by the infimum over γ completes the proof.

7. Conclusions

In this paper, we have introduced a new framework for decentralized inference that captures a high degree of coupling between agents. Under our framework, the empirical distribution of the network state induces a global coupling across agents. We derive an expression that is asymptotically equivalent to the Chernoff information and unveil a number of interesting properties, such as the fact that the true state distribution does not always dominate asymptotic performance. For the multi-class case, we characterize how the ratios of the classes of agents affect performance. We further allow for a lossy communication link between the agents and the fusion center and investigate the effects of the channel on overall performance. Our work extends prior work on distributed detection and relaxes the requirement of conditionally independent observations when correlation is present. In future work, we will remove the fusion center from the system and require agents to communicate directly with each other, as in a purely decentralized ad hoc system. In addition, we will consider the introduction of actions by the agents that can affect the observations of other agents, enabling the consideration of active hypothesis testing in a distributed setting.

Author Contributions

J.S. and U.M. made serious contributions to the work and have had thorough discussions together on the problem formulation, proof techniques, and technical challenges. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been funded in part by one or more of the following grants: NSF CCF-1817200, ARO W911NF1910269, DOE DE-SC0021417, Swedish Research Council 2018-04359, NSF CCF-2008927, NSF CCF-2200221, ONR 503400-78050, ONR N00014-15-1-2550, USC + Amazon Center on Secure and Trusted Machine Learning.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proofs of Lemmas for Theorem 1

Proof of Lemma 3. 
Fix a point x o D . For ϵ > 0 , let δ > 0 (which can depend on ϵ and x o ) be such that | f ( x o ) f ( x ) | < min { inf f F f ( x o ) , 1 } ϵ 2 for all f F whenever | x o x | < δ . Assume w.l.o.g. that f ( x o ) f ( x ) ; then,
f ( x o ) λ f ( x ) λ = f ( x o ) λ f ( x ) λ f ( x o ) 1 λ + f ( x ) 1 λ f ( x o ) 1 λ + f ( x ) 1 λ
= f ( x o ) f ( x ) + f ( x o ) λ f ( x ) 1 λ f ( x o ) 1 λ f ( x ) λ f ( x o ) 1 λ + f ( x ) 1 λ
  ( a ) 2 ( f ( x o ) f ( x ) ) f ( x o ) 1 λ
  ( b ) 2 ( f ( x o ) f ( x ) ) min { f ( x o ) , 1 }
2 ( f ( x o ) f ( x ) ) min { inf f F f ( x o ) , 1 } < ϵ ,
where ( a ) follows from
( f ( x o ) λ + f ( x ) λ ) ( f ( x ) 1 λ f ( x o ) 1 λ ) 0 ,
and ( b ) follows from min λ [ 0 , 1 ] f ( x o ) 1 λ = f ( x o ) if f ( x o ) < 1 and min λ [ 0 , 1 ] f ( x o ) 1 λ = 1 if f ( x o ) 1 . Hence, we have just shown that for every ϵ > 0 there exists a δ F ( x o ) > 0 (independent of λ and f) such that for all λ [ 0 , 1 ] and f F we have | f ( x o ) λ f ( x ) λ | < ϵ whenever | x o x | < δ F ( x o ) . A similar argument holds for g 1 λ .
Then, for any ϵ > 0 let δ F ( x o ) and δ G ( x o ) be such that | f ( x o ) λ f ( x ) λ | < ϵ 2 ( max { sup g G g ( x o ) , 1 } + ϵ ) and | g ( x o ) 1 λ g ( x ) 1 λ | < ϵ 2 ( max { sup f F f ( x o ) , 1 } ) for all λ , f, and g whenever | x o x | < δ F ( x o ) and | x o x | < δ G ( x o ) , respectively. Take δ ( x o ) = min { δ F ( x o ) , δ G ( x o ) } ; then, if | x o x | < δ ( x o ) , for all λ we have
| f ( x o ) λ g ( x o ) 1 λ f ( x ) λ g ( x ) 1 λ | = | f ( x o ) λ g ( x o ) 1 λ f ( x o ) λ g ( x ) 1 λ + f ( x o ) λ g ( x ) 1 λ f ( x ) λ g ( x ) 1 λ |
( a ) f ( x o ) λ | g ( x o ) 1 λ g ( x ) 1 λ | + g ( x ) 1 λ | f ( x o ) λ f ( x ) λ |
( b ) f ( x o ) λ | g ( x o ) 1 λ g ( x ) 1 λ | + ( g ( x o ) 1 λ + ϵ ) | f ( x o ) λ f ( x ) λ |
f ( x o ) λ ϵ 2 ( max { sup f F f ( x o ) , 1 } ) + ( g ( x o ) 1 λ + ϵ ) ϵ 2 ( max { sup g G g ( x o ) , 1 } + ϵ )
( c ) max { f ( x o ) , 1 } ϵ 2 ( max { sup f F f ( x o ) , 1 } ) + ( max { g ( x o ) , 1 } + ϵ ) ϵ 2 ( max { sup g G g ( x o ) , 1 } + ϵ )
max { sup f F f ( x o ) , 1 } ϵ 2 ( max { sup f F f ( x o ) , 1 } ) + ( max { sup g G g ( x o ) , 1 } + ϵ ) ϵ 2 ( max { sup g G g ( x o ) , 1 } + ϵ )
= ϵ ,
where ( a ) is due to the triangle inequality, ( b ) is due to g ( x ) 1 λ = g ( x ) 1 λ g ( x o ) 1 λ + g ( x o ) 1 λ < ϵ + g ( x o ) 1 λ , and ( c ) is due to the fact that max λ [ 0 , 1 ] f ( x o ) 1 λ = 1 if f ( x o ) < 1 and max λ [ 0 , 1 ] f ( x o ) 1 λ = f ( x o ) if f ( x o ) 1 . □
Proof of Lemma 5. 
The function g ( x ) = u sup q p ( x | u , q ) will always satisfy part (a). To see this, first observe that
u y p ( x | u , q ) p ( u | y ) p h ( y | q ) u p ( x | u , q ) u sup q p ( x | u , q ) .
for all h, γ , and q . Moreover, for each x and u, per Assumption 5.d the channel model is continuous in q , and because P m is compact, p ( x | u , q ) attains its maximum, meaning that sup q p ( x | u , q ) = max q p ( x | u , q ) is a valid density; thus, x g ( x ) = x u max q p ( x | u , q ) = b < . For part (b), repeated use of Hölder's inequality provides
x p 0 ( x | q ) 1 λ p 1 ( x | q ) λ y p 0 ( y | q ) 1 λ p 1 ( y | q ) λ ( a ) y min { p 0 ( y | q ) , p 1 ( y | q ) } y inf q P m min { p 0 ( y | q ) , p 1 ( y | q ) } ,
where ( a ) holds, as for any two numbers a and b we have a 1 λ b λ min { a , b } for any λ [ 0 , 1 ] . Per Assumption 5.c, the signal model is continuous in q for h as well as all y; thus, min { p 0 ( y | q ) , p 1 ( y | q ) } is continuous and attains its minimum. Moreover, we assume that p h ( y | q ) is always strictly positive for any h, y, and q , meaning that
x p 0 ( x | q ) 1 λ p 1 ( x | q ) λ y min q P m min { p 0 ( y | q ) , p 1 ( y | q ) } > 0 .
Because this lower bound holds for all γ , λ , and q , part (b) holds. □
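The scalar inequality behind step ( a ), namely min { a , b } ≤ a 1 − λ b λ ≤ max { a , b } for a , b > 0 and λ [ 0 , 1 ] (the proof uses the lower bound), is easy to spot-check numerically; the grid of values below is an illustrative assumption.

```python
# For a, b > 0 and lam in [0, 1], the quantity a^(1 - lam) * b^lam lies
# between min(a, b) and max(a, b); the proof of Lemma 5 uses the lower bound.
ok = True
vals = [0.01, 0.3, 1.0, 2.5, 10.0]
lams = [0.0, 0.25, 0.5, 0.75, 1.0]
for a in vals:
    for b in vals:
        for lam in lams:
            g = a ** (1.0 - lam) * b ** lam
            ok = ok and (min(a, b) - 1e-12 <= g <= max(a, b) + 1e-12)
```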
Proof of Lemma 6. 
Fix q P m and let { α i } be any sequence that converges to q . Per Assumption 5.c, p h ( y | α i ) p h ( y | q ) for each y. Moreover, y p h ( y | α i ) = y p h ( y | q ) = 1 for all i; thus, per Scheffé’s lemma [51],
y | p h ( y | α i ) p h ( y | q ) | 0 .
The remainder of the proof proceeds via proof by contradiction. Suppose that ϵ > 0 and that there exist two sequences { α i } and { β i } such that α i β i 0 and that for all i,
y | p h ( y | α i ) p h ( y | β i ) | > ϵ .
Because P m is bounded, per the Bolzano–Weierstrass theorem [52], { α i } and { β i } have convergent subsequences { α i k } and { β i k } that converge to some point θ . Because P m is closed, θ must be in P m . Then, Equation (A17) provides
y | p h ( y | α i k ) p h ( y | β i k ) | y | p h ( y | α i k ) p h ( y | θ ) | + y | p h ( y | β i k ) p h ( y | θ ) | 0 ,
which contradicts Equation (A18). □
Proof of Lemma 7. 
Recall that Assumption 5.d states that p ( x | u , q ) is continuous in q for any x and u { 1 , 2 , … , b } ; thus, max u p ( x | u , q ) is continuous in q for a fixed x. Moreover, because P m is compact, max u p ( x | u , q ) achieves its maximum on P m ; thus, 0 < max q Z max u p ( x | u , q ) < . Then, per Lemma 6, for any ϵ > 0 there exists a δ 0 > 0 such that whenever α β 2 < δ 0 we have y | p h ( y | α ) p h ( y | β ) | < ϵ 2 b max q Z max u p ( x | u , q ) . Then, for all u and γ we have
| p h ( u | α ) p h ( u | β ) | = | y p ( u | y ) p h ( y | α ) y p ( u | y ) p h ( y | β ) | ( a ) y p ( u | y ) | p h ( y | α ) p h ( y | β ) | ( b ) y | p h ( y | α ) p h ( y | β ) | < ϵ 2 b max q Z max u p ( x | u , q ) ,
where ( a ) holds due to the triangle inequality and ( b ) holds because p ( u | y ) 1 for all u and γ . Moreover, because p ( x | u , q ) is uniformly continuous in q for each u { 1 , 2 , … , b } , there exists a δ u > 0 (which depends only on ϵ , x, and u) such that whenever α β 2 < δ u , | p ( x | u , α ) p ( x | u , β ) | < ϵ 2 b . For a fixed x, take δ = min { δ 0 , δ 1 , … , δ b } ; thus, δ depends only on ϵ , h, and x, and does not depend on γ or u. Then, whenever α β 2 < δ ,
| p h ( x | α ) p h ( x | β ) | = | u = 1 b p ( x | u , α ) p h ( u | α ) u = 1 b p ( x | u , β ) p h ( u | β ) |
( a ) u = 1 b | p ( x | u , α ) p h ( u | α ) p ( x | u , β ) p h ( u | β ) |
= u = 1 b | p ( x | u , α ) p h ( u | α ) p ( x | u , α ) p h ( u | β ) + p ( x | u , α ) p h ( u | β ) p ( x | u , β ) p h ( u | β ) |
( b ) u = 1 b p ( x | u , α ) | p h ( u | α ) p h ( u | β ) | + p h ( u | β ) | p ( x | u , α ) p ( x | u , β ) |
( c ) u = 1 b | p h ( u | α ) p h ( u | β ) | max q Z max u p ( x | u , q ) + | p ( x | u , α ) p ( x | u , β ) |
u = 1 b ϵ 2 b max q Z max u p ( x | u , q ) max q Z max u p ( x | u , q ) + ϵ 2 b = ϵ ,
where ( a ) and ( b ) are due to the triangle inequality and ( c ) is due to p h ( u | β ) < 1 . Now, because D ( q | | p ) is continuous in q (for finite alphabets) and P m is compact, D ( q | | p ) is uniformly continuous. Clearly, D ( q | | p ) does not depend on γ ; hence, we may repeat the exact same argument with p h ( x | q ) 2 D ( q | | p ) , providing the desired result. □
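The chain of inequalities above hinges on the fact that pushing two distributions through the common kernel p ( u | y ) cannot increase their L1 distance (the step bounding | p h ( u | α ) − p h ( u | β ) | by y | p h ( y | α ) − p h ( y | β ) | ). A small randomized spot check of this data-processing property, with a made-up kernel and alphabet sizes chosen purely for illustration, is given below.

```python
import random

random.seed(0)

def rand_dist(n):
    # A random probability vector of length n.
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

ny, nu = 6, 3
p = rand_dist(ny)                            # plays the role of p_h(y | alpha)
q = rand_dist(ny)                            # plays the role of p_h(y | beta)
kernel = [rand_dist(nu) for _ in range(ny)]  # kernel[y][u] = p(u | y)

# Push both distributions through the same kernel.
push_p = [sum(kernel[y][u] * p[y] for y in range(ny)) for u in range(nu)]
push_q = [sum(kernel[y][u] * q[y] for y in range(ny)) for u in range(nu)]

l1_in = sum(abs(a - b) for a, b in zip(p, q))
l1_out = sum(abs(a - b) for a, b in zip(push_p, push_q))
```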
Proof of Lemma 8. 
In order to use Lemma 3, we need to show that for each $q \in \mathcal{P}_m$ and $h \in \{0, 1\}$ it is the case that $\inf_\gamma p_h(x|q)\,2^{-D(q\|p)} > 0$ and $\sup_\gamma p_h(x|q)\,2^{-D(q\|p)} < \infty$. Because $0 < 2^{-D(q\|p)} \le 1$ (due to $D(q\|p)$ being bounded for finite alphabets), it suffices to show that $\inf_\gamma p_h(x|q) > 0$ and $\sup_\gamma p_h(x|q) < \infty$. For any $\gamma$, observe that
$$p_h(x|q) = \sum_u p(x|u,q)\,p_h(u|q) \ge \min_u p(x|u,q) \sum_u p_h(u|q) = \min_u p(x|u,q) \overset{(a)}{>} 0,$$
where $(a)$ is due to Assumption 4. Taking the infimum over $\gamma$ provides us with $\inf_\gamma p_h(x|q) \ge \min_u p(x|u,q) > 0$. A similar argument shows that $\sup_\gamma p_h(x|q) < \infty$. Then, Lemma 3 yields the desired assertion. □
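The mixture lower bound above is easy to confirm numerically (an illustrative check with arbitrary values, not part of the proof): because $p_h(u|q)$ is a probability vector, the mixture can never fall below the smallest conditional density.

```python
import random

# Check that sum_u p(x|u,q) * p_h(u|q) >= min_u p(x|u,q) when the weights
# p_h(u|q) form a probability vector. All numbers are arbitrary placeholders.
random.seed(1)
b = 6
cond = [0.1 + random.random() for _ in range(b)]  # p(x|u,q), bounded away from 0
weights = [random.random() for _ in range(b)]
total = sum(weights)
probs = [w / total for w in weights]              # p_h(u|q), sums to 1

mixture = sum(c * p for c, p in zip(cond, probs))
assert mixture >= min(cond) - 1e-12 and min(cond) > 0
```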
Proof of Lemma 9. 
We proceed via proof by contradiction. Suppose that there exist $\epsilon > 0$ and sequences $\{\alpha_i\}$, $\{\beta_i\}$, and $\{\lambda_i\}$ such that $\|\alpha_i - \beta_i\| \to 0$ and
$$\int_x \Big| p_0(x|\alpha_i)^{1-\lambda_i}\, p_1(x|\alpha_i)^{\lambda_i}\, 2^{-D(\alpha_i\|p)} - p_0(x|\beta_i)^{1-\lambda_i}\, p_1(x|\beta_i)^{\lambda_i}\, 2^{-D(\beta_i\|p)} \Big|\, dx \ge \epsilon, \qquad \text{(A28)}$$
for all i. Then, let f i ( x ) = | p 0 ( x | α i ) 1 λ i p 1 ( x | α i ) λ i 2 D ( α i | | p ) p 0 ( x | β i ) 1 λ i p 1 ( x | β i ) λ i 2 D ( β i | | p ) | . Per Lemma 8, f i 0 a.e.; then, in order to use Lemma 4 we must show the integrals x f i ( x ) are uniformly absolutely continuous, uniformly absolutely convergent, and x | f i ( x ) | < for all i. Recall that per Lemma 5 there exists a function g ( x ) such that for all q and h we have p h ( x | q ) g ( x ) and x g ( x ) < . Then,
$$\begin{aligned}
\int_x f_i(x)\,dx &= \int_x \Big| p_0(x|\alpha_i)^{1-\lambda_i}\, p_1(x|\alpha_i)^{\lambda_i}\, 2^{-D(\alpha_i\|p)} - p_0(x|\beta_i)^{1-\lambda_i}\, p_1(x|\beta_i)^{\lambda_i}\, 2^{-D(\beta_i\|p)} \Big|\, dx\\
&\overset{(a)}{\le} \int_x p_0(x|\alpha_i)^{1-\lambda_i}\, p_1(x|\alpha_i)^{\lambda_i}\, 2^{-D(\alpha_i\|p)}\, dx + \int_x p_0(x|\beta_i)^{1-\lambda_i}\, p_1(x|\beta_i)^{\lambda_i}\, 2^{-D(\beta_i\|p)}\, dx\\
&\le \int_x p_0(x|\alpha_i)^{1-\lambda_i}\, p_1(x|\alpha_i)^{\lambda_i}\, dx + \int_x p_0(x|\beta_i)^{1-\lambda_i}\, p_1(x|\beta_i)^{\lambda_i}\, dx\\
&\overset{(b)}{\le} \int_x \max\{p_0(x|\alpha_i),\, p_1(x|\alpha_i)\}\, dx + \int_x \max\{p_0(x|\beta_i),\, p_1(x|\beta_i)\}\, dx\\
&\le \int_x g(x)\,dx + \int_x g(x)\,dx = 2\int_x g(x)\,dx < \infty,
\end{aligned}$$
where $(a)$ is due to the triangle inequality and $(b)$ holds because $a^{1-\lambda} b^{\lambda} \le \max\{a, b\}$ for any non-negative real numbers $a, b$ and $\lambda \in [0, 1]$. Furthermore, due to the absolute continuity of the Lebesgue integral, for any $\epsilon > 0$ there exists $\delta > 0$ such that for any measurable set $A$ with Lebesgue measure $\nu(A) < \delta$ we have $\int_A g(x)\,dx < \frac{\epsilon}{2}$. Then, we have
$$\int_A f_i(x)\,dx \le 2 \int_A g(x)\,dx < \epsilon$$
for all $i$, meaning that the integrals $\int_x f_i(x)\,dx$ are uniformly absolutely continuous. Moreover, because $\int_x g(x)\,dx < \infty$, defining $I_a = [-a, a]$, we have $\lim_{a \to \infty} \int_{I_a} g(x)\,dx = \int_x g(x)\,dx$ from the Dominated Convergence Theorem [52]. Let $\bar{I}_a = (-\infty, -a) \cup (a, \infty)$; then, for all $\epsilon > 0$ there exists $a_o$ such that for all $a \ge a_o$ we have $\int_{\bar{I}_a} g(x)\,dx < \frac{\epsilon}{2}$, which provides
$$0 \le \int_x f_i(x)\,dx - \int_{I_a} f_i(x)\,dx = \int_{\bar{I}_a} f_i(x)\,dx \le 2 \int_{\bar{I}_a} g(x)\,dx < \epsilon,$$
for all $i$, making the integrals uniformly absolutely convergent. Then, per Lemma 4 we have $\int_x f_i(x)\,dx \to 0$, which contradicts Equation (A28). □
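The weighted-geometric-mean bound used in step $(b)$ above can be spot-checked numerically (an illustrative check, not part of the proof):

```python
import random

# Check that a^(1-lam) * b^lam <= max(a, b) for non-negative a, b and
# lam in [0, 1], over many random draws.
random.seed(2)
for _ in range(1000):
    a, b, lam = random.random(), random.random(), random.random()
    assert a ** (1 - lam) * b ** lam <= max(a, b) + 1e-12
```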
Proof of Lemma 10. 
For any $\epsilon > 0$, per Lemma 9 there exists $\delta > 0$, which does not depend on $\gamma$, $\lambda$, or $q$, such that whenever $\|\alpha - \beta\|_2 < \delta$ we have
$$\int_x \Big| p_0(x|\alpha)^{1-\lambda}\, p_1(x|\alpha)^{\lambda}\, 2^{-D(\alpha\|p)} - p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)} \Big|\, dx < \epsilon\, \inf_\gamma \min_{\lambda \in [0,1]} \min_{q \in \mathcal{P}_m} 2^{-D(q\|p)} \int_x p_0(x|q)^{1-\lambda}\, p_1(x|q)^{\lambda}\, dx,$$
where the right side is strictly positive due to Lemma 5. This leads us to
$$\begin{aligned}
&\left| \frac{\int_x p_0(x|\alpha)^{1-\lambda}\, p_1(x|\alpha)^{\lambda}\, 2^{-D(\alpha\|p)}\, dx}{\int_x p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)}\, dx} - 1 \right|\\
&= \frac{1}{\int_x p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)}\, dx} \left| \int_x p_0(x|\alpha)^{1-\lambda}\, p_1(x|\alpha)^{\lambda}\, 2^{-D(\alpha\|p)} - p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)}\, dx \right|\\
&\le \frac{1}{\int_x p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)}\, dx} \int_x \Big| p_0(x|\alpha)^{1-\lambda}\, p_1(x|\alpha)^{\lambda}\, 2^{-D(\alpha\|p)} - p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)} \Big|\, dx\\
&\le \frac{\epsilon\, \inf_\gamma \min_{\lambda \in [0,1]} \min_{q \in \mathcal{P}_m} 2^{-D(q\|p)} \int_x p_0(x|q)^{1-\lambda}\, p_1(x|q)^{\lambda}\, dx}{\int_x p_0(x|\beta)^{1-\lambda}\, p_1(x|\beta)^{\lambda}\, 2^{-D(\beta\|p)}\, dx}\\
&\le \frac{\epsilon\, \inf_\gamma \min_{\lambda \in [0,1]} \min_{q \in \mathcal{P}_m} 2^{-D(q\|p)} \int_x p_0(x|q)^{1-\lambda}\, p_1(x|q)^{\lambda}\, dx}{\inf_\gamma \min_{\lambda \in [0,1]} \min_{q \in \mathcal{P}_m} 2^{-D(q\|p)} \int_x p_0(x|q)^{1-\lambda}\, p_1(x|q)^{\lambda}\, dx} = \epsilon. \qquad \square
\end{aligned}$$
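The mechanism of this proof, turning an additive gap plus a uniform lower bound into a multiplicative gap, can be illustrated with a toy numeric example (arbitrary values, not from the paper):

```python
# An additive bound |A - B| < eps * L together with a uniform lower bound
# B >= L yields the multiplicative bound |A/B - 1| <= |A - B| / L < eps.
def ratio_gap(A, B):
    return abs(A / B - 1.0)

L_inf = 0.25          # stands in for the positive infimum on the right-hand side
eps = 0.1
A, B = 0.30, 0.28     # B >= L_inf and |A - B| = 0.02 < eps * L_inf = 0.025
assert B >= L_inf and abs(A - B) < eps * L_inf
assert ratio_gap(A, B) <= abs(A - B) / L_inf < eps
```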
Proof of Lemma 11. 
As long as the fusion center implements the MAP rule, the error exponent is characterized by the Chernoff information [27,44]. Moreover, assuming that the fusion center knows the network type, the Chernoff information becomes
$$\min_{\lambda \in [0,1]} -\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x, q^n)^{1-\lambda}\, p_1(x, q^n)^{\lambda}.$$
Then, for any strategy $\gamma$ and $\lambda \in [0, 1]$,
$$-\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x, q^n)^{1-\lambda}\, p_1(x, q^n)^{\lambda} = -\frac{1}{n} \log \sum_x \sum_{q^n} \left( \frac{p_0(x|q^n)\, p(H{=}0|q^n)}{P(H{=}0)} \right)^{1-\lambda} \left( \frac{p_1(x|q^n)\, p(H{=}1|q^n)}{P(H{=}1)} \right)^{\lambda} p(q^n).
$$
Now, bounding $p(H{=}h|q^n) \le 1$ and $\big(\tfrac{1}{P(H=0)}\big)^{1-\lambda} \big(\tfrac{1}{P(H=1)}\big)^{\lambda} \le \tfrac{1}{P(H=0)\, P(H=1)}$, we have
$$-\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x, q^n)^{1-\lambda}\, p_1(x, q^n)^{\lambda} \ge \frac{1}{n} \log P(H{=}0) + \frac{1}{n} \log P(H{=}1) - \frac{1}{n} \log \sum_x \sum_{q^n} p_0(x|q^n)^{1-\lambda}\, p_1(x|q^n)^{\lambda}\, p(q^n).$$
Because this argument holds for any $\gamma$ and $\lambda \in [0, 1]$, we have
$$\inf_\gamma \min_{\lambda \in [0,1]} -\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x, q^n)^{1-\lambda}\, p_1(x, q^n)^{\lambda} \ge \inf_\gamma \min_{\lambda \in [0,1]} -\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x|q^n)^{1-\lambda}\, p_1(x|q^n)^{\lambda}\, p(q^n) + \frac{1}{n} \log P(H{=}0) + \frac{1}{n} \log P(H{=}1),$$
where $\frac{1}{n} \log P(H{=}0) + \frac{1}{n} \log P(H{=}1) \to 0$ as $n \to \infty$.
Turning our attention to the lower bound, since $\big(\tfrac{1}{P(H=0)}\big)^{1-\lambda} \big(\tfrac{1}{P(H=1)}\big)^{\lambda} \ge 1$,
$$\begin{aligned}
-\frac{1}{n} &\log \sum_x \sum_{q^n} \left( \frac{p_0(x|q^n)\, p(H{=}0|q^n)}{P(H{=}0)} \right)^{1-\lambda} \left( \frac{p_1(x|q^n)\, p(H{=}1|q^n)}{P(H{=}1)} \right)^{\lambda} p(q^n)\\
&\le -\frac{1}{n} \log \sum_x \sum_{q^n} \big( p_0(x|q^n)\, p(H{=}0|q^n) \big)^{1-\lambda} \big( p_1(x|q^n)\, p(H{=}1|q^n) \big)^{\lambda}\, p(q^n)\\
&\le -\frac{1}{n} \log \left( \min_{q^n} \big\{ p(H{=}0|q^n),\, p(H{=}1|q^n) \big\} \sum_x \sum_{q^n} p_0(x|q^n)^{1-\lambda}\, p_1(x|q^n)^{\lambda}\, p(q^n) \right)\\
&= -\frac{1}{n} \log \min_{q^n} \big\{ p(H{=}0|q^n),\, p(H{=}1|q^n) \big\} - \frac{1}{n} \log \sum_x \sum_{q^n} p_0(x|q^n)^{1-\lambda}\, p_1(x|q^n)^{\lambda}\, p(q^n).
\end{aligned}$$
Because this argument holds for any $\gamma$ and $\lambda \in [0, 1]$, we have
$$\inf_\gamma \min_{\lambda \in [0,1]} -\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x, q^n)^{1-\lambda}\, p_1(x, q^n)^{\lambda} \le \inf_\gamma \min_{\lambda \in [0,1]} -\frac{1}{n} \log \sum_x \sum_{q^n} p_0(x|q^n)^{1-\lambda}\, p_1(x|q^n)^{\lambda}\, p(q^n) - \frac{1}{n} \log \min_{q^n} \big\{ p(H{=}0|q^n),\, p(H{=}1|q^n) \big\},$$
where $-\frac{1}{n} \log \min_{q^n} \{ p(H{=}0|q^n),\, p(H{=}1|q^n) \} \to 0$ per Assumption 5.b. This completes the proof. □
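As a side illustration of the quantity this lemma concerns (a generic example with made-up pmfs, not the paper's model), the Chernoff information $C(p_0, p_1) = -\min_{\lambda \in [0,1]} \log_2 \sum_x p_0(x)^{1-\lambda} p_1(x)^{\lambda}$ can be evaluated by a simple grid search over $\lambda$:

```python
import math

# Chernoff information via grid search over lambda for two finite pmfs.
# The pmfs p0, p1 below are arbitrary examples, not taken from the paper.
def chernoff_information(p0, p1, grid=10001):
    best = float("inf")
    for i in range(grid):
        lam = i / (grid - 1)
        s = sum(a ** (1 - lam) * b ** lam for a, b in zip(p0, p1))
        best = min(best, math.log2(s))
    return -best

p0 = [0.7, 0.2, 0.1]
p1 = [0.1, 0.3, 0.6]
C = chernoff_information(p0, p1)
assert C > 0.0  # distinct distributions give a strictly positive exponent
```

At $\lambda = 0$ or $\lambda = 1$ the inner sum equals $1$, so the minimum over the interior is what produces a strictly positive exponent for distinct distributions.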

Appendix A.2. Extension of Theorem 1

We begin by defining
$$(q_1^*, \ldots, q_{n_c}^*) = \arg\max_{q_1, \ldots, q_{n_c}} \frac{1}{n} \sum_{c=1}^{n_c} \left[ -c_k D(q_c \| p_c) + \log \sum_x p_0(x_c | q_1, \ldots, q_{n_c})^{1-\lambda}\, p_1(x_c | q_1, \ldots, q_{n_c})^{\lambda} \right].$$
Then, for any $0 < \epsilon < 1$, in order for the argument in the proof of Theorem 1 to hold, there must exist an $n_{c,o}$ for each class such that the set
$$T_{c,\delta}^n = \{ q_c^n \in Z^n : \| q_c^n - q_c^* \|_2 < \delta \}$$
is non-empty for all $n \ge n_{c,o}$. Recall that we assume $\lim_{n \to \infty} \frac{c_k}{n} > 0$ for all classes, meaning that the $n_{c,o}$ are guaranteed to exist. This means that the sets $T_{c,\delta}^n$ are simultaneously non-empty for all $n \ge \max_c \{ n_{c,o} \}$. Then, repeating the exact same argument as before for each class, we can show that there exists an $n_\delta$ such that for all $n \ge n_\delta$ we have
$$\frac{1-\epsilon}{\big(1 + \frac{1}{\epsilon}\big) \prod_{c=1}^{n_c} (c_k + 1)^m} \le \frac{\sum_{q_1^n, \ldots, q_{n_c}^n} \prod_{c=1}^{n_c} \sum_{x_c} p_0(x_c | q_1^n, \ldots, q_{n_c}^n)^{1-\lambda}\, p_1(x_c | q_1^n, \ldots, q_{n_c}^n)^{\lambda}\, p(q_c^n)}{\prod_{c=1}^{n_c} \sum_{x_c} p_0(x_c | q^*)^{1-\lambda}\, p_1(x_c | q^*)^{\lambda}\, 2^{-n D(q^* \| p)}} \le \prod_{c=1}^{n_c} (c_k + 1)^m\, (1 + \epsilon).$$
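The per-class factors $(c_k + 1)^m$ above come from the standard method-of-types count: the number of types of a length-$n$ sequence over an alphabet of size $m$ is at most $(n+1)^m$. A small numeric check of this bound (illustrative only):

```python
from math import comb

# The exact number of types (empirical distributions) of length-n sequences
# over an m-ary alphabet is C(n+m-1, m-1), which is at most (n+1)^m.
def num_types(n, m):
    return comb(n + m - 1, m - 1)

for n in range(1, 8):
    for m in range(2, 5):
        assert num_types(n, m) <= (n + 1) ** m
```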

References

  1. Shanthamallu, U.S.; Spanias, A.; Tepedelenlioglu, C.; Stanley, M. A brief survey of machine learning methods and their sensor and IoT applications. In Proceedings of the 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA), Larnaca, Cyprus, 27–30 August 2017; pp. 1–8. [Google Scholar]
  2. Tajer, A.; Kar, S.; Poor, H.V.; Cui, S. Distributed joint cyber attack detection and state recovery in smart grids. In Proceedings of the 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm), Brussels, Belgium, 17–20 October 2011; pp. 202–207. [Google Scholar]
  3. Patel, A.; Ram, H.; Jagannatham, A.K.; Varshney, P.K. Robust cooperative spectrum sensing for MIMO cognitive radio networks under CSI uncertainty. IEEE Trans. Signal Process. 2018, 66, 18–33. [Google Scholar] [CrossRef]
  4. Chawla, A.; Singh, R.K.; Patel, A.; Jagannatham, A.K.; Hanzo, L. Distributed detection for centralized and decentralized millimeter wave massive MIMO sensor networks. IEEE Trans. Veh. Technol. 2021, 70, 7665–7680. [Google Scholar] [CrossRef]
  5. Geng, B.; Cheng, X.; Brahma, S.; Kellen, D.; Varshney, P.K. Collaborative human decision making with heterogeneous agents. IEEE Trans. Comput. Soc. Syst. 2022, 9, 469–479. [Google Scholar] [CrossRef]
  6. Gupta, S.S.; Mehta, N.B. Ordered transmissions schemes for detection in spatially correlated wireless sensor networks. IEEE Trans. Commun. 2021, 69, 1565–1577. [Google Scholar] [CrossRef]
  7. Gangan, M.S.; Vasconcelos, M.M.; Mitra, U.; Câmara, O.; Boedicker, J.Q. Intertemporal trade-off between population growth rate and carrying capacity during public good production. iScience 2022, 25, 104117. [Google Scholar] [CrossRef]
  8. Tsitsiklis, J.; Athans, M. On the complexity of decentralized decision making and detection problems. IEEE Trans. Autom. Control 1985, 30, 440–446. [Google Scholar] [CrossRef]
  9. Aalo, V.; Viswanathan, R. On distributed detection with correlated sensors: Two examples. IEEE Trans. Aerosp. Electron. Syst. 1989, 25, 414–421. [Google Scholar] [CrossRef]
  10. Willett, P.; Swaszek, P.F.; Blum, R.S. The good, bad and ugly: Distributed detection of a known signal in dependent Gaussian noise. IEEE Trans. Signal Process. 2000, 48, 3266–3279. [Google Scholar]
  11. Gül, G. Minimax robust decentralized hypothesis testing for parallel sensor networks. IEEE Trans. Inf. Theory 2020, 67, 538–548. [Google Scholar] [CrossRef]
  12. Chen, H.; Chen, B.; Varshney, P.K. A new framework for distributed detection with conditionally dependent observations. IEEE Trans. Signal Process. 2012, 60, 1409–1419. [Google Scholar] [CrossRef]
  13. Hanna, O.A.; Li, X.; Fragouli, C.; Diggavi, S. Can we break the dependency in distributed detection? In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 2720–2725. [Google Scholar]
  14. Kasasbeh, H.; Cao, L.; Viswanathan, R. Soft-decision-based distributed detection with correlated sensing channels. IEEE Trans. Aerosp. Electron. Syst. 2019, 55, 1435–1449. [Google Scholar] [CrossRef]
  15. Maleki, N.; Vosoughi, A. On bandwidth constrained distributed detection of a known signal in correlated Gaussian noise. IEEE Trans. Veh. Technol. 2020, 69, 11428–11440. [Google Scholar] [CrossRef]
  16. Shaska, J.; Mitra, U. State-dependent decentralized detection. IEEE Trans. Inf. Theory 2023. submitted. [Google Scholar]
  17. Csiszar, I. The method of types [information theory]. IEEE Trans. Inf. Theory 1998, 44, 2505–2523. [Google Scholar] [CrossRef]
  18. Raginsky, M. Empirical processes, typical sequences, and coordinated actions in standard Borel spaces. IEEE Trans. Inf. Theory 2013, 59, 1288–1301. [Google Scholar] [CrossRef]
  19. Schuurmans, M.; Patrinos, P. A general framework for learning-based distributionally robust MPC of Markov jump systems. IEEE Trans. Autom. Control 2023, 68, 2950–2965. [Google Scholar] [CrossRef]
  20. Haghifam, M.; Tan, V.Y.; Khisti, A. Sequential classification with empirically observed statistics. IEEE Trans. Inf. Theory 2021, 67, 3095–3113. [Google Scholar] [CrossRef]
  21. Guo, F.R.; Richardson, T.S. Chernoff-type concentration of empirical probabilities in relative entropy. IEEE Trans. Inf. Theory 2021, 67, 549–558. [Google Scholar] [CrossRef]
  22. Weinberger, N.; Merhav, N. The DNA storage channel: Capacity and error probability bounds. IEEE Trans. Inf. Theory 2022, 68, 5657–5700. [Google Scholar] [CrossRef]
  23. Lalitha, A.; Javidi, T.; Sarwate, A.D. Social learning and distributed hypothesis testing. IEEE Trans. Inf. Theory 2018, 64, 6161–6179. [Google Scholar] [CrossRef]
  24. Inan, Y.; Kayaalp, M.; Telatar, E.; Sayed, A.H. Social learning under randomized collaborations. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 115–120. [Google Scholar]
  25. Goetz, C.; Humm, B. Decentralized real-time anomaly detection in cyber-physical production systems under industry constraints. Sensors 2023, 23, 4207. [Google Scholar] [CrossRef]
  26. Shaska, J.; Mitra, U. Decentralized decision making in multi-agent networks: The state-dependent case. In Proceedings of the 2021 IEEE Global Communications Conference, Madrid, Spain, 7–11 December 2021. [Google Scholar]
  27. Tsitsiklis, J.N. Decentralized detection by a large number of sensors. Math. Control Signals Syst. 1988, 1, 167–182. [Google Scholar] [CrossRef]
  28. Chen, B.; Tong, L.; Varshney, P.K. Channel-aware distributed detection in wireless sensor networks. IEEE Signal Process. Mag. 2006, 23, 16–26. [Google Scholar] [CrossRef]
  29. Duman, T.; Salehi, M. Decentralized detection over multiple-access channels. IEEE Trans. Aerosp. Electron. Syst. 1998, 34, 469–476. [Google Scholar] [CrossRef]
  30. Liu, B.; Chen, B. Channel-optimized quantizers for decentralized detection in sensor networks. IEEE Trans. Inf. Theory 2006, 52, 3349–3358. [Google Scholar]
  31. Gelfand, S.I.; Pinsker, M.S. Coding for channel with random parameters. Probl. Control Inf. Theory 1980, 9, 19–31. [Google Scholar]
  32. Choudhuri, C.; Kim, Y.H.; Mitra, U. Causal state communication. IEEE Trans. Inf. Theory 2013, 59, 3709–3719. [Google Scholar] [CrossRef]
  33. Miretti, L.; Kobayashi, M.; Gesbert, D.; De Kerret, P. Cooperative multiple-access channels with distributed state information. IEEE Trans. Inf. Theory 2021, 67, 5185–5199. [Google Scholar] [CrossRef]
  34. Zhang, W.; Vedantam, S.; Mitra, U. Joint transmission and state estimation: A constrained channel coding approach. IEEE Trans. Inf. Theory 2011, 57, 7084–7095. [Google Scholar] [CrossRef]
  35. Kobayashi, M.; Caire, G.; Kramer, G. Joint state sensing and communication: Optimal tradeoff for a memoryless case. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 111–115. [Google Scholar]
  36. Bross, S.I.; Lapidoth, A. The rate-and-state capacity with feedback. IEEE Trans. Inf. Theory 2018, 64, 1893–1918. [Google Scholar] [CrossRef]
  37. Moon, J.; Park, J. Pattern-dependent noise prediction in signal-dependent noise. IEEE J. Sel. Areas Commun. 2001, 19, 730–743. [Google Scholar] [CrossRef]
  38. Tsiatmas, A.; Baggen, C.P.; Willems, F.M.; Linnartz, J.P.M.; Bergmans, J.W. An illumination perspective on visible light communications. IEEE Commun. Mag. 2014, 52, 64–71. [Google Scholar] [CrossRef]
  39. Kavcic, A.; Moura, J.M. Correlation-sensitive adaptive sequence detection. IEEE Trans. Magn. 1998, 34, 763–771. [Google Scholar] [CrossRef]
  40. Hareedy, A.; Amiri, B.; Galbraith, R.; Dolecek, L. Non-binary LDPC codes for magnetic recording channels: Error floor analysis and optimized code design. IEEE Trans. Commun. 2016, 64, 3194–3207. [Google Scholar] [CrossRef]
  41. Kuan, D.T.; Sawchuk, A.A.; Strand, T.C.; Chavel, P. Adaptive noise smoothing filter for images with signal-dependent noise. IEEE Trans. Pattern Anal. Mach. Intell. 1985, 2, 165–177. [Google Scholar] [CrossRef]
  42. Meola, J.; Eismann, M.T.; Moses, R.L.; Ash, J.N. Modeling and estimation of signal-dependent noise in hyperspectral imagery. Appl. Opt. 2011, 50, 3829–3846. [Google Scholar]
  43. Michelusi, N.; Boedicker, J.; El-Naggar, M.Y.; Mitra, U. Queuing models for abstracting interactions in bacterial communities. IEEE J. Sel. Areas Commun. 2016, 34, 584–599. [Google Scholar] [CrossRef]
  44. Shannon, C.E.; Gallager, R.G.; Berlekamp, E.R. Lower bounds to error probability for coding on discrete memoryless channels. I. Inf. Control 1967, 10, 65–103. [Google Scholar] [CrossRef]
  45. Chamberland, J.-F.; Veeravalli, V. Asymptotic results for decentralized detection in power constrained wireless sensor networks. IEEE J. Sel. Areas Commun. 2004, 22, 1007–1015. [Google Scholar] [CrossRef]
  46. Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5650–5659. [Google Scholar]
  47. Li, S.; Zhao, S.; Yang, P.; Andriotis, P.; Xu, L.; Sun, Q. Distributed consensus algorithm for events detection in cyber-physical systems. IEEE Internet Things J. 2019, 6, 2299–2308. [Google Scholar] [CrossRef]
  48. Nielsen, F. Revisiting Chernoff information with likelihood ratio exponential families. Entropy 2022, 24, 1400. [Google Scholar] [CrossRef] [PubMed]
  49. Graves, L.M. The Theory of Functions of Real Variables; Courier Corporation: North Chelmsford, MA, USA, 2012. [Google Scholar]
  50. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  51. Williams, D. Probability with Martingales; Cambridge University Press: Cambridge, UK, 1991. [Google Scholar]
  52. Bartle, R.G.; Sherbert, D.R. Introduction to Real Analysis; Wiley: New York, NY, USA, 2000; Volume 2. [Google Scholar]
Figure 1. A set of $n$ agents receive signals $Y_k$ and states $S_k$. Each agent is characterized by a decision rule $\gamma_k$ and sends a message $X_k$ to the fusion center, which outputs $\hat{H}$. The empirical distribution of the states $Q_{S^n}$ governs the behavior of the signals $Y_k$ as well as the communication channels.
Figure 2. Three-class example with coupled signaling and state-dependent channels: (a) the optimal error exponent as a function of $\rho$, highlighting the importance of the channel to the overall system; (b) the optimal class ratios for $\alpha = 150$ (as $\rho$ increases, Class 3 becomes less important to the overall system); and (c) the dominating distributions for $\alpha = 100$, which may differ from the true distributions.
Shaska, J.; Mitra, U. Joint Detection and Communication over Type-Sensitive Networks. Entropy 2023, 25, 1313. https://doi.org/10.3390/e25091313
