Utility–Privacy Trade-Offs with Limited Leakage for Encoder

Shinohara, Naruki; Yagi, Hideki

doi:10.3390/e25060921

Open AccessArticle

Utility–Privacy Trade-Offs with Limited Leakage for Encoder

by

Naruki Shinohara

^† and

Hideki Yagi

^*,†

Department of Computer and Network Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu 182-8585, Tokyo, Japan

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Entropy 2023, 25(6), 921; https://doi.org/10.3390/e25060921

Submission received: 10 April 2023 / Revised: 28 May 2023 / Accepted: 6 June 2023 / Published: 11 June 2023

(This article belongs to the Special Issue Advances in Information and Coding Theory)

Download

Browse Figures

Versions Notes

Abstract

:

The utilization of databases such as IoT has progressed, and understanding how to protect the privacy of data is an important issue. As pioneering work, in 1983, Yamamoto assumed the source (database), which consists of public information and private information, and found theoretical limits (first-order rate analysis) among the coding rate, utility and privacy for the decoder in two special cases. In this paper, we consider a more general case based on the work by Shinohara and Yagi in 2022. Introducing a measure of privacy for the encoder, we investigate the following two problems: The first problem is the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder, in which utility is measured by the expected distortion or the excess-distortion probability. The second task is establishing the strong converse theorem for utility–privacy trade-offs, in which utility is measured by the excess-distortion probability. These results may lead to a more refined analysis such as the second-order rate analysis.

Keywords:

utility–privacy trade-offs; source coding; Shannon theory; strong converse theorem

1. Introduction

1.1. Background

The utilization of database has progressed in our society and includes autonomous cars and the congestion data service over the Internet. At the same time, the risk of accidental or intentional leakage of private information has also increased rapidly. To protect private information, coding with a privacy constraint has been analyzed via an information-theoretic approach. In 1983, Yamamoto [1] introduced a framework to quantify the utility of databases and the privacy of personal information and analyzed the trade-offs between them. Decades later, in 2013, Sankar et al. [2] claimed the necessity of converting databases to protect privacy while maintaining the utility of data. Then, Yamamoto’s framework [1] was re-recognized by Sankar et al. and other researchers. Using the rate-distortion theory in information theory, he revealed the optimal relationships (theoretical limits) among coding rate, utility, and privacy in two cases; (i) public information that can be open to the public and private information that should be protected from a third party are encoded, and (ii) only public information is encoded. However, since a more general case, i.e., where (iii) public information and a part of private information is encoded, had not been clarified, Shinohara and Yagi [3] derived the theoretical limits in such a case (see Figure 1). As a result, our characterization of the achievable region gives a “unified expression” because it includes the characteristics given in [1] in cases (i) and (ii) as special cases.

1.2. Motivation and Contributions

By investigating case (iii), one can compare the theoretical limits corresponding to a variety of patterns of the encoded information. One can see that the achievable region in case (i) is the largest among all patterns. However, this may not be the case if privacy leakage for the encoder is constrained. Motivated by this observation, in this paper, we characterize the optimal trade-offs among coding rate, utility, privacy for the decoder, and privacy for the encoder in Section 3. The addressed problem corresponds to the case where there are some aggregators between the source and the encoder and the aggregator controls the data (source sequence) passing to the encoder. The obtained results indeed suggest that the best-encoded information can be in case (iii) if some restriction is imposed on the privacy leakage for the encoder.

One of the most important tasks in information-theoretic analysis for utility–privacy trade-offs is second-order rate analysis (e.g., [4,5,6]). In general, in second-order rate analysis, the excess-distortion probability is used as a measure of utility [4,5,6]. However, in the first-order rate analysis shown in [3], utility is measured by the expected distortion, so for second-order rate analysis, we need first to conduct first-order rate analysis, which replaces the expected distortion with the excess-distortion probability as the measure of utility. In Section 4, the theoretical limits coincide with the one in which utility is measured by expected distortion.

There is one more problem to solve before tackling second-order rate analysis: we need to clarify whether the boundary of the achievable region may vary or not, depending on the value of the excess-distortion probability. In Section 5, we establish the strong converse theorem, provided that utility is measured by the probability of excess distortion. For the sake of simplicity, we focus on the achievable region of utility and privacy for the decoder or a third party, which reveals an aspect of utility–privacy trade-offs. In the proof, we adopt a change in measure argument developed by Tyagi and Watanabe [7]. Contrary to the standard rate-distortion problem, the alphabets of the encoder’s input and the decoder’s output are different, so we extend the argument to incorporate this discrepancy. Although the strong converse theorem is shown for the rate region of utility and privacy, we can also derive the same result when the privacy of the encoder is involved.

For readers’ convenience, Figure 2 shows the road map to the most important task: the second-order rate analysis. In summary, three contributions of this paper are as follows:

The rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured using the expected distortion (Section 3).
The rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured using the excess-distortion probability (Section 4).
The strong converse theorem for utility–privacy trade-offs in which utility is measured using the excess-distortion probability (Section 5).

1.3. Related Work

The analysis of the utility–privacy trade-offs using an information-theoretic approach was initiated by [2], which translates the rate-distortion problem with an equivocation constraint in [1] into the privacy and utility trade-off problem. In information-theoretic studies on coding with privacy and utility constraints, several measures for privacy and utility are adopted. One of the strong measures for privacy is differential privacy [8,9], and an extension and relaxation of differential privacy have been proposed in [10,11]. A weaker but useful privacy measure is the mutual information between the codeword and private information [1,2,12,13,14], which guarantees the average amount of leaked private information. Other examples of well-known privacy measures are maximal leakage [15], maximal

α

-leakage [16,17,18], and total variation [19]. Relationships among several measures for privacy have been revealed in [20]. On the other hand, well-known utility measures are average distortion [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25], hard distortion [16,17], and log-loss distortion [26].

Coding systems in the utility–privacy problem are extended to the ones with the encoder’s side information [2] and with the decoder’s side information [25]. In [14], a related coding problem has been investigated, where both the encoder and the decoder can access a uniform secret key and the decoder can also access side information. Utility–privacy trade-off schemes are applied, for example, to the Internet of Energy [23] and to a system with informational self-determination [24].

A closely related study to this paper was given by Basciftci et al. [13], in which several release mechanisms of encoded information from the database were discussed. In particular, utility–privacy trade-offs (without the coding rate) were compared when the encoded information was (i) both private and public information, (ii) only public information, and (iv) only private information (see also the three cases described in Section 1.1). A sufficient condition under which the utility–privacy trade-offs coincide for cases (i) and (ii) was given.

1.4. Organization

This paper is organized as follows: In Section 2, we begin by introducing the notation and system model that are used in this paper. In Section 3, we give the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured by the expected distortion. In Section 4, we tackle the first-order rate analysis among the coding rate, utility, privacy for the decoder, and privacy for the encoder in which utility is measured by the excess-distortion probability. Section 5 focuses on the strong converse theorem for utility–privacy trade-offs in which utility is measured by the excess-distortion probability. In Section 6, we discuss the significance of the encoded information with limited leakage for the encoder. Finally, in Section 7, the conclusion and future work are stated.

2. Notation and System Model

2.1. Information Source

Database d is described by a

K \times n

matrix whose rows represent K attributes and columns represent n entries of data. Let

K = {1, 2, \dots, K}

be the set of indexes of K attributes. The random variable for the lth attribute is denoted by

X_{l}

, which takes a value in a finite alphabet

X_{l}

. For any subset

B \subseteq K

, the tuple of random variables

{(X_{l})}_{l \in B}

is abbreviated as

X_{B}

. Similarly, the Cartesian product of alphabets

\prod_{l \in B} X_{l}

is abbreviated as

X_{B}

.

The K attributes can be divided into two groups; one may be open to the public and the other should be kept secret from a third party. Then, the set

K

is divided into disjoint sets

R

and

H

. That is,

\begin{matrix} K = R \cup H, R \cap H = \emptyset, X_{K} = X_{R} \times X_{H}, \end{matrix}

(1)

where

X_{R}

is the set of values that public (revealed) source symbols

X_{R}

take and

X_{H}

is the set of values that private (hidden) source symbols

X_{H}

take.

We assume that the source sequence

X_{K}^{n} = (X_{K, 1}, X_{K, 2}, \dots, X_{K, n})

is generated from a stationary and memoryless source

p_{X_{K}}

. That is,

\begin{matrix} P_{X_{K}^{n}} (x_{K}^{n}) = Pr {X_{K}^{n} = x_{K}^{n}} = \prod_{i = 1}^{n} P_{X_{K}} (x_{K, i}), \end{matrix}

(2)

where

x_{K}^{n} = (x_{K, 1}, \dots, x_{K, n}) \in X_{K}^{n}

. Taking the partition of attributes in (1) into account, the source sequence

X_{K}^{n}

is described as

\begin{matrix} X_{K}^{n} = (X_{R}^{n}, X_{H}^{n}), \end{matrix}

(3)

where

\begin{matrix} X_{R}^{n} = (X_{R, 1}, X_{R, 2}, \dots, X_{R, n}) \in X_{R}^{n}, \end{matrix}

(4)

\begin{matrix} X_{H}^{n} = (X_{H, 1}, X_{H, 2}, \dots, X_{H, n}) \in X_{H}^{n} \end{matrix}

(5)

are referred to as the revealed source sequence and the hidden source sequence, respectively. In the addressed coding system introduced in [22], the revealed symbols and a part of the hidden symbols are input to the encoder, and thus the encoded alphabet

E

satisfies

R \subseteq E \subseteq K

. Similar to (3),

X_{K}^{n}

is sometimes described as

\begin{matrix} X_{K}^{n} = (X_{E}^{n}, X_{E^{c}}^{n}), \end{matrix}

(6)

where

X_{E}^{n}

is the source sequence observed by the encoder and

E^{c} = K ∖ E

.

2.2. Encoder and Decoder

The coding system consists of encoder

f_{n}

and decoder

g_{n}

as in Figure 1. When the source sequence

X_{K}^{n} = (X_{E}^{n}, X_{E^{c}}^{n})

is generated from the stationary and memoryless source

p_{X_{K}}

, the codeword

J_{n} = f_{n} (X_{E}^{n})

is generated by the encoder

\begin{matrix} f_{n} : X_{E}^{n} \to {1, 2, \dots, M_{n}} \end{matrix}

(7)

and the reproduced sequence

{\hat{X}}_{R}^{n} = g_{n} (J_{n})

is produced by decoder

\begin{matrix} g_{n} : {1, 2, \dots, M_{n}} \to {\hat{X}}_{R}^{n}, \end{matrix}

(8)

where

M_{n}

denotes the number of codewords.

3. First-Order Rate Analysis with Expected Distortion

3.1. Performance Measures

In this section, we mention the measure of the coding rate, utility, privacy for the decoder, and privacy for the encoder. Hereafter, let a pair of the encoder and decoder

(f_{n}, g_{n})

be fixed.

For a given

M_{n}

, the coding rate is defined as

\begin{matrix} r_{n} ≔ \frac{1}{n} \log M_{n} . \end{matrix}

(9)

Let

d : X_{R} \times {\hat{X}}_{R} \to [0, \infty)

be a distortion function between

x_{R} \in X_{R}

and

{\hat{x}}_{R} \in {\hat{X}}_{R}

. The distortion between sequences

x_{R}^{n} \in X_{R}^{n}

and

{\hat{x}}_{R}^{n} \in {\hat{X}}_{R}^{n}

is defined as

\begin{matrix} d (x_{R}^{n}, {\hat{x}}_{R}^{n}) ≔ & \sum_{i = 1}^{n} d (x_{R, i}, {\hat{x}}_{R, i}) . \end{matrix}

(10)

Then, the measure of utility is defined as

\begin{matrix} u_{n} ≔ & E [\frac{1}{n} d (X_{R}^{n}, {\hat{X}}_{R}^{n})], \end{matrix}

(11)

where

E

represents the expectation by the joint distribution of

(X_{R}^{n}, {\hat{X}}_{R}^{n})

.

In this system, the privacy of the hidden source sequence

X_{H}^{n}

should be protected when the codeword

J_{n}

is observed by decoder

g_{n}

. The measure of privacy for the decoder is defined as

\begin{matrix} l_{n} ≕ \frac{1}{n} I (X_{H}^{n}; J_{n}), \end{matrix}

(12)

where

I (X_{H}^{n}; J_{n})

is the mutual information between

X_{H}^{n}

and

J_{n}

.

The privacy of the hidden source sequence

X_{H}^{n}

should be protected when the encoded information

X_{E}

is observed by encoder

f_{n}

. The measurement of privacy for the encoder is defined as

\begin{matrix} e_{n} ≔ \frac{1}{n} I (X_{H}^{n}; X_{E}^{n}), \end{matrix}

(13)

where

I (X_{H}^{n}; X_{E}^{n})

is the mutual information between

X_{H}^{n}

and

X_{E}^{n}

.

3.2. Achievable Region and Theorem

We define the achievable region for the first-order rate analysis with the expected distortion and state the obtained results.

Definition 1.

A tuple

(R, D, L, E)

is said to be

ϵ

-achievable (with respect to the expected distortion measure) if, for any given

ϵ > 0

, there exists a sequence of codes

(f_{n}, g_{n})

satisfying

\begin{matrix} r_{n} & \leq R + ϵ, \end{matrix}

(14)

\begin{matrix} u_{n} & \leq D + ϵ, \end{matrix}

(15)

\begin{matrix} l_{n} & \leq L + ϵ, \end{matrix}

(16)

\begin{matrix} e_{n} & \leq E + ϵ \end{matrix}

(17)

for all sufficiently large n.

The technical meanings of each constraint in Definition 1 can be interpreted as follows: Equation (14) evaluates how much the source sequence is compressed, so this rate should be decreased. Equation (15) is the constraint corresponding to distortion being less than

D + ϵ

. The smaller the distortion is, the better the utility is, so this condition should also be decreased. Equation (16) constrains the amount of leaked private information to the decoder. Since private information should be kept secret for the receiver, this quantity should be decreased as well. Equation (17) constrains the amount of private information leaked to the encoder. For the same reason as (16), this quantity should also be decreased.

Remark 1.

The minimum coding rate R for a fixed D corresponds to the rate-distortion function (Section 10 in [27]). Thus, in the proof of achievability, we evaluate the coding rate and the distortion with the argument in rate-distortion theory. This view is also important to correctly understand the numerical results in Section 6.1.

Definition 2.

The closure of the set of ϵ-achievable tuples

(R, D, L, E)

is referred to as the

ϵ

-achievable region and is denoted by

C_{E} (ϵ | P_{X_{K}})

and defines

\begin{matrix} C_{E} (P_{X_{K}}) ≔ ⋂_{0 < ϵ < 1} C_{E} (ϵ | P_{X_{K}}) . \end{matrix}

(18)

To characterize the achievable region, we define the following informational region.

Definition 3.

For any

E

such that

R \subseteq E \subseteq K

,

S_{E} (P_{X_{K}})

is defined as

\begin{matrix} S_{E} (P_{X_{K}}) = {(R, D, L, E) : & R \geq I (X_{E}; {\hat{X}}_{R}), \\ D \geq E [d (X_{R}, {\hat{X}}_{R})], \\ L \geq I (X_{H}; {\hat{X}}_{R}), \\ E \geq I (X_{H}; X_{E}) \\ for some P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}} . \end{matrix}

(19)

We establish the next theorem. For the proof of this theorem, please refer to Section 3.3, Section 3.4 and Section 3.5.

Theorem 1.

For any

E

such that

R \subseteq E \subseteq K

, the achievable region of the coding system is given by

\begin{matrix} C_{E} (P_{X_{K}}) = S_{E} (P_{X_{K}}) . \end{matrix}

(20)

To clarify the relationship with the conventional result of Shinohara and Yagi [3], we mention the achievable region among the coding rate, utility, and privacy, which is derived by projecting the result of Theorem 1 onto the

R - D - L

hyperplane.

Definition 4.

For any

E

such that

R \subseteq E \subseteq K

, we define

\begin{matrix} C_{E}^{R D L} (ϵ | P_{X_{K}}) ≔ {(R, D, L) : (R, D, L, E) \in C_{E} (ϵ | P_{X_{K}})} \end{matrix}

(21)

and

\begin{matrix} C_{E}^{R D L} (P_{X_{K}}) ≔ ⋂_{0 < ϵ < 1} C_{E}^{R D L} (ϵ | P_{X_{K}}) . \end{matrix}

(22)

Definition 5.

For any

E

such that

R \subseteq E \subseteq K

, we define

\begin{matrix} S_{E}^{R D L} (P_{X_{K}}) = {(R, D, L) : & R \geq I (X_{E}; {\hat{X}}_{R}), \\ D \geq E [d (X_{R}, {\hat{X}}_{R})], \\ L \geq I (X_{H}; {\hat{X}}_{R}) \\ for some P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}} . \end{matrix}

(23)

Corollary 1.

For any

E

such that

R \subseteq E \subseteq K

, the region

C_{E}^{R D L} (P_{X_{K}})

is given by

\begin{matrix} C_{E}^{R D L} (P_{X_{K}}) = S_{E}^{R D L} (P_{X_{K}}) . \end{matrix}

(24)

Remark 2.

Corollary 1 suggests that the conventional result [3] can be obtained from

C_{E} (P_{X_{K}})

.

Remark 3.

The derived characterization in (24) reduces to the characterization given in [1] when the encoded attribute

E

is either

K

or

R

. Thus, (24) gives its generalization for

R \subseteq E \subseteq K

.

Examples to illustrate this result are shown in Section 6.1.

3.3. Proof Preliminaries for First-Order Rate Analysis

For preliminaries for coding theorems by the first-order rate analysis, we define strongly typical sequences that are necessary for the proof and show some properties. These proof preliminaries are also used in Section 4.

Definition 6

(Definition 2.1, [28]). The type of a sequence

x^{n} \in X^{n}

of length n is the distribution

P_{x^{n}}

on

X

defined by

\begin{matrix} P_{x^{n}} (a) ≔ \frac{1}{n} N (a | x^{n}), \end{matrix}

(25)

where

N (a | x^{n})

represents the number of occurrences of symbol

a \in X

in

x^{n}

. Likewise, the joint type of

x^{n} \in X^{n}

and

y^{n} \in Y^{n}

is the distribution

P_{x^{n} y^{n}}

on

X \times Y

defined by

\begin{matrix} P_{x^{n} y^{n}} ≔ \frac{1}{n} N (a, b | x^{n}, y^{n}), \end{matrix}

(26)

where

N (a, b | x^{n}, y^{n})

represents the number of the occurrences of

(a, b) \in X \times Y

in the pair of sequences

(x^{n}, y^{n})

.

Definition 7

((Conditional Type), [28], Definition 2.2). We define the conditional type of

y^{n}

given

x^{n}

as a stochastic matrix

V : X \to Y

satisfying

\begin{matrix} N (a, b | x^{n}, y^{n}) = N (a | x^{n}) V (b | a) . \end{matrix}

(27)

In particular, the conditional type of

y^{n}

given

x^{n}

is uniquely determined and given by

\begin{matrix} V (b | a) = \frac{N (a, b | x^{n}, y^{n})}{N (a | x^{n})} \end{matrix}

(28)

if

N (a | x^{n}) > 0

for any

a \in X

.

Definition 8

((Strongly Typical Sequences), [29], Definition 1.2.8). For any distribution P on

X

, a sequence

x^{n} \in X^{n}

is said to be P-typical with constant

δ > 0

if

\begin{matrix} |\frac{1}{n} N (a | x^{n}) - P (a)| \leq δ for every a \in X \end{matrix}

(29)

and, in addition, no

a \in X

with

P (a) = 0

occurs in

x^{n}

. The set of such sequences is denoted by

T_{δ}^{n} (P)

. If X is a random variable with values in

X

, we also refer to P-typical sequences as X-typical sequences and write

T_{δ}^{n} (X)

.

Definition 9

((Conditional Strongly Typical Sequences), [29], Definition 1.2.9). For a stochastic matrix

W : X \to Y

, a sequence

y^{n} \in Y^{n}

is said to be W-typical given

x^{n} \in X^{n}

with constant

δ > 0

if

\begin{matrix} |\frac{1}{n} N (a, b | x^{n}, y^{n}) - \frac{1}{n} N (a | x^{n}) W (b | a)| \leq δ \\ for every a \in X, b \in Y, \end{matrix}

(30)

and, in addition,

N (a, b | x^{n}, y^{n}) = 0

whenever

W (b | a) = 0

. The set of such sequences

y^{n}

is denoted by

T_{δ}^{n} (W | x^{n})

. Further, if X and Y are random variables with values in

X

and

Y

, respectively, and

P_{Y | X} = W

, then they are also said to be

Y | X

-typical and written as

T_{δ}^{n} (Y | X | x^{n})

.

Hereafter, the set of conditional strongly typical sequences

T_{δ}^{n} (Y | X | x^{n})

is abbreviated as

T_{δ}^{n} (Y | x^{n})

.

We state some lemmas that are used in this proof.

Lemma 1

([29], Lemma 1.2.13). For any positive sequences

{δ_{n}}_{n = 1}^{\infty}

and

{δ_{n}^{'}}_{n = 1}^{\infty}

such that

δ_{n} \to 0

and

δ^{'} \to 0

as

n \to 0

, there exists a sequence

ϵ_{n} = ϵ_{n} (| X, Y |, δ_{n}, δ_{n}^{'}) \to 0

(n \to \infty)

such that for every distribution P on

X

and stochastic matrix

W : X \to Y

,

\begin{matrix} |\frac{1}{n} \log | T_{δ_{n}}^{n} (P) | - H (P)| & \leq ϵ_{n}, \end{matrix}

(31)

\begin{matrix} |\frac{1}{n} \log | T_{δ_{n}^{'}}^{n} (W | x^{n}) | - H (W | P)| & \leq ϵ_{n} . \end{matrix}

(32)

Lemma 2

([29], Lemma 1.2.7). Let the variational distance between two distributions P and Q on

X

be defined as

\begin{matrix} d_{v} (P, Q) ≔ \sum_{x \in X} | P (x) - Q (x) | . \end{matrix}

(33)

If

d_{v} (P, Q) < \frac{1}{2}

, then

\begin{matrix} | H (P) - H (Q) | \leq - d_{v} (P, Q) \cdot \log \frac{d_{v} (P, Q)}{| X |} . \end{matrix}

(34)

Lemma 3

([29], Lemma 1.2.10). If

x^{n} \in T_{δ}^{n} (X)

and

y^{n} \in T_{δ^{'}}^{n} (Y | x^{n})

, then

(x^{n}, y^{n}) \in T_{δ + δ^{'}}^{n} (X, Y)

and, consequently,

y^{n} \in T_{δ^{″}} (Y)

for

δ^{″} ≔ (δ + δ^{'}) \cdot | X |

.

Lemma 4.

If

(x^{n}, y^{n}) \in T_{δ}^{n} (X, Y)

, then

x^{n} \in T_{δ_{1}}^{n} (X)

and, consequently,

y^{n} \in T_{δ_{2}}^{n} (Y | x^{n})

for

δ_{1} ≔ | Y | \cdot δ

and

δ_{2} ≔ (| Y | + 1) \cdot δ

.

Lemma 5.

If

y^{n} \in T_{δ}^{n} (Y)

and

(x^{n}, y^{n}) \notin T_{2 δ}^{n} (X, Y)

, then

x^{n} \notin T_{δ}^{n} (X | y^{n})

.

Lemma 6

([29], Lemma 1.2.12 and Remark). For arbitrarily fixed

δ > 0

and every distribution P on

X

and stochastic matrix

W : X \to Y

\begin{matrix} Pr {X^{n} \in T_{δ}^{n} (P)} & \geq 1 - 2 | X | e^{- 2 δ^{2} n}, \end{matrix}

(35)

\begin{matrix} Pr {Y^{n} \in T_{δ}^{n} (W | x^{n}) | X^{n} = x^{n}} & \geq 1 - 2 | X | \cdot | Y | e^{- 2 δ^{2} n} \\ for every x^{n} \in X^{n} . \end{matrix}

(36)

3.4. Proof of Converse Part

In this part, we shall prove

C_{E} (P_{X_{K}}) \subseteq S_{E} (P_{X_{K}})

.

Let a tuple

(R, D, L, E) \in C_{E} (P_{X_{K}})

be arbitrarily fixed. Then, there exists an

(n, 2^{n (R + ϵ)}, D + ϵ, L + ϵ, E + ϵ)

code that satisfies (14)–(17). Let Q be a uniform random variable over

{1, 2, \dots, n}

and let

p_{i} (x_{E, i}, x_{E^{c}, i}, {\hat{x}}_{R, i})

be the conditional distribution given

Q = i

. Evaluating the inequalities for R, we obtain

\begin{matrix} R + ϵ & \overset{(a)}{\geq} \frac{1}{n} \log M_{n} \\ \overset{(b)}{\geq} \frac{1}{n} H (J_{n}) \\ \geq \frac{1}{n} I (J_{n}; X_{E}^{n}) \\ \overset{(c)}{=} \frac{1}{n} {H (X_{E}^{n}) - H (X_{E}^{n} | J_{n}, {\hat{X}}_{R}^{n})} \\ \overset{(d)}{=} \frac{1}{n} \sum_{i = 1}^{n} H (X_{E, i}) - \frac{1}{n} \sum_{i = 1}^{n} H (X_{E, i} | X_{E}^{i - 1}, J_{n}, {\hat{X}}_{R}^{n}) \\ \overset{(e)}{\geq} \frac{1}{n} \sum_{i = 1}^{n} H (X_{E, i}) - \frac{1}{n} \sum_{i = 1}^{n} H (X_{E, i} | {\hat{X}}_{R, i}) \\ \overset{(f)}{=} \sum_{i = 1}^{n} Pr {Q = i} H (X_{E, i} | Q = i) \\ - \sum_{i = 1}^{n} Pr {Q = i} H (X_{E, i} | {\hat{X}}_{R, i}, Q = i) \\ = H (X_{E, Q} | Q) - H (X_{E, Q} | {\hat{X}}_{R, Q}, Q) \\ \overset{(g)}{=} H (X_{E}) - H (X_{E, Q} | {\hat{X}}_{R, Q}, Q) \\ \overset{(h)}{\geq} H (X_{E}) - H (X_{E} | {\hat{X}}_{R}) \\ = I (X_{E}; {\hat{X}}_{R}), \end{matrix}

(37)

(a): follows from (14),
(b): follows because $H (J_{n}) \leq \log | J_{n} | = \log M_{n}$ ,
(c): is due to the fact that ${\hat{X}}_{R}^{n} = g (J_{n})$ ,
(d): follows because each $X_{K, i}$ is independent and ${\hat{X}}_{R}^{n}$ is a function of $J_{n}$ ,
(e): follows because conditioning reduces entropy,
(f): is due to the definition of Q,
(g): follows because $X_{E} ⊥ Q$ , and
(h): follows because conditioning reduces entropy, where $(X_{E}, {\hat{X}}_{R}) \sim \sum_{i = 1}^{n} Pr {Q = i} p_{i} (x_{E, i}, {\hat{x}}_{R, i}) = p (x_{E}, {\hat{x}}_{R})$ .

Similarly, evaluating D, L, and E, respectively, we obtain

\begin{matrix} D + ϵ & \overset{(i)}{\geq} E [\frac{1}{n} \sum_{i = 1}^{n} d (X_{R, i}, {\hat{X}}_{R, i})] \\ = \frac{1}{n} \sum_{i = 1}^{n} E [d (X_{R, i}, {\hat{X}}_{R, i})] \\ \overset{(j)}{=} E_{Q} [E [d (X_{R, i}, {\hat{X}}_{R, i}) | Q]] \\ \overset{(k)}{=} E [d (X_{R}, {\hat{X}}_{R})], \end{matrix}

(38)

\begin{matrix} L + ϵ & \overset{(l)}{\geq} \frac{1}{n} I (X_{H}^{n}; J_{n}) \\ = \frac{1}{n} H (X_{H}^{n}) - \frac{1}{n} H (X_{H}^{n} | J_{n}) \\ \overset{(m)}{=} H (X_{H}) - \frac{1}{n} \sum_{i = 1}^{n} H (X_{H, i} | X_{H}^{i - 1}, J_{n}) \\ \overset{(n)}{=} H (X_{H}) - \frac{1}{n} \sum_{i = 1}^{n} H (X_{H, i} | X_{H}^{i - 1}, J_{n}, {\hat{X}}_{R, i}) \\ \overset{(o)}{\geq} H (X_{H}) - \frac{1}{n} \sum_{i = 1}^{n} H (X_{H, i} | {\hat{X}}_{R, i}) \\ \overset{(p)}{=} H (X_{H}) - \sum_{i = 1}^{n} Pr {Q = i} H (X_{H, i} | {\hat{X}}_{R, i}, Q = i) \\ = H (X_{H}) - H (X_{H, Q} | {\hat{X}}_{R, Q}, Q) \\ \overset{(q)}{\geq} H (X_{H}) - H (X_{H} | {\hat{X}}_{R}) \\ = I (X_{H}; {\hat{X}}_{R}), \end{matrix}

(39)

\begin{matrix} E + ϵ & \geq \frac{1}{n} I (X_{H}^{n}; X_{E}^{n}) \\ \overset{(r)}{=} \frac{1}{n} \sum_{i = 1}^{n} I (X_{H, i}; X_{E}^{n} | X_{H}^{i - 1}) \\ \overset{(s)}{=} \frac{1}{n} \sum_{i = 1}^{n} I (X_{H, i}; X_{E, i}) \\ \overset{(t)}{=} I (X_{H}; X_{E}), \end{matrix}

(40)

where

(i): is due to (15),
(j): is derived from the definition of Q,
(k): follows because $(X_{R}, {\hat{X}}_{R}) \sim \sum_{i = 1}^{n} Pr {Q = i} p_{i} (x_{R, i}, {\hat{x}}_{R, i}) = p (x_{R}, {\hat{x}}_{R})$ ,
(l): is due to (16),
(m): follows because $i . i . d . P_{X_{K}^{n}}$ ,
(n): follows because ${\hat{X}}_{R}^{n} = g (J_{n})$ ,
(o): follows from the fact that conditioning reduces entropy,
(p): is derived from the definition of Q, and
(q): follows because conditioning reduces entropy, where $(X_{H}, {\hat{X}}_{R}) \sim \sum_{i = 1}^{n} Pr {Q = i} p_{i} (x_{H, i}, {\hat{x}}_{R, i}) = p (x_{H}, {\hat{x}}_{R})$ ,
(r): is due to chain rule for mutual information,
(s), (t): follow because $i . i . d . P_{X_{K}^{n}}$ .

It is readily shown that the Markov chain

X_{E^{c}}

–

X_{E}

–

{\hat{X}}_{R}

holds (cf. Appendix A). We complete the proof of the converse part.

3.5. Proof of Direct Part

In this part, we provide a sketch of the proof of

S_{E} (P_{X_{K}}) \subseteq C_{E} (P_{X_{K}})

.

Under an arbitrarily fixed distribution

P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}

, any tuple

(R, D, L, E) \in S_{E} (P_{X_{K}})

is chosen such that

\begin{matrix} R & > I (X_{E}; {\hat{X}}_{R}), \end{matrix}

(41)

\begin{matrix} D & > E [d (X_{R}, {\hat{X}}_{R})], \end{matrix}

(42)

\begin{matrix} L & > I (X_{H}; {\hat{X}}_{R}), \end{matrix}

(43)

\begin{matrix} E & > I (X_{H}; X_{E}) . \end{matrix}

(44)

From (42) and (43), we can choose a sufficiently small

ϵ > 0

such that

\begin{matrix} D & > E [d (X_{R}, {\hat{X}}_{R})] + ϵ, \end{matrix}

(45)

\begin{matrix} L & > I (X_{H}; {\hat{X}}_{R}) + ϵ . \end{matrix}

(46)

In addition, with this

ϵ

, some constant

0 < τ < \frac{1}{2}

is fixed such that

\begin{matrix} τ (\log | X_{H} | + 5) + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} < ϵ . \end{matrix}

(47)

We can also choose positive numbers

δ (≔ δ (n))

such that

\begin{matrix} (δ (n) + δ_{1} (n)) | X_{R} | \cdot | {\hat{X}}_{R} | D_{max} + τ < ϵ, \end{matrix}

(48)

\begin{matrix} 2 δ^{2} (n) \leq R - I (X_{E}; {\hat{X}}_{R}) - \frac{1}{n} - τ, \end{matrix}

(49)

\begin{matrix} δ (n) \to 0, \end{matrix}

(50)

\begin{matrix} \sqrt{n} \cdot δ (n) \to \infty \end{matrix}

(51)

as

n \to \infty

, where

δ_{1} ≔ (| X_{E} | - | X_{R} |) \cdot δ

and

D_{max} ≔ max_{a \in X_{R}, b \in {\hat{X}}_{R}} d (a, b)

. Let

δ (n) = \frac{c}{\sqrt{n}} \log n

where c is a constant, and obviously (50) and (51) are satisfied.

Generation of codebook: Randomly generate

{\hat{x}}_{R}^{n} (j)

from the strongly typical sequences

T_{δ}^{n} ({\hat{X}}_{R})

for

j = 1, 2, \dots, M_{n} ≔ 2^{n R}

. Reveal the codebook

C = {{\hat{x}}_{R}^{n} (1), \dots, {\hat{x}}_{R}^{n} (M_{n})}

to the encoder and decoder.

Encoding: If a sequence

x_{E}^{n} \in X_{E}^{n}

satisfies

x_{K}^{n} = (x_{E}^{n}, x_{E^{c}}^{n})

with some

x_{E^{c}}^{n} \in X_{E^{c}}^{n}

, we write

x_{E}^{n} ≺ x_{K}^{n}

. Given

x_{K}^{n}

, the encoder finds j such that

x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R} (j))

and sets

f_{n} (x_{E}^{n}) = j

where

T_{δ}^{n} (X_{E} | {\hat{x}}_{R} (j))

is the conditional strongly typical sequences. If there exist multiple such j,

f_{n} (x_{E}^{n})

is set as the minimum one. If there are no such j, then

f_{n} (x_{E}^{n}) = M_{n}

.

Decoding: When j is observed, the decoder sets the reproduced sequence as

{\hat{X}}_{R}^{n} = {\hat{x}}_{R}^{n} (j)

.

Evaluation: We define

A (j)

,

B (j)

, and

\tilde{A} (j)

as

\begin{matrix} A (j) ≔ & {x_{E}^{n} : f_{n} (x_{E}^{n}) = j}, \end{matrix}

(52)

\begin{matrix} B (j) ≔ & {x_{K}^{n} : x_{E}^{n} ≺ x_{K}^{n}, f_{n} (x_{E}^{n}) = j}, \end{matrix}

(53)

\begin{matrix} \tilde{A} (j) ≔ & \{\begin{matrix} {x_{K}^{n} : x_{E}^{n} ≺ x_{K}^{n}, f_{n} (x_{E}^{n}) = j, x_{K}^{n} \in T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j))} \\ (j = 1, 2, \dots, M_{n} - 1) \\ {x_{K}^{n} : x_{K}^{n} \in X_{K}^{n} ∖ ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)} (j = M_{n}) . \end{matrix} \end{matrix}

(54)

It is easily verified that

A (j)

for

j = 1, 2, \dots, M_{n}

(also,

B (j)

and

\tilde{A} (j)

) is disjoint. From the definitions of

J_{n}

,

A (j)

, and

B (j)

,

\begin{matrix} Pr {J_{n} = j} = Pr {X_{E}^{n} \in A (j)} = Pr {X_{K}^{n} \in B (j)} \\ for j = 1, 2, \dots, M_{n} . \end{matrix}

(55)

For sufficiently large n, we can prove (cf. Appendix B)

\begin{matrix} | Pr {X_{K}^{n} \in B (j)} - Pr {X_{K}^{n} \in \tilde{A} (j)} | \leq 2 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} \\ for j = 1, 2, \dots, M_{n} - 1 . \end{matrix}

(56)

For sufficiently large n, we can show that there exists a code

(f_{n}, g_{n})

such that (cf. Appendix C)

\begin{matrix} r_{n} \leq R, \end{matrix}

(57)

\begin{matrix} u_{n} \leq E [d (X_{R}, {\hat{X}}_{R})] + (δ + δ_{1}) | X_{R} | \cdot | {\hat{X}}_{R} | D_{max} + τ, \end{matrix}

(58)

\begin{matrix} e_{n} \leq I (X_{H}; X_{E}), \end{matrix}

(59)

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n}, \end{matrix}

(60)

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \leq τ, \end{matrix}

(61)

\begin{matrix} | \tilde{A} (j) | \geq 2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}} . \end{matrix}

(62)

For this code

(f_{n}, g_{n})

, we evaluate the privacy leakage against the decoder as

\begin{matrix} l_{n} & ≔ \frac{1}{n} I (X_{H}^{n}; J_{n}) \\ = \frac{1}{n} H (X_{H}^{n}) - \frac{1}{n} H (X_{H}^{n} | J_{n}) \\ \overset{(a)}{=} H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n}} H (X_{H}^{n} | X_{K}^{n} \in B (j)) Pr {X_{K}^{n} \in B (j)} \\ \overset{(b)}{\leq} H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n}} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)) Pr {X_{K}^{n} \in \tilde{A} (j)} \\ + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \end{matrix}

(63)

\begin{matrix} \overset{(c)}{\leq} H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n} - 1} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)) Pr {X_{K}^{n} \in \tilde{A} (j)} \\ + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ = H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n} - 1} [- \sum_{\begin{matrix} x_{H}^{n} \end{matrix}} Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)} \cdot \\ \log Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)}] \cdot Pr {X_{K}^{n} \in \tilde{A} (j)} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ}, \end{matrix}

(64)

where

(a): follows because of $i . i . d . P_{X_{K}^{n}}$ ,
(b): is due to the inequality proved in Appendix D,
(c): follows by removing the term for $j = M_{n}$ .

Here, for any

x_{H}^{n}

satisfying

x_{K}^{n} = (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)

with some

x_{R}^{n}

, we can show that

\begin{matrix} Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)} \\ = \frac{Pr {X_{K}^{n} \in \tilde{A} (j) | X_{H}^{n} = x_{H}^{n}} Pr {X_{H}^{n} = x_{H}^{n}}}{Pr {X_{K}^{n} \in \tilde{A} (j)}} \\ = \frac{\sum_{x_{R}^{n} : (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = x_{R}^{n}, X_{H}^{n} = x_{H}^{n} | X_{H}^{n} = x_{H}^{n}}}{\sum_{({\tilde{x}}_{R}^{n}, {\tilde{x}}_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = {\tilde{x}}_{R}^{n}, X_{H}^{n} = {\tilde{x}}_{H}^{n}}} \cdot Pr {X_{H}^{n} = x_{H}^{n}} \\ \overset{(d)}{=} \frac{\sum_{x_{R}^{n} : (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}}}{\sum_{({\tilde{x}}_{R}^{n}, {\tilde{x}}_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = {\tilde{x}}_{R}^{n}, X_{H}^{n} = {\tilde{x}}_{H}^{n}}} \cdot Pr {X_{H}^{n} = x_{H}^{n}} \\ \overset{(e)}{\leq} \frac{\sum_{x_{R}^{n} \in T_{δ_{3}}^{n} (X_{R} | x_{H}^{n}, {\hat{x}}_{R}^{n} (j))} Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}}}{\sum_{({\tilde{x}}_{R}^{n}, {\tilde{x}}_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = {\tilde{x}}_{R}^{n}, X_{H}^{n} = {\tilde{x}}_{H}^{n}}} \cdot Pr {X_{H}^{n} = x_{H}^{n}} \end{matrix}

(65)

\begin{matrix} \overset{(f)}{\leq} \frac{2^{n {H (X_{R} | X_{H}, {\hat{X}}_{R}) + τ}} \cdot 2^{- n {H (X_{R} | X_{H}) - τ}}}{2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}} \cdot 2^{- n {H (X_{K}) + τ}}} \cdot 2^{- n {H (X_{H}) - τ}} \\ = 2^{- n {H (X_{H} | {\hat{X}}_{R}) - 5 τ}}, \end{matrix}

(66)

where

(d): follows from the fact that

$\begin{matrix} Pr {X_{R}^{n} = x_{R}^{n}, X_{H}^{n} = x_{H}^{n} | X_{H}^{n} = x_{H}^{n}} = Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}}, \end{matrix}$
(e): is due to the inequality proved in Appendix E, and
(f): follows because of the number of strongly typical sequences.

Therefore, from Equations (61), (64) and (66) we can obtain

\begin{matrix} l_{n} & \leq H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n} - 1} [n \sum_{\begin{matrix} x_{H}^{n} \end{matrix}} Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)} \cdot \\ {H (X_{H} | {\hat{X}}_{R}) - 5 τ}] \cdot Pr {X_{K}^{n} \in \tilde{A} (j)} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ = H (X_{H}) - Pr \{X_{K}^{n} \in (⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j))\} \cdot {H (X_{H} | {\hat{X}}_{R}) - 5 τ} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ \leq H (X_{H}) - (1 - τ) {H (X_{H} | {\hat{X}}_{R}) - 5 τ} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ \leq I (X_{H}; {\hat{X}}_{R}) + τ (\log | X_{H} | + 5) + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} . \end{matrix}

(67)

Since constants

ϵ

,

τ

, and

δ

are fixed to satisfy (45)–(48), from (44), (57)–(59) and (67), we obtain

\begin{matrix} r_{n} & \leq R, \end{matrix}

(68)

\begin{matrix} u_{n} & \leq E [d (X_{R}, {\hat{X}}_{R})] + ϵ < D, \end{matrix}

(69)

\begin{matrix} l_{n} & < I (X_{H}; {\hat{X}}_{R}) + ϵ < L, \end{matrix}

(70)

\begin{matrix} e_{n} & \leq I (X_{H}; X_{E}) < E . \end{matrix}

(71)

Therefore, for the fixed distribution

P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}

any tuple

\begin{matrix} (R, D, L, E) \in {(R, D, L, E) : R & > I (X_{E}; {\hat{X}}_{R}), \\ D & > E [d (X_{R}, {\hat{X}}_{R})], \\ L & > I (X_{H}; {\hat{X}}_{R}), \\ E & > I (X_{H}; X_{E})} ≕ S_{E}^{*} (P_{X_{K}}) \end{matrix}

(72)

is achievable. Consequently,

S_{E}^{*} (P_{X_{K}}) \subseteq C_{E} (P_{X_{K}})

. Taking the closure for the left-hand side (l.h.s.), we obtain

C l (S_{E}^{*} (P_{X_{K}})) \subseteq C_{E} (P_{X_{K}})

because

C_{E} (P_{X_{K}})

is a closed set. We conclude that

S_{E} (P_{X_{K}}) = ⋃_{p} C l (S_{E}^{*} (P_{X_{K}})) \subseteq C_{E} (P_{X_{K}})

because the distribution

P_{X_{K}} = P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}

is fixed arbitrarily. We complete the proof of the direct part.

4. First-Order Rate Analysis with Excess-Distortion Probability

4.1. Performance Measures

Hereafter, let the pair of the encoder and decoder

(f_{n}, g_{n})

be fixed.

For a given

M_{n}

, the coding rate is defined as

\begin{matrix} r_{n} ≔ \frac{1}{n} \log M_{n} . \end{matrix}

(73)

Let

d : X_{R} \times {\hat{X}}_{R} \to [0, \infty)

be a distortion function between

x_{R} \in X_{R}

and

{\hat{x}}_{R} \in {\hat{X}}_{R}

. The distortion between sequences

x_{R}^{n} \in X_{R}^{n}

and

{\hat{x}}_{R}^{n} \in {\hat{X}}_{R}^{n}

is defined as

\begin{matrix} d (x_{R}^{n}, {\hat{x}}_{R}^{n}) ≔ & \sum_{i = 1}^{n} d (x_{R, i}, {\hat{x}}_{R, i}) . \end{matrix}

(74)

Then, the measure of utility is defined as

\begin{matrix} u_{n} ≔ & Pr \{\frac{1}{n} d (X_{R}^{n}, {\hat{X}}_{R}^{n}) > D\} . \end{matrix}

(75)

This measurement is called excess-distortion probability for

D \geq 0

.

In this system, the privacy of the hidden source sequence

X_{H}^{n}

should be protected when the codeword

J_{n}

is observed by decoder

g_{n}

. The measure of privacy for the decoder is defined as

\begin{matrix} l_{n} ≔ \frac{1}{n} I (X_{H}^{n}; J_{n}), \end{matrix}

(76)

where

I (X_{H}^{n}; J_{n})

is the mutual information between

X_{H}^{n}

and

J_{n}

.

The privacy of the hidden source sequence

X_{H}^{n}

should be protected when the encoded information

X_{E}

is observed by encoder

f_{n}

. The measurement of privacy for the encoder is defined as

\begin{matrix} e_{n} ≔ \frac{1}{n} I (X_{H}^{n}; X_{E}^{n}), \end{matrix}

(77)

where

I (X_{H}^{n}; X_{E}^{n})

is the mutual information between

X_{H}^{n}

and

X_{E}^{n}

.

4.2. Achievable Region and Theorem

We define the achievable region for the first-order rate analysis with the excess-distortion probability and state the obtained results.

Definition 10.

A tuple

(R, D, L, E)

is said to be

ϵ

-achievable (with respect to the excess-distortion probability) if, for any given

ϵ > 0

, there exists a sequence of codes

(f_{n}, g_{n})

satisfying

\begin{matrix} r_{n} & \leq R + ϵ, \end{matrix}

(78)

\begin{matrix} u_{n} & \leq ϵ, \end{matrix}

(79)

\begin{matrix} l_{n} & \leq L + ϵ, \end{matrix}

(80)

\begin{matrix} e_{n} & \leq E + ϵ \end{matrix}

(81)

for all sufficiently large n.

The technical meanings of each constraint in Definition 10 can be interpreted as follows: Equation (78) evaluates how much the source sequence is compressed, so this rate should be decreased. Equation (79) is the constraint corresponding to the excess-distortion probability being less than

ϵ

, so this condition should also be decreased. Equation (80) constrains the amount of leaked private information to the decoder. Since private information should be kept secret for the receiver, this quantity should be decreased as well. Equation (81) constrains the amount of leaked private information to the encoder. For the same reason as (80), this quantity should also be decreased.

Definition 11.

The closure of the set of ϵ-achievable tuples

(R, D, L, E)

is referred to as the

ϵ

-achievable region and is denoted by

L_{E} (ϵ | P_{X_{K}})

and define

\begin{matrix} L_{E} (P_{X_{K}}) ≔ ⋂_{0 < ϵ < 1} L_{E} (ϵ | P_{X_{K}}) . \end{matrix}

(82)

We establish the following theorem. For the proof of this theorem, please refer to Section 4.3 and Section 4.4.

Theorem 2.

For any

E

such that

R \subseteq E \subseteq K

, the achievable region of the coding system is given by

\begin{matrix} L_{E} (P_{X_{K}}) = S_{E} (P_{X_{K}}) . \end{matrix}

(83)

Remark 4.

From Theorems 1 and 2, we find that the achievable region in which utility is measured by the expected distortion is equal to the one in which utility is measured by the excess-distortion probability.

Because in Section 6 we discuss the achievable region among coding rate, utility, and privacy, a characterization of the achievable region is derived by projecting the characterization in Theorem 2 onto the

R - D - L

hyperplane.

Definition 12.

For any

E

such that

R \subseteq E \subseteq K

, we define

\begin{matrix} L_{E}^{R D L} (ϵ | P_{X_{K}}) ≔ {(R, D, L) : (R, D, L, E) \in L_{E} (ϵ | P_{X_{K}})} \end{matrix}

(84)

and

\begin{matrix} L_{E}^{R D L} (P_{X_{K}}) ≔ ⋂_{0 < ϵ < 1} L_{E}^{R D L} (ϵ | P_{X_{K}}) . \end{matrix}

(85)

Definition 13.

For any

E

such that

R \subseteq E \subseteq K

, we define

\begin{matrix} S_{E}^{R D L} (P_{X_{K}}) = {(R, D, L) : & R \geq I (X_{E}; {\hat{X}}_{R}), \\ D \geq E [d (X_{R}, {\hat{X}}_{R})], \\ L \geq I (X_{H}; {\hat{X}}_{R}) \\ for some P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}} . \end{matrix}

(86)

Corollary 2.

For any

E

such that

R \subseteq E \subseteq K

, the region

L_{E}^{R D L} (P_{X_{K}})

is given by

\begin{matrix} L_{E}^{R D L} (P_{X_{K}}) = S_{E}^{R D L} (P_{X_{K}}) . \end{matrix}

(87)

Examples of numerical calculation of this result are shown in Section 6.1.

Since we focus on the achievable region between utility and privacy in the next section, a characterization of the achievable region is derived by further projecting the result of Theorem 2 onto the

D - L

plane.

Definition 14.

For any

E

such that

R \subseteq E \subseteq K

, we define

\begin{matrix} L_{E}^{D L} (ϵ | P_{X_{K}}) ≔ {(D, L) : (R, D, L, E) \in L_{E} (ϵ | P_{X_{K}})} \end{matrix}

(88)

and

\begin{matrix} L_{E}^{D L} (P_{X_{K}}) ≔ ⋂_{0 < ϵ < 1} L_{E}^{D L} (ϵ | P_{X_{K}}) . \end{matrix}

(89)

Definition 15.

For any

E

such that

R \subseteq E \subseteq K

, we define

\begin{matrix} S_{E}^{D L} (P_{X_{K}}) = {(D, L) : & D \geq E [d (X_{R}, {\hat{X}}_{R})], \\ L \geq I (X_{H}; {\hat{X}}_{R}) \\ for some P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}} . \end{matrix}

(90)

Corollary 3.

For any

E

such that

R \subseteq E \subseteq K

, the region

L_{E}^{D L} (P_{X_{K}})

is given by

\begin{matrix} L_{E}^{D L} (P_{X_{K}}) = S_{E}^{D L} (P_{X_{K}}) . \end{matrix}

(91)

4.3. Proof of Converse Part

From Section 3.4 (proof of the converse part), we have

\begin{matrix} C_{E} (P_{X_{K}}) \subseteq S_{E} (P_{X_{K}}) . \end{matrix}

(92)

Let a tuple

(R, D, L, E) \in L_{E} (P_{X_{K}})

be arbitrarily fixed and

ϵ > 0

and

ϵ^{'} > 0

be given. From the argument of the method of types, the sequences

x_{R}^{n}

are divided into two categories: distortion-typical or non-distortion-typical with some

{\hat{x}}_{R}^{n}

. The sequences of the former categories satisfy

\frac{1}{n} d (x_{R}^{n}, {\hat{x}}_{R}^{n}) < D + ϵ

and the sequences of the latter one satisfy

\frac{1}{n} d (x_{R}^{n}, {\hat{x}}_{R}^{n}) < d_{max}

where

d_{max} ≔ max_{x_{R} \in X_{R}, {\hat{x}}_{R} \in {\hat{X}}_{R}} d (x_{R}, {\hat{x}}_{R})

. Then, the expected distortion is bounded from above as

\begin{matrix} E [\frac{1}{n} d (X_{R}^{n}, {\hat{X}}_{R}^{n})] & \leq D + ϵ + Pr \{\frac{1}{n} d (x_{R}^{n}, {\hat{x}}_{R}^{n}) > D + ϵ\} \cdot d_{max} \\ \leq D + ϵ + Pr \{\frac{1}{n} d (x_{R}^{n}, {\hat{x}}_{R}^{n}) > D\} \cdot d_{max} \\ \overset{(a)}{\leq} D + ϵ + ϵ^{'} d_{max}, \end{matrix}

(93)

where (a) follows from (79) of

ϵ

-achievable in which utility is measured by the excess-distortion probability. Since

ϵ + ϵ^{'} d_{max}

can be arbitrarily small with proper choices of

ϵ

and

ϵ^{'}

, (15) can be derived. This means

\begin{matrix} L_{E} (P_{X_{K}}) \subseteq C_{E} (P_{X_{K}}) . \end{matrix}

(94)

From both inclusion relations,

\begin{matrix} L_{E} (P_{X_{K}}) \subseteq C_{E} (P_{X_{K}}) \subseteq S_{E} (P_{X_{K}}) \end{matrix}

(95)

is evidently satisfied.

4.4. Proof of the Direct Part

In this part, we provide a sketch of the proof of

S_{E} (P_{X_{K}}) \subseteq L_{E} (ϵ | P_{X_{K}})

.

Under an arbitrarily fixed distribution

P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}

, any tuple

(R, D, L, E) \in S_{E} (P_{X_{K}})

is chosen such that

\begin{matrix} R & > I (X_{E}; {\hat{X}}_{R}), \end{matrix}

(96)

\begin{matrix} D & > E [d (X_{R}, {\hat{X}}_{R})], \end{matrix}

(97)

\begin{matrix} L & > I (X_{H}; {\hat{X}}_{R}), \end{matrix}

(98)

\begin{matrix} E & > I (X_{H}; X_{E}) . \end{matrix}

(99)

From (97) and (98) , we can choose a sufficiently small

ϵ > 0

such that

\begin{matrix} D & > E [d (X_{R}, {\hat{X}}_{R})] + ϵ, \end{matrix}

(100)

\begin{matrix} L & > I (X_{H}; {\hat{X}}_{R}) + ϵ . \end{matrix}

(101)

In addition, with this

ϵ

, some constant

0 < τ < \frac{1}{2}

is fixed such that

\begin{matrix} τ (\log | X_{H} | + 5) + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} < ϵ . \end{matrix}

(102)

We can also choose positive numbers

δ (≔ δ (n))

such that

\begin{matrix} 2 δ^{2} (n) \leq R - I (X_{E}; {\hat{X}}_{R}) - \frac{1}{n} - τ, \end{matrix}

(103)

\begin{matrix} δ (n) \to 0, \end{matrix}

(104)

\begin{matrix} \sqrt{n} \cdot δ (n) \to \infty \end{matrix}

(105)

as

n \to \infty

. Let

δ (n) = \frac{c}{\sqrt{n}} \log n

where c is a constant, and obviously (104) and (105) are satisfied.

Generation of codebook: Randomly generate

{\hat{x}}_{R}^{n} (j)

from the strongly typical sequences

T_{δ}^{n} ({\hat{X}}_{R})

for

j = 1, 2, \dots, M_{n} ≔ 2^{n R}

. Reveal the codebook

C = {{\hat{x}}_{R}^{n} (1), \dots, {\hat{x}}_{R}^{n} (M_{n})}

to the encoder and decoder.

Encoding: If a sequence

x_{E}^{n} \in X_{E}^{n}

satisfies

x_{K}^{n} = (x_{E}^{n}, x_{E^{c}}^{n})

with some

x_{E^{c}}^{n} \in X_{E^{c}}^{n}

, we write

x_{E}^{n} ≺ x_{K}^{n}

. Given

x_{K}^{n}

, the encoder finds j such that

x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))

and sets

f_{n} (x_{E}^{n}) = j

where

T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))

is the conditional strongly typical sequences. If there exist multiple such j,

f_{n} (x_{E}^{n})

is set as the minimum one. If there are no such j, then

f_{n} (x_{E}^{n}) = M_{n}

.

Decoding: When j is observed, the decoder sets the reproduced sequence as

{\hat{X}}_{R}^{n} = {\hat{x}}_{R}^{n} (j)

.

Evaluation: We define

A (j)

,

B (j)

, and

\tilde{A} (j)

as

\begin{matrix} A (j) ≔ & {x_{E}^{n} : f_{n} (x_{E}^{n}) = j}, \end{matrix}

(106)

\begin{matrix} B (j) ≔ & {x_{K}^{n} : x_{E}^{n} ≺ x_{K}^{n}, f_{n} (x_{E}^{n}) = j}, \end{matrix}

(107)

\begin{matrix} \tilde{A} (j) ≔ & \{\begin{matrix} {x_{K}^{n} : x_{E}^{n} ≺ x_{K}^{n}, f_{n} (x_{E}^{n}) = j, x_{K}^{n} \in T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j))} \\ (j = 1, 2, \dots, M_{n} - 1) \\ {x_{K}^{n} : x_{K}^{n} \in X_{K}^{n} ∖ ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)} (j = M_{n}) . \end{matrix} \end{matrix}

(108)

It is easily verified that

A (j)

for

j = 1, 2, \dots, M_{n}

(and also

B (j)

and

\tilde{A} (j)

) is disjoint. From the definitions of

J_{n}

,

A (j)

, and

B (j)

,

\begin{matrix} Pr {J_{n} = j} = Pr {X_{E}^{n} \in A (j)} = Pr {X_{K}^{n} \in B (j)} \\ for j = 1, 2, \dots, M_{n} . \end{matrix}

(109)

For sufficiently large n, we can prove (cf. Appendix B)

\begin{matrix} | Pr {X_{K}^{n} \in B (j)} - Pr {X_{K}^{n} \in \tilde{A} (j)} | \leq 2 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} \\ for j = 1, 2, \dots, M_{n} - 1 . \end{matrix}

(110)

For sufficiently large n, we can show that there exists a code

(f_{n}, g_{n})

such that (cf. Appendix F)

\begin{matrix} r_{n} \leq R, \end{matrix}

(111)

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n}, \end{matrix}

(112)

\begin{matrix} u_{n} \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n}, \end{matrix}

(113)

\begin{matrix} e_{n} \leq I (X_{H}; X_{E}), \end{matrix}

(114)

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \leq τ, \end{matrix}

(115)

\begin{matrix} | \tilde{A} (j) | \geq 2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}} . \end{matrix}

(116)

For this code

(f_{n}, g_{n})

, we evaluate the privacy leakage against the decoder as

\begin{matrix} l_{n} & ≔ \frac{1}{n} I (X_{H}^{n}; J_{n}) \\ = \frac{1}{n} H (X_{H}^{n}) - \frac{1}{n} H (X_{H}^{n} | J_{n}) \\ \overset{(a)}{=} H (X_{H}) - \frac{1}{n} H (X_{H}^{n} | J_{n}) \\ = H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n}} H (X_{H}^{n} | X_{K}^{n} \in B (j)) Pr {X_{K}^{n} \in B (j)} \\ \overset{(b)}{\leq} H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n}} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)) Pr {X_{K}^{n} \in \tilde{A} (j)} \end{matrix}

(117)

\begin{matrix} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ \overset{(c)}{\leq} H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n} - 1} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)) Pr {X_{K}^{n} \in \tilde{A} (j)} \\ + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ = H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n} - 1} [- \sum_{\begin{matrix} x_{H}^{n} \end{matrix}} Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)} \cdot \log Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)}] \cdot \end{matrix}

(118)

\begin{matrix} Pr {X_{K}^{n} \in \tilde{A} (j)} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ}, \end{matrix}

(119)

where

(a): follows because of $i . i . d . P_{X_{K}^{n}}$ ,
(b): is due to the inequality proved in Appendix D, and
(c): follows by removing the term for $j = M_{n}$ .

Here, for any

x_{H}^{n}

satisfying

x_{K}^{n} = (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)

with some

x_{R}^{n}

, we can show that

\begin{matrix} Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)} \\ = \frac{Pr {X_{K}^{n} \in \tilde{A} (j) | X_{H}^{n} = x_{H}^{n}} Pr {X_{H}^{n} = x_{H}^{n}}}{Pr {X_{K}^{n} \in \tilde{A} (j)}} \\ = \frac{\sum_{x_{R}^{n} : (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = x_{R}^{n}, X_{H}^{n} = x_{H}^{n} | X_{H}^{n} = x_{H}^{n}}}{\sum_{({\tilde{x}}_{R}^{n}, {\tilde{x}}_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = {\tilde{x}}_{R}^{n}, X_{H}^{n} = {\tilde{x}}_{H}^{n}}} \cdot Pr {X_{H}^{n} = x_{H}^{n}} \\ \overset{(d)}{=} \frac{\sum_{x_{R}^{n} : (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}}}{\sum_{({\tilde{x}}_{R}^{n}, {\tilde{x}}_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = {\tilde{x}}_{R}^{n}, X_{H}^{n} = {\tilde{x}}_{H}^{n}}} \cdot Pr {X_{H}^{n} = x_{H}^{n}} \\ \overset{(e)}{\leq} \frac{\sum_{x_{R}^{n} \in T_{δ_{3}}^{n} (X_{R} | x_{H}^{n}, {\hat{x}}_{R}^{n} (j))} Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}}}{\sum_{({\tilde{x}}_{R}^{n}, {\tilde{x}}_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = {\tilde{x}}_{R}^{n}, X_{H}^{n} = {\tilde{x}}_{H}^{n}}} \cdot Pr {X_{H}^{n} = x_{H}^{n}} \end{matrix}

(120)

\begin{matrix} \overset{(f)}{\leq} \frac{2^{n {H (X_{R} | X_{H}, {\hat{X}}_{R}) + τ}} \cdot 2^{- n {H (X_{R} | X_{H}) - τ}}}{2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}} \cdot 2^{- n {H (X_{K}) + τ}}} \cdot 2^{- n {H (X_{H}) - τ}} \\ = 2^{- n {H (X_{H} | {\hat{X}}_{R}) - 5 τ}}, \end{matrix}

(121)

where

(d): follows from the fact that

$\begin{matrix} Pr {X_{R}^{n} = x_{R}^{n}, X_{H}^{n} = x_{H}^{n} | X_{H}^{n} = x_{H}^{n}} = Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}}, \end{matrix}$
(e): is due to the inequality proved in Appendix E, and
(f): follows because of the number of strongly typical sequences.

Therefore, from Equations (115), (119), and (121), we can obtain

\begin{matrix} l_{n} & \leq H (X_{H}) - \frac{1}{n} \sum_{j = 1}^{M_{n} - 1} [n \sum_{\begin{matrix} x_{H}^{n} \end{matrix}} Pr {X_{H}^{n} = x_{H}^{n} | X_{K}^{n} \in \tilde{A} (j)} \cdot \\ {H (X_{H} | {\hat{X}}_{R}) - 5 τ}] \cdot Pr {X_{K}^{n} \in \tilde{A} (j)} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ = H (X_{H}) - Pr \{X_{K}^{n} \in (⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j))\} \cdot {H (X_{H} | {\hat{X}}_{R}) - 5 τ} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ \leq H (X_{H}) - (1 - τ) {H (X_{H} | {\hat{X}}_{R}) - 5 τ} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} \\ \leq I (X_{H}; {\hat{X}}_{R}) + τ {H (X_{H} | {\hat{X}}_{R}) + 5} + 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ} . \end{matrix}

(122)

Since constants

ϵ

,

τ

, and

δ

are fixed to satisfy (100)–(102), from (111), (113), and (122), we obtain

\begin{matrix} r_{n} \leq R, \end{matrix}

(123)

\begin{matrix} u_{n} \leq ϵ, \end{matrix}

(124)

\begin{matrix} l_{n} < I (X_{H}; {\hat{X}}_{R}) + ϵ < L, \end{matrix}

(125)

\begin{matrix} e_{n} \leq I (X_{H}; X_{E}) < E . \end{matrix}

(126)

Therefore, for the fixed distribution

P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}

, any tuple

\begin{matrix} (R, D, L, E) \in {(R, D, L, E) : R & > I (X_{E}; {\hat{X}}_{R}), \\ D & > E [d (X_{R}, {\hat{X}}_{R})], \\ L & > I (X_{H}; {\hat{X}}_{R}), \\ E & > I (X_{H}; X_{E})} ≕ S_{E}^{*} (P_{X_{K}}) \end{matrix}

(127)

is achievable. Consequently,

S_{E}^{*} (P_{X_{K}}) \subseteq L_{E} (ϵ | P_{X_{K}})

. Taking the closure for the l.h.s., we obtain

C l (S_{E}^{*} (P_{X_{K}})) \subseteq L_{E} (ϵ | P_{X_{K}})

because

L_{E} (ϵ | P_{X_{K}})

is a closed set. We conclude that

S_{E} (P_{X_{K}}) = ⋃_{p} C l (S_{E}^{*} (P_{X_{K}})) \subseteq L_{E} (ϵ | P_{X_{K}})

because the distribution

P_{X_{K}, {\hat{X}}_{R}} = P_{X_{E}, X_{E^{c}}} \cdot P_{{\hat{X}}_{R} | X_{E}}

is fixed arbitrarily. We complete the proof of the direct part.

5. Strong Converse Theorem for Utility–Privacy Trade-Offs

5.1. Another Expression of the Achievable Region

In Section 5.1, we clarify that the achievable region

L_{E}^{D L} (P_{X_{K}})

defined in (89) coincides with the region expressed with a tangent plane.

Definition 16.

For any

E

such that

R \subseteq E \subseteq K

, the region

T_{E}^{μ} (P_{X_{K}})

is defined as

\begin{matrix} T_{E}^{μ} (P_{X_{K}}) ≔ min {I (X_{H}; {\hat{X}}_{R}) + μ E [d (X_{R}, {\hat{X}}_{R})] for some P_{{\hat{X}}_{R} | X_{E}} \cdot P_{X_{E^{c}} X_{E}}}, \\ w h e r e \\ T_{E}^{D L} (P_{X_{K}}) ≔ ⋂_{μ \geq 0} {(L, D) : L + μ D \geq T_{E}^{μ} (P_{X_{K}})} . \end{matrix}

Theorem 3.

For any

E

such that

R \subseteq E \subseteq K

, the region

S_{E}^{D L} (P_{X_{K}})

defined in (90) is given by

\begin{matrix} S_{E}^{D L} (P_{X_{K}}) = T_{E}^{D L} (P_{X_{K}}), \end{matrix}

(128)

and the achievable region

L_{E}^{D L} (P_{X_{K}})

, which is the projection region of the achievable region

L_{E} (P_{X_{K}})

onto the

D - L

plane, is given by

\begin{matrix} L_{E}^{D L} (P_{X_{K}}) = T_{E}^{D L} (P_{X_{K}}) . \end{matrix}

(129)

Proof.

Figure 3 illustrates the proof image using a graph. Let a constance

μ \geq 0

be fixed arbitrarily. Like in Figure 3, there exists a boundary point

(D_{μ}, L_{μ})

of

S_{E}^{D L}

tangent to the line with slope

- μ

. The intercept of this tangent line is

L_{μ} + μ D_{μ}

.

The minimum

I (X_{H}; {\hat{X}}_{R}) + μ E [d (X_{R}, {\hat{X}}_{R})]

characterized by some distribution

P_{{\hat{X}}_{R} | X_{E}}

coincides with

L_{μ} + μ D_{μ}

. Therefore,

\begin{matrix} L_{μ} + μ D_{μ} = min & {I (X_{H}; {\hat{X}}_{R}) + μ E [d (X_{R}, {\hat{X}}_{R})] for some P_{{\hat{X}}_{R} | X_{E}} \cdot P_{X_{E^{c}} X_{E}}} . \end{matrix}

(130)

From (130), we obtain

\begin{matrix} {(L, D) : L + μ D \geq L_{μ} + μ D_{μ}} = {(L, D) : L + μ D \geq min {I (X_{H}; {\hat{X}}_{R}) \\ + μ E [d (X_{R}, {\hat{X}}_{R})] for some P_{{\hat{X}}_{R} | X_{E}} \cdot P_{X_{E^{c}} X_{E}}}} . \end{matrix}

(131)

Taking the intersection by

μ \geq 0

on the both sides of (131),

\begin{matrix} ⋂_{μ \geq 0} {(L, D) : L + μ D \geq L_{μ} + μ D_{μ}} = ⋂_{μ \geq 0} {(L, D) : L + μ D \geq T_{E}^{μ} (P_{X_{K}})} . \end{matrix}

(132)

The l.h.s. of (131) shows the upper-right region in the first quadrant drawn by the tangent line with a slope

- μ

for

S_{E}^{D L} (P_{X_{K}})

. Since the l.h.s. of (132) is the intersection of the l.h.s. of (131), the l.h.s. of (132) represents

S_{E}^{D L} (P_{X_{K}})

. From Definition 16, the right-hand side (r.h.s.) of (132) is

T_{E}^{D L} (P_{X_{K}})

. As a result, (128) holds. Since

L_{E}^{D L} (P_{X_{K}}) = S_{E}^{D L} (P_{X_{K}})

from Corollary 3, likewise, (129) holds. □

5.2. Proof Preliminaries

In Section 5.2, we derive two fundamental properties of the minimization about two values and the inequalities about entropy and divergence to prove the strong converse theorem. In Proposition 1, we change the objective function

T_{E}^{μ} (P_{X_{K}})

of the region expressed with the tangent plane introduced in Section 5.1 onto the region expressed with divergence.

Proposition 1.

Let

μ \leq 0

be fixed arbitrarily. For any

E

such that

R \subseteq E \subseteq K

,

\begin{matrix} T_{E}^{μ} (P_{X_{K}}) & = sup_{α > 0} T_{E}^{μ, α} (P_{X_{K}}), \end{matrix}

(133)

where

\begin{matrix} T_{E}^{μ, α} (P_{X_{K}}) ≔ & min_{P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}} [I ({\tilde{X}}_{H}; {\tilde{\hat{X}}}_{R}) + μ E [d ({\tilde{X}}_{R}, {\tilde{\hat{X}}}_{R})] \\ + α D (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}} ‖ Q_{X_{E^{c}} X_{E} {\tilde{\hat{X}}}_{R}}) + D (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E}} ‖ P_{X_{E^{c}} X_{E}})] \\ = min_{P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}} [I ({\tilde{X}}_{H}; {\tilde{\hat{X}}}_{R}) + μ E [d ({\tilde{X}}_{R}, {\tilde{\hat{X}}}_{R})]] \\ + (α + 1) D (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E}} ‖ P_{X_{E^{c}} X_{E}}) + α I ({\tilde{X}}_{E^{c}}; {\tilde{\hat{X}}}_{R} | {\tilde{X}}_{E})], \end{matrix}

(134)

and

Q_{X_{E^{c}} X_{E} {\tilde{\hat{X}}}_{R}}

is the distribution induced from each

P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}

.

Proof.

First, it is clear that

T_{E}^{μ} (P_{X_{K}}) \geq T_{E}^{μ, α} (P_{X_{K}})

for all

α > 0

. To prove

T_{E}^{μ} (P_{X_{K}}) \leq T_{E}^{μ, α} (P_{X_{K}})

for some

α > 0

, for

α > 0

, let

P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}^{α}

be the distribution that minimizes the r.h.s. of (134) and

Q_{X_{E^{c}} X_{E} {\tilde{\hat{X}}}_{R}}^{α} = P_{{\tilde{\hat{X}}}_{R} | {\tilde{X}}_{E}} P_{X_{E^{c}} X_{E}}

be the estimated distribution. Since

G (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}^{α}) I ≔ ({\tilde{X}}_{H}; {\tilde{\hat{X}}}_{R}) + E [d ({\tilde{X}}_{R}, {\tilde{\hat{X}}}_{R})]

is non-negative and is bounded above, by setting

a = \log | X_{H} | + D_{max}

, it must hold that

\begin{matrix} α D (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}^{α} ‖ Q_{X_{E^{c}} X_{E} {\tilde{\hat{X}}}_{R}}^{α}) \leq a \end{matrix}

and thus

\begin{matrix} D (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}^{α} ‖ Q_{X_{E^{c}} X_{E} {\tilde{\hat{X}}}_{R}}^{α}) \leq (a / α) . \end{matrix}

Notice that any set of probability distributions on a finite alphabet forms a compact set. Because

G (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}^{α})

is a continuous function over a compact set, it is also uniformly continuous. Then, there exists a function

Δ (t)

satisfying

Δ (t) \to 0

as

t \to 0

such that

\begin{matrix} T_{E}^{μ, α} (P_{X_{K}}) & \geq G (P_{{\tilde{X}}_{E^{c}} {\tilde{X}}_{E} {\tilde{\hat{X}}}_{R}}^{α}) \\ \geq G (Q_{X_{E^{c}} X_{E} {\tilde{\hat{X}}}_{R}}^{α}) - Δ (a / α) \\ \geq T_{E}^{μ} (P_{X_{K}}) - Δ (a / α) . \end{matrix}

Consequently, we obtain the desired inequality

T_{E}^{μ} (P_{X_{K}}) \leq lim_{α \to \infty} T_{E}^{μ, α} (P_{X_{K}})

by taking

α \to \infty

. □

In the following proposition, we show the inequalities satisfied between

i . i . d .

source

P_{X_{E^{c}}^{n} X_{E}^{n}}

and arbitrary source

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}

.

Proposition 2.

For

i . i . d .

source

P_{X_{E^{c}}^{n} X_{E}^{n}}

, which has the common distribution

P_{X_{E^{c}} X_{E}}

and arbitrary distribution

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}

, it holds that

\begin{matrix} H ({\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}) + D (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} ‖ P_{X_{E^{c}}^{n} X_{E}^{n}}) & \geq n [H ({\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}) + D (P_{{\tilde{X}}_{E^{c}, J} {\tilde{X}}_{E, J}} ‖ P_{X_{E^{c}} X_{E}})], \end{matrix}

(135)

\begin{matrix} H ({\tilde{X}}_{H}^{n}) + D (P_{{\tilde{X}}_{H}^{n} {\tilde{X}}_{R}^{n}} ‖ P_{X_{H}^{n} X_{R}^{n}}) & \geq n [H ({\tilde{X}}_{H, J}) + D (P_{{\tilde{X}}_{H, J} {\tilde{X}}_{R, J}} ‖ P_{X_{H} X_{R}})], \end{matrix}

(136)

where

J \sim unif (1, \dots, n)

is the uniformly random variable over the set

{1, 2, \dots, n}

for time-sharing and is assumed to be independent of all the other random variables involved.

Proof.

The l.h.s. of (135) can be represented as

\begin{matrix} H ({\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}) + D (P_{{\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}} ‖ P_{X_{E^{c}}^{n} | X_{E}^{n}} | P_{{\tilde{X}}_{E}^{n}}) + D (P_{{\tilde{X}}_{E}^{n}} ‖ P_{X_{E}^{n}}) . \end{matrix}

The sum of the first and second terms satisfies the following equation:

\begin{matrix} H ({\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}) + D (P_{{\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}} ‖ P_{X_{E^{c}}^{n} | X_{E}^{n}} | P_{{\tilde{X}}_{E}^{n}}) \\ = \sum_{x_{E^{c}}^{n}, x_{E}^{n}} P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} (x_{E^{c}}^{n}, x_{E}^{n}) \\ \cdot \{\log \frac{1}{P_{{\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}} (x_{E^{c}}^{n} | x_{E}^{n})} + \log \frac{P_{{\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}} (x_{E^{c}}^{n} | x_{E}^{n})}{P_{X_{E^{c}}^{n} | X_{E}^{n}} (x_{E^{c}}^{n} | x_{E}^{n})}\} \\ = \sum_{x_{E^{c}}^{n}, x_{E}^{n}} P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} (x_{E^{c}}^{n}, x_{E}^{n}) \log \frac{1}{P_{X_{E^{c}}^{n} | X_{E}^{n}} (x_{E^{c}}^{n} | x_{E}^{n})} \\ \overset{(a)}{=} \sum_{x_{E^{c}}^{n}, x_{E}^{n}} P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} (x_{E^{c}}^{n}, x_{E}^{n}) \cdot \{\sum_{j = 1}^{n} \log \frac{1}{P_{X_{E^{c}} | X_{E}} (x_{E^{c}, j} | x_{E, j})}\} \\ \overset{(b)}{=} n \sum_{x_{E^{c}}, x_{E}} P_{{\tilde{X}}_{E^{c}, J} {\tilde{X}}_{E, J}} (x_{E^{c}}, x_{E}) \log \frac{1}{P_{X_{E^{c}} | X_{E}} (x_{E^{c}} | x_{E})} \\ = n \sum_{x_{E^{c}}, x_{E}} P_{{\tilde{X}}_{E^{c}, J} {\tilde{X}}_{E, J}} (x_{E^{c}}, x_{E}) \\ \cdot \{\log \frac{1}{P_{{\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}} (x_{E^{c}} | x_{E})} + \log \frac{P_{{\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}} (x_{E^{c}} | x_{E})}{P_{X_{E^{c}} | X_{E}} (x_{E^{c}} | x_{E})}\} \\ = n {H ({\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}) + D (P_{{\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}} ‖ P_{X_{E^{c}} | X_{E}} | P_{{\tilde{X}}_{E, J}})}, \end{matrix}

(137)

where

(a): follows from the memoryless property of $i . i . d .$ source $P_{X_{E^{c}}^{n} X_{E}^{n}}$ ;
(b): holds because $\frac{1}{n} \sum_{j = 1}^{n} P_{{\tilde{X}}_{E^{c}, j} {\tilde{X}}_{E, j}} (x_{E^{c}}, x_{E}) = P_{{\tilde{X}}_{E^{c}, J} {\tilde{X}}_{E, J}} (x_{E^{c}}, x_{E})$ .

The third term can be bounded from below as

\begin{matrix} D (P_{{\tilde{X}}_{E}^{n}} ‖ P_{X_{E}^{n}}) & = \sum_{j = 1}^{n} D (P_{{\tilde{X}}_{E}, j | {\tilde{X}}_{E}^{j - 1}} ‖ P_{X_{E}} | P_{{\tilde{X}}_{E}^{j - 1}}) \\ \overset{(c)}{\geq} \sum_{j = 1}^{n} D (P_{{\tilde{X}}_{E}, j} ‖ P_{X_{E}}) \\ \overset{(d)}{\geq} n D (P_{{\tilde{X}}_{E}, J} ‖ P_{X_{E}}), \end{matrix}

(138)

where

(c): follows from the data processing inequality and
(d): holds because of Jensen’s inequality.

From (137) and (138), (135) can be derived.

Likewise, the l.h.s. of (136) can be represented as

\begin{matrix} H ({\tilde{X}}_{H}^{n}) + D (P_{{\tilde{X}}_{H}^{n}} ‖ P_{X_{H}^{n}}) + D (P_{{\tilde{X}}_{R}^{n} | {\tilde{X}}_{H}^{n}} ‖ P_{X_{R}^{n} | X_{H}^{n}} | P_{{\tilde{X}}_{H}^{n}}), \end{matrix}

The sum of the first and second terms satisfies

\begin{matrix} H ({\tilde{X}}_{H}^{n}) + D (P_{{\tilde{X}}_{H}^{n}} ‖ P_{X_{H}^{n}}) \\ = \sum_{x_{H}^{n}} P_{{\tilde{X}}_{H}^{n}} (x_{H}^{n}) \{\log \frac{1}{P_{{\tilde{X}}_{H}^{n}} (x_{H}^{n})} + \log \frac{P_{{\tilde{X}}_{H}^{n}} (x_{H}^{n})}{P_{X_{H}^{n}} (x_{H}^{n})}\} \\ = \sum_{x_{H}^{n}} P_{{\tilde{X}}_{H}^{n}} (x_{H}^{n}) \log \frac{1}{P_{X_{H}^{n}} (x_{H}^{n})} \\ = \sum_{x_{H}^{n}} P_{{\tilde{X}}_{H}^{n}} (x_{H}^{n}) \cdot \{\sum_{j = 1}^{n} \log \frac{1}{P_{X_{H}} (x_{H, j})}\} \\ \overset{(e)}{=} n \sum_{x_{H}} P_{{\tilde{X}}_{H}, J} (x_{H}) \log \frac{1}{P_{X_{H}} (x_{H})} \\ = n \sum_{x_{H}} P_{{\tilde{X}}_{H}, J} (x_{H}) \{\log \frac{1}{P_{{\tilde{X}}_{H}, J} (x_{H})} + \log \frac{P_{{\tilde{X}}_{H}, J} (x_{H})}{P_{X_{H}} (x_{H})}\} \\ = n {H ({\tilde{X}}_{H, J}) + D (P_{{\tilde{X}}_{H, J}} ‖ P_{X_{H}})}, \end{matrix}

(139)

where

(e): holds because $\frac{1}{n} \sum_{j = 1}^{n} P_{{\tilde{X}}_{H, j}} (x_{H}) = P_{{\tilde{X}}_{H, J}} (x_{H})$ .

For the third term, it holds that

\begin{matrix} D (P_{{\tilde{X}}_{R}^{n} | {\tilde{X}}_{H}^{n}} ‖ P_{X_{R}^{n} | X_{H}^{n}} | P_{{\tilde{X}}_{H}^{n}}) & = \sum_{j = 1}^{n} D (P_{{\tilde{X}}_{R, j} | {\tilde{X}}_{H}^{n} {\tilde{X}}_{R}^{j - 1}} ‖ P_{X_{R} | X_{H}} | P_{{\tilde{X}}_{H}^{n} {\tilde{X}}_{R}^{j - 1}}) \\ \overset{(f)}{\geq} \sum_{j = 1}^{n} D (P_{{\tilde{X}}_{R, j} | {\tilde{X}}_{H, j}} ‖ P_{X_{R} | X_{H}} | P_{{\tilde{X}}_{H, j}}) \\ \geq n D (P_{{\tilde{X}}_{R, J} | {\tilde{X}}_{H, J}} ‖ P_{X_{R} | X_{H}} | P_{{\tilde{X}}_{H}, J}), \end{matrix}

(140)

where

(f): follows from the log sum inequality.

From (139) and (140), we obtain (136). □

5.3. Strong Converse Theorem

We shall establish the strong converse theorem, which is the main result of this section. Before proving the theorem, we state the lemma of the key tool in the proof about a single-letterized

T_{E}^{μ, α} (P_{X_{K}})

and a

T_{E}^{μ, α} (P_{X_{K}}^{n})

, which are introduced in Proposition 1.

Lemma 7.

For any

E

such that

R \subseteq E \subseteq K

, all

n \in N

,

μ \geq 0

and

α > 0

, it holds that

\begin{matrix} T_{E}^{μ, α} (P_{X_{K}}^{n}) \geq n T_{E}^{μ, α} (P_{X_{K}}) . \end{matrix}

As the main theorem of this section, we show the strong converse theorem for the utility–privacy trade-offs.

Theorem 4.

Strong converse theorem: For any

E

such that

R \subseteq E \subseteq K

and all

0 < ϵ < 1

, it holds that

\begin{matrix} L_{E}^{D L} (ϵ | P_{X_{K}}) = L_{E}^{D L} (P_{X_{K}}) . \end{matrix}

Remark 5.

Theorem 4 suggests that regardless of the value of ϵ, the region

L_{E}^{D L} (ϵ | P_{X_{K}})

is equal to

L_{E}^{D L} (P_{X_{K}})

.

5.4. Proof of Lemma 7

Lemma 7 indicates that the function

T_{E}^{μ, α} (P_{X_{K}}^{n})

, whose argument

P_{X_{K}}^{n}

is a probability distribution over

X_{K}^{n}

, can be lower-bounded by the n-fold of a single-letterized function

T_{E}^{μ, α} (P_{X_{K}})

. Before describing the detailed proof, we state the outline of the proof: (i) We first express the function

T_{E}^{μ, α} (P_{X_{K}}^{n})

as the maximum of the difference of two functions denoted by

G_{1}

and

G_{2}

as in (142). (ii) Then, we show that the first function

G_{1}

can be lower-bounded by the n-fold of its single-letterized function as in (143), while the second function

G_{2}

can be upper-bounded by the n-fold of its single-letterized function as in (147). This outline of the proof is similar to the Proof of Theorem 4, 16 with a slight modification of the function

G_{2}

.

For a given distribution

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}}

, let functions

G_{1} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}})

and

G_{2} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}})

be defined as

\begin{matrix} G_{1} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}) ≔ & H ({\tilde{X}}_{H}^{n}) + α H ({\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}) + (α + 1) D (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} ‖ P_{X_{E^{c}} X_{E}}^{n}), \\ G_{2} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}}) ≔ & H ({\tilde{X}}_{H}^{n} | {\tilde{\hat{X}}}_{R}^{n}) - μ E [d ({\tilde{X}}_{R}^{n}, {\tilde{\hat{X}}}_{R}^{n})] + α H ({\tilde{X}}_{E}^{n} | {\tilde{X}}_{E}^{n}, {\tilde{\hat{X}}}_{R}^{n}) . \end{matrix}

(141)

Using these functions, and in view of (134),

T_{E}^{μ, α} (P_{X_{E^{c}} X_{E}}^{n})

can be written as

\begin{matrix} T_{E}^{μ, α} (P_{X_{E^{c}} X_{E}}^{n}) = min_{P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}}} [G_{1} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}) - G_{2} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}})] . \end{matrix}

(142)

For fixed

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}}

, from Proposition 2, it holds that

\begin{matrix} G_{1} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}) \geq n G_{1} (P_{{\tilde{X}}_{E^{c}, J} {\tilde{X}}_{E, J}}) . \end{matrix}

(143)

Next, we consider the function

G_{2} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}})

. For the first term on the r.h.s. of (141), it holds that

\begin{matrix} H ({\tilde{X}}_{H}^{n} | {\tilde{\hat{X}}}_{R}^{n}) & = \sum_{j = 1}^{n} H ({\tilde{X}}_{H, j} | {\tilde{X}}_{H}^{j - 1}, {\tilde{\hat{X}}}_{R}^{n}) \\ \leq \sum_{j = 1}^{n} H ({\tilde{X}}_{H, j} | {\tilde{\hat{X}}}_{R, j}) \\ = n \cdot \frac{1}{n} \sum_{j = 1}^{n} H ({\tilde{X}}_{H, j} | {\tilde{\hat{X}}}_{R, j}) \\ = n H ({\tilde{X}}_{H, J} | {\tilde{\hat{X}}}_{R, J}, J) \\ \leq n H ({\tilde{X}}_{H, J} | {\tilde{\hat{X}}}_{R, J}) . \end{matrix}

(144)

The second term of (141) can be expressed as follows:

\begin{matrix} E [d ({\tilde{X}}_{R}^{n}, {\tilde{\hat{X}}}_{R}^{n})] & = \sum_{x_{R}^{n}, {\hat{x}}_{R}^{n}} P_{{\tilde{X}}_{R}^{n} {\tilde{\hat{X}}}_{R}^{n}} (x_{R}^{n}, {\hat{x}}_{R}^{n}) \cdot \{\sum_{j = 1}^{n} d (x_{R, j}, {\hat{x}}_{R, j})\} \\ = \sum_{j = 1}^{n} \sum_{x_{R}, {\hat{x}}_{R}} P_{{\tilde{X}}_{R, j} {\tilde{\hat{X}}}_{R, j}} (x_{R}, {\hat{x}}_{R}) d (x_{R}, {\hat{x}}_{R}) \\ \overset{(a)}{=} n \sum_{x_{R}, {\hat{x}}_{R}} P_{{\tilde{X}}_{R, J} {\tilde{\hat{X}}}_{R, J}} (x_{R}, {\hat{x}}_{R}) d (x_{R}, {\hat{x}}_{R}) \\ = n E [d ({\tilde{X}}_{R, J}, {\tilde{\hat{X}}}_{R, J})], \end{matrix}

(145)

where

(a): follows from $\frac{1}{n} \sum_{j = 1}^{n} P_{{\tilde{X}}_{R, j} {\tilde{\hat{X}}}_{R, j}} (x_{R}, {\hat{x}}_{R}) = P_{{\tilde{X}}_{R, J} {\tilde{\hat{X}}}_{R, J}} (x_{R}, {\hat{x}}_{R})$ .

Moreover, for the third term of (141), it holds that

\begin{matrix} H ({\tilde{X}}_{E^{c}}^{n} | {\tilde{X}}_{E}^{n}, {\tilde{\hat{X}}}_{R}^{n}) & = \sum_{j = 1}^{n} H ({\tilde{X}}_{E^{c}, j} | {\tilde{X}}_{E^{c}}^{j - 1}, {\tilde{X}}_{E}^{n}, {\tilde{\hat{X}}}_{R}^{n}) \\ \leq \sum_{j = 1}^{n} H ({\tilde{X}}_{E^{c}, j} | {\tilde{X}}_{E, j}, {\tilde{\hat{X}}}_{R, j}) \\ = n \cdot \frac{1}{n} \sum_{j = 1}^{n} H ({\tilde{X}}_{E^{c}, j} | {\tilde{X}}_{E, j}, {\tilde{\hat{X}}}_{R, j}) \\ = n H ({\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}, {\tilde{\hat{X}}}_{R, J}, J) \\ \leq n H ({\tilde{X}}_{E^{c}, J} | {\tilde{X}}_{E, J}, {\tilde{\hat{X}}}_{R, J}) . \end{matrix}

(146)

From (144)–(146), we obtain

\begin{matrix} G_{2} (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}}) \leq n G_{2} (P_{{\tilde{X}}_{E^{c}, J} {\tilde{X}}_{E, J} {\tilde{\hat{X}}}_{R, J}}) . \end{matrix}

(147)

Consequently, since (143) and (147) are satisfied for an arbitrary

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n} {\tilde{\hat{X}}}_{R}^{n}}

, the proof is completed.

5.5. Proof of Strong Converse Theorem

For any given

ϵ > 0

, fix the rate pair

(D, L) \in L_{E}^{D L} (ϵ | P_{X_{K}})

arbitrarily. Then, by definition, there exists a code

(f_{n}, g_{n})

satisfying (79) and (80). For this code

(f_{n}, g_{n})

, a set

D

is defined as

\begin{matrix} D ≔ {(x_{E^{c}}^{n}, x_{E}^{n}) : d (x_{R}^{n}, g_{n} (f_{n} (x_{E}^{n}))) \leq n D} . \end{matrix}

We derive a distribution

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}

as

\begin{matrix} P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} (x_{E^{c}}^{n}, x_{E}^{n}) ≔ \frac{P_{X_{E^{c}} X_{E}}^{n} (x_{E^{c}}^{n}, x_{E}^{n}) 1 l [(x_{E^{c}}^{n}, x_{E}^{n}) \in D]}{P_{X_{E^{c} X_{E}}}^{n} (D)} . \end{matrix}

It is obvious that the excess-distortion probability measured by

P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}}

is 0; that is,

{\tilde{X}}_{R}^{n}

and

{\tilde{\hat{X}}}_{R}^{n} = g_{n} (f_{n} ({\tilde{X}}_{E}^{n}))

satisfy

E [d ({\tilde{X}}_{R}^{n}, {\tilde{\hat{X}}}_{R}^{n})] \leq n D

. Thus, by imitating the proof approach of the standard weak converse theorem, it holds that

\begin{matrix} n (L + μ D) \geq I ({\tilde{X}}_{H}^{n}; {\tilde{\hat{X}}}_{R}^{n}) + μ E [d ({\tilde{X}}_{R}^{n}, {\tilde{\hat{X}}}_{R}^{n})], \end{matrix}

(148)

\begin{matrix} D (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} ‖ P_{X_{E^{c}} X_{E}}^{n}) = \log \frac{1}{P_{X_{E^{c}} X_{E}}^{n} (D)} \leq \log \frac{1}{1 - ϵ} . \end{matrix}

(149)

From (148), the following equation is obtained:

\begin{matrix} n (L + μ D) & \overset{(a)}{\geq} I ({\tilde{X}}_{H}^{n}; {\tilde{\hat{X}}}_{R}^{n}) + μ E [d ({\tilde{X}}_{R}^{n}, {\tilde{\hat{X}}}_{R}^{n})] \\ + ((α + 1) D (P_{{\tilde{X}}_{E^{c}}^{n} {\tilde{X}}_{E}^{n}} ‖ P_{X_{E^{c}} X_{E}}^{n}) + α I ({\tilde{X}}_{E^{c}}^{n}; {\tilde{\hat{X}}}_{R}^{n} | {\tilde{X}}_{E}^{n})) \\ - (α + 1) \log \frac{1}{1 - ϵ} \\ \overset{(b)}{\geq} T_{E}^{μ, α} (P_{X_{K}}^{n}) - (α + 1) \log \frac{1}{1 - ϵ}, \end{matrix}

where

(a): follows from (149) and $I ({\tilde{X}}_{E^{c}}^{n}; {\tilde{\hat{X}}}_{R}^{n} | {\tilde{X}}_{E}^{n}) = 0$ ,
(b): is due to (134).

Since

T_{E}^{μ, α} (P_{X_{K}}^{n}) \geq n T_{E}^{μ, α} (P_{X_{K}})

from Lemma 7, we have

\begin{matrix} L + μ D & \geq T_{E}^{μ, α} (P_{X_{K}}) - \frac{(α + 1)}{n} \log \frac{1}{1 - ϵ}, \end{matrix}

and therefore

\begin{matrix} sup_{α > 0} (L + μ D) & \geq sup_{α > 0} [T_{ϵ}^{μ, α} (P_{X_{K}}) - \frac{(α + 1)}{n} \log \frac{1}{1 - ϵ}] . \end{matrix}

Because

T_{E}^{μ} (P_{X_{K}}) = {sup}_{α > 0} T_{E}^{μ, α} (P_{X_{K}})

from Proposition 1, it holds that for an arbitrary

α > 0

,

\begin{matrix} L + μ D & \geq T_{E}^{μ} (P_{X_{K}}) - \frac{(α + 1)}{n} \log \frac{1}{1 - ϵ} . \end{matrix}

Hence, it holds that

\begin{matrix} L + μ D & \geq lim_{n \to \infty} (T_{E}^{μ} (P_{X_{K}}) - \frac{(α + 1)}{n} \log \frac{1}{1 - ϵ}) \\ = T_{E}^{μ} (P_{X_{K}}) for every μ \geq 0 . \end{matrix}

(150)

For the set of

(D, L)

satisfying (150), varying

μ \geq 0

arbitrarily and taking the intersection, we have

\begin{matrix} (D, L) \in ⋂_{μ \geq 0} {(D, L) : L + μ D \geq T_{E}^{μ} (P_{X_{K}})} . \end{matrix}

(151)

From Theorem 3, the r.h.s. of (151) is equal to

L_{E}^{D L} (P_{X_{K}})

. This proof is completed.

6. Discussion

6.1. Numerical Calculation of Coding Rate, Utility, and Privacy for Decoder

In this section, we show some numerical calculations of the achievable region

C_{E}^{R D L} (P_{X_{K}})

and

L_{E}^{R D L} (P_{X_{K}})

in Corollaries 1 and 2, respectively. In general, it is difficult to compute the achievable region

C_{E}^{R D L} (P_{X_{K}})

and

L_{E}^{R D L} (P_{X_{K}})

. Nevertheless, to obtain some insight, let us consider the three tractable but essential cases. In these calculations, the number of public attributes is one

(| R | = 1)

and the number of private attributes is two

(| H | = 2)

. We assume that each of the attributes is binary. Here, note again that the coding rate R acts like the rate-distortion function in rate-distortion theory (cf. (Section 10 in [27])). For fixed D and L, a smaller coding rate is better.

In the first example, we calculated the L-D graph of theoretical limits in case (i)

E = K

, case (ii)

E = R

, and case (iii)

R \subset E \subset K

(Figure 4). As a result, the achievable privacy leakage L becomes small as D becomes large if we do not impose any restrictions on the value of R. For a given D, the privacy leakage for the decoder in case (i)

E = K

is the smallest, and the one in case (ii)

E = R

is the largest in all cases. The second example calculated the R-D graph of theoretical limits in cases (i), (ii), and (iii) (Figure 5). We can see that the minimum coding rates for a given D coincide in all cases if we do not impose any restrictions on the value of L. In the third example, we calculated the optimal privacy leakage L for fixed D and the corresponding coding rates R in cases (i), (ii), and (iii) (Table 1, Table 2 and Table 3). As a result, the optimal privacy leakage in cases (i) and (iii) is smaller than the one in case (ii), whereas for the optimal privacy leakage, the achievable coding rates in cases (i) and (iii) is larger than the one in case (ii).

Next, we discuss these results. In Figure 4, in comparison with each case, we can verify that for a given D, the more private information is encoded, the smaller the achievable minimum privacy leakage is. Figure 5 suggests that if the coding rate should be minimized, it suffices to encode only the public attributes. This result is evident from Corollaries 1 and 2 because the condition on the choice of test channel

P_{{\hat{X}}_{R} | X_{E}}

in case (i) is weaker than the one in case (ii), and if an appropriate test channel is taken in case (i), it is also appropriate in case (ii). It is indicated that the achievable region in case (ii) is also the one in cases (i) and (iii). The opposite is not the case. From Table 1, Table 2 and Table 3, we can confirm the trade-off between the optimal privacy leakage L for a fixed D and the corresponding coding rate R in comparison with each case.

Summarizing the foregoing arguments, we have discussed the relationship between utility and privacy in Figure 4, the one between utility and coding rate in Figure 5, and the one between privacy and coding rate in Table 1, Table 2 and Table 3. From the discussion about Figure 5, some readers may suspect that case (i) is the best-encoded information because the achievable region in cases (ii) and (iii) is the one in case (i). This is true if we do not consider the leakage for the encoder. However, this is not true if we consider the leakage for the encoder, that is, the measurement of privacy for the encoder (see (12) or (76)). In the next section, we discuss this point in detail.

6.2. Significance of Limited Leakage for Encoder

In this section, we discuss the significance of evaluating the leakage for the encoder. Our goal of this discussion is to show that the best-encoded information may be case (iii)

R \subset E \subset K

if we take the limited leakage for the encoder into consideration.

The first issue is the amount of encoded information. Some readers may think that it is better that more encoded information is inputted into the encoder. However, there are pros and cons.

Pros:: The achievable regions $C_{E}^{R D L} (P_{X_{K}})$ and $L_{E}^{R D L} (P_{X_{K}})$ become larger.
Cons:: The leakage for the encoder increases.

From this point of view, we can come up with the idea that there exists the best-encoded information in case (iii)

R \subset E \subset K

if we impose some constraint on the leakage for the encoder. This idea is the key point of this paper.

The second issue is the significance of the limited leakage for the encoder. Figure 6 shows the Hasse diagram, which represents the inclusion relation about the index sets of attributes. The Hasse diagram is often used to represent inclusion relations, for example,

R \subset E_{2} \subset E_{1} \subset K

.

We can also regard Figure 6 as the Hasse diagram that represents the inclusion relation for the achievable regions

C_{E}^{R D L} (P_{X_{K}})

and

L_{E}^{R D L} (P_{X_{K}})

because the index sets of attributes (

R \subseteq E \subseteq K

) corresponds to the encoded information (

X_{E}

) and the encoded information corresponds to the achievable region (

C_{E}^{R D L} (P_{X_{K}})

and

L_{E}^{R D L} (P_{X_{K}})

). In addition, the diagram in Figure 6 has another property, which is that the superordinate sets have a larger amount of privacy leakage for the encoder than the subordinate sets since the index sets of attributes correspond to the privacy leakage for the encoder.

Let us consider a practical application. We assume that the data aggregator, that is, the encoder, tries to gather encoded information from some application user and hopes to develop the utility of the application while limiting the amount of leakage for

X_{H}^{n}

by

E \geq 0

, that is,

e_{n} \leq E

. More precisely, for a given E, we want to find which subsets of

K

are sufficient to characterize

\begin{matrix} C^{R D L} (P_{X_{K}} | E) ≔ ⋃_{R \subseteq E \subseteq K} \{(R, D, L) : (R, D, L, E) \in C_{E} (P_{X_{K}})\}, \\ L^{R D L} (P_{X_{K}} | E) ≔ ⋃_{R \subseteq E \subseteq K} \{(R, D, L) : (R, D, L, E) \in L_{E} (P_{X_{K}})\}, \end{matrix}

where

C_{E} (P_{X_{K}})

and

L_{E} (P_{X_{K}})

are defined in Definitions 2 and 11, respectively. The process is as follows.

Step 1:: Check the user’s requirements and impose the restriction on the privacy leakage for the encoder.

Figure 7 shows the Hasse diagram for Step 1. The blue dotted line means the border line satisfies the restriction of the privacy leakage for the encoder. Therefore, the index sets

E_{1}

and

K

are not suitable as the index sets of encoded information.

Step 2:: Check the inclusion relation between index sets.

Figure 8 shows the Hasse diagram for Step 2. From Figure 6, we can find that

\begin{matrix} R \subset E_{2}, R \subset E_{3}, R \subset E_{5}, E_{3} \subset E_{4}, E_{5} \subset E_{4} . \end{matrix}

Therefore, the index sets

R

,

E_{3}

, and

E_{5}

are not suitable as the index sets of encoded information.

Figure 9 shows the Hasse diagram obtained after Step 2. From Figure 9, the remaining index sets are

E_{2}

and

E_{4}

. Therefore, if we impose restriction on privacy leakage for the encoder, the index sets

E_{2}

or

E_{4}

form the Pareto area in this multi-objective optimization problem. In other words, there exists a system that satisfies the user’s requirements E of the maximum amount of leakage to the encoder, and the achievable regions are given by

C^{R D L} (P_{X_{K}} | E) = C_{E_{2}}^{R D L} (P_{X_{K}}) \cup C_{E_{4}}^{R D L} (P_{X_{K}})

and

L^{R D L} (P_{X_{K}} | E) = L_{E_{2}}^{R D L} (P_{X_{K}}) \cup L_{E_{4}}^{R D L} (P_{X_{K}})

.

From the discussion above, we mention that the best-encoded information is case (iii)

R \subset E \subset K

if we take the limited leakage for the encoder into account. This concept is one of the most important novelties in this paper.

If E satisfies some condition, then

C^{R D L} (P_{X_{K}} | E)

can be characterized by the expressions given by Yamamoto [1] (cf. Remark 3). More specifically, the region

C^{R D L} (P_{X_{K}} | E)

can be given by

\begin{matrix} C^{R D L} (P_{X_{K}} | E) & = S_{K}^{R D L} (P_{X_{K}}) \end{matrix}

if

E \geq H (X_{K})

and

\begin{matrix} C^{R D L} (P_{X_{K}} | E) & = S_{R}^{R D L} (P_{X_{K}}) \end{matrix}

if

H (X_{R}) \leq E < H (X_{E})

for any

R \subset E

with

R \neq E

, where the regions

S_{K}^{R D L} (P_{X_{K}})

and

S_{R}^{R D L} (P_{X_{K}})

are given in [1] (cf. Remark 3).

6.3. Discussion on Measures for Privacy Leakage

This paper adopts the mutual information as the measure of privacy leakage as in (12), (13), (76), and (77). However, some less likely data can be leaked even though the database satisfies the theoretical limit of privacy leakage. For example, let

(X, Y)

be a pair of correlated random variables whose

I (X; Y)

is very small. However, there may exist a pair of

(x_{1}, y_{1})

such that

Y = y_{1}

can imply

X = x_{1}

with high probability. To put it differently, the receiver can tell the value of X if it observes

Y = y_{1}

. The theoretical limit evaluated with mutual information cannot prevent such a scenario. To circumvent this scenario, we suggest the other measurement adopted in related studies. A promising candidate to avoid this problem is to employ Rényi information of higher orders [30], maximal leakage [15], and maximal

α

-leakage [16,17,18,21].

7. Conclusions

In this paper, we strengthened the results in [3] mainly by establishing three coding theorems in a privacy-constrained source coding problem. In Section 3 and Section 4, two theorems are made about the first-order rate analysis in which utility is measured by the expected distortion or the excess-distortion probability for case (iii),

R \subset E \subset K

. The novelty is the introduction of the measure of privacy for the encoder along with the use of the excess-distortion probability. The obtained characterization reduces to the one given in [3] derived based on the expected distortion when the leakage for the encoder is not limited, and the result shows that employing an excess-distortion probability does not change the achievable region from the one with an expected distortion. In Section 5, we establish the strong converse theorem for utility–privacy trade-offs. Although the described result is for the projected plane of utility and privacy for the decoder for simplicity, we can also incorporate the measure of privacy for the encoder. Finally, we discuss the significance of the encoded information considering limited leakage for the encoder. The argument suggests that the best-encoded information can be case (iii)

R \subset E \subset K

if some constraint is imposed on the privacy leakage for the encoder.

As future work, the second-order rate analysis for utility–privacy trade-offs is an interesting research topic [4,5,6]. Moreover, the strong converse theorem and the second-order rate analysis for the four-dimensional region of coding rate, utility, privacy for the decoder, and privacy for the encoder are more challenging tasks. It is also worth analyzing the achievable region with the other privacy measures such as Rényi information [30], maximal leakage [15], and maximal

α

-leakage [16,17,18,21]. This paper analyzed the theoretical limits of coding, but understanding how to achieve the theoretical limits remains open. The construction of good codes is also an important subject. Extensions of this paper’s scenario to coding with side information [2,25] are also of interest.

Author Contributions

N.S. contributed to the conceptualization of the research goals and aims, the visualization, the formal analysis of the results, and the review and editing. H.Y. contributed to the conceptualization of the ideas, the validation of the results, and the supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by JSPS KAKENHI Grant Numbers JP20K04462 and JP18H01438.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of the Markov Chain $X_{E^{c}}$ – $X_{E}$ – ${\hat{X}}_{R}$ in Converse Part of Theorem 1

Let

p_{i} (x_{E, i}, x_{E^{c}, i}, {\hat{x}}_{R, i})

be the conditional distribution given

Q = i

,

\begin{matrix} p_{i} (x_{E, i}, x_{E^{c}, i}, {\hat{x}}_{R, i}) & = \sum_{\begin{matrix} x_{E, k} : \\ k \neq i \end{matrix}} \sum_{\begin{matrix} x_{E^{c}, k} : \\ k \neq i \end{matrix}} \sum_{\begin{matrix} {\hat{x}}_{R, k} : \\ k \neq i \end{matrix}} p (x_{E}^{n}, x_{E^{c}}^{n}, {\hat{x}}_{R}^{n}) \\ = \sum_{\begin{matrix} x_{E, k} : \\ k \neq i \end{matrix}} \sum_{\begin{matrix} x_{E^{c}, k} : \\ k \neq i \end{matrix}} p (x_{E}^{n}, x_{E^{c}}^{n}, {\hat{x}}_{R, i}) \\ \overset{(a)}{=} \sum_{\begin{matrix} x_{E, k} : \\ k \neq i \end{matrix}} \sum_{\begin{matrix} x_{E^{c}, k} : \\ k \neq i \end{matrix}} p_{i} (x_{E}^{n}, {\hat{x}}_{R, i}) p (x_{E^{c}}^{n} | x_{E}^{n}) \\ = \sum_{\begin{matrix} x_{E, k} : \\ k \neq i \end{matrix}} p_{i} (x_{E}^{n}, {\hat{x}}_{R, i}) \sum_{\begin{matrix} x_{E^{c}, k} : \\ k \neq i \end{matrix}} p (x_{E^{c}}^{n} | x_{E}^{n}) \\ \overset{(b)}{=} \sum_{\begin{matrix} x_{E, k} : \\ k \neq i \end{matrix}} p_{i} (x_{E}^{n}, {\hat{x}}_{R, i}) \cdot \\ \sum_{\begin{matrix} x_{E^{c}, k} : \\ k \neq i \end{matrix}} (\prod_{l = 1}^{n} p (x_{E^{c}, l} | x_{E, l})) \\ = p_{i} (x_{E, i}, {\hat{x}}_{R, i}) p (x_{E^{c}, i} | x_{E, i}) \\ = p (x_{E, i}) p (x_{E^{c}, i} | x_{E, i}) p_{i} ({\hat{x}}_{R, i} | x_{E, i}) \\ = p (x_{E, i}, x_{E^{c}, i}) p_{i} ({\hat{x}}_{R, i} | x_{E, i}), \end{matrix}

(A1)

where

(a): is due to the Markov chain $X_{E^{c}}^{n}$ – $X_{E}^{n}$ – ${\hat{X}}_{R, i}$ and
(b): follows from the stationary memoryless source.

Therefore, we can obtain the Markov chain

X_{E^{c}, i}

–

X_{E, i}

–

{\hat{X}}_{R, i}

. For the marginal distribution, we can show that

\begin{matrix} p (x_{E}, x_{E^{c}}, {\hat{x}}_{R}) & \overset{(c)}{=} \frac{1}{n} \sum_{i = 1}^{n} p_{i} (x_{E}, x_{E^{c}}, {\hat{x}}_{R}) \\ \overset{(d)}{=} \frac{1}{n} \sum_{i = 1}^{n} p_{i} (x_{E}, x_{E^{c}}) p_{i} ({\hat{x}}_{R} | x_{E}) \\ \overset{(e)}{=} p (x_{E}, x_{E^{c}}) \cdot \frac{1}{n} \sum_{i = 1}^{n} p_{i} ({\hat{x}}_{R} | x_{E}) \\ \overset{(f)}{=} p (x_{E}, x_{E^{c}}) p ({\hat{x}}_{R} | x_{E}), \end{matrix}

(A2)

where

(c): follows because

$\begin{matrix} p (x_{E}, x_{E^{c}}, {\hat{x}}_{R}) & = \sum_{i = 1}^{n} Pr {Q = i} p_{i} (x_{E}, x_{E^{c}}, {\hat{x}}_{R}), \end{matrix}$

(A3)
(d): is due to the Markov chain $X_{E^{c}, i}$ – $X_{E, i}$ – ${\hat{X}}_{R, i}$ ,
(e): follows from the stationary memoryless source, and
(f): follows because

$\begin{matrix} p ({\hat{x}}_{R} | x_{E}) & = \sum_{i = 1}^{n} Pr {Q = i} p_{i} ({\hat{x}}_{R} | x_{E}) . \end{matrix}$

(A4)

Therefore, we can obtain the Markov chain

X_{E^{c}}

–

X_{E}

–

{\hat{X}}_{R}

. We complete the proof.

Appendix B. Proof of Equation (56)

From

\tilde{A} (j) \subseteq B (j)

for

j = 1, 2, \dots, M_{n} - 1

,

\begin{matrix} Pr {X_{K}^{n} \in B (j)} = Pr {X_{K}^{n} \in \tilde{A} (j)} + Pr {X_{K}^{n} \in B (j) ∖ \tilde{A} (j)} . \end{matrix}

(A5)

If

x_{K}^{n} \in B (j) ∖ \tilde{A} (j)

, then

x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))

and

(x_{E}^{n}, x_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j))

, and thus we have

x_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j))

from Lemma 5. Then,

\begin{matrix} x_{K}^{n} \in B (j) ∖ \tilde{A} (j) ⟹ & x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)), \\ x_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) \end{matrix}

(A6)

We can prove that

\begin{matrix} Pr {X_{K}^{n} \in B (j) ∖ \tilde{A} (j)} \\ \leq Pr {X_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)), X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | X_{E}^{n}, {\hat{x}}_{R}^{n} (j))} \\ = \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))} Pr {X_{E}^{n} = x_{E}^{n}} \cdot \\ Pr {X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) | X_{E}^{n} = x_{E}^{n}} \\ \overset{(a)}{=} \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))} Pr {X_{E}^{n} = x_{E}^{n}} \cdot \\ Pr {X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) | X_{E}^{n} = x_{E}^{n}, {\hat{X}}_{R}^{n} = {\hat{x}}_{R}^{n} (j)} \\ \overset{(b)}{\leq} \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))} Pr {X_{E}^{n} = x_{E}^{n}} \cdot 2 | X_{E^{c}} | \cdot | X_{E} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} \\ \leq 2 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n}, \end{matrix}

(A7)

where

(a): is due to the Markov chain $X_{E^{c}}^{n}$ – $X_{E}^{n}$ – ${\hat{X}}_{R}^{n}$ and
(b): follows from Lemma 6.

From Equations (A5) and (A7), we can obtain

\begin{matrix} | Pr {X_{K}^{n} \in B (j)} - Pr {X_{K}^{n} \in \tilde{A} (j)} | \leq 2 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} . \end{matrix}

(A8)

We complete the proof of (56).

Appendix C. Proof of Existence of Code Satisfying Equations (57)–(62)

We first set

M_{n} ≔ 2^{n R}

and

r_{n} ≔ \frac{1}{n} \log M_{n}

. Then, we obviously have (57).

From the union upper bound,

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \\ \leq Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E})} \\ + Pr {X_{E}^{n} \in T_{δ}^{n} (X_{E}), X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1} . \end{matrix}

(A9)

From Lemma 6, the first term in (A9) is bounded as

\begin{matrix} Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E})} \leq 2 | X_{E} | e^{- 2 δ^{2} n} . \end{matrix}

(A10)

We consider the expectation of the second term in (A9) by random coding. Hereafter, we denote the random variable corresponding to the reproduced sequence

{\hat{x}}_{R}^{n} (j)

as

{\hat{X}}_{R}^{n} (j)

. For notational simplicity, we use the abbreviation

\begin{matrix} Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1 | X_{E}^{n} = x_{E}^{n}} \\ = Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1}, \end{matrix}

(A11)

and then

\begin{matrix} E [Pr {X_{E}^{n} \in T_{δ}^{n} (X_{E}), X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1}] \\ = \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) E [Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1 | X_{E}^{n} = x_{E}^{n}}] \\ \overset{(a)}{=} \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) E [Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1}] \\ = \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) (\prod_{j = 1}^{M_{n} - 1} E [Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j))}]) \\ \overset{(b)}{=} \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) {(E [Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (1))}])}^{M_{n} - 1} \\ \overset{(c)}{\leq} exp \{- 2^{n (R - I (X_{E}; {\hat{X}}_{R}) - \frac{1}{n} - τ)}\} \\ \overset{(d)}{\leq} exp \{- 2^{2 δ^{2} n}\}, \end{matrix}

(A12)

where

(a): is owing to (A11),
(b): is due to the symmetry about indexes of random coding,
(c): follows from the same way as in (Section 3.6.3 in [31]), and
(d): because $δ$ is fixed to satisfy (49).

From (A10) and (A12), we obtain

\begin{matrix} E [Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\}] \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n} . \end{matrix}

(A13)

Therefore, there exists at least one codebook satisfying (60) in the ensembles obtained by random coding.

Hereafter, codebook

C

is fixed to satisfy (60). That is, codebook

C

satisfies

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n} . \end{matrix}

(A14)

We evaluate the distortion function for each j.

(i)

j = 1, 2, \dots, M_{n} - 1

:

\begin{matrix} d (x_{R}^{n}, {\hat{x}}_{R}^{n} (j)) \\ = \frac{1}{n} \sum_{a \in X_{R}} \sum_{b \in {\hat{X}}_{R}} N (a, b | x_{R}^{n}, {\hat{x}}_{R}^{n} (j)) d (a, b) \\ \overset{(e)}{\leq} \sum_{a \in X_{R}} \sum_{b \in {\hat{X}}_{R}} P_{X_{R}, {\hat{X}}_{R}} (a, b) d (a, b) \\ + (δ + δ_{1}) | X_{R} | \cdot | {\hat{X}}_{R} | D_{max} \\ = E [d (X_{R}, {\hat{X}}_{R})] + (δ + δ_{1}) | X_{R} | \cdot | {\hat{X}}_{R} | D_{max}, \end{matrix}

(A15)

where

(e): because from Lemma 4, if $x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j))$ , then $x_{R}^{n} \in T_{δ_{1}}^{n} (X_{R} | {\hat{x}}_{R}^{n} (j))$ and from Lemma 3, if ${\hat{x}}_{R}^{n} (j) \in T_{δ}^{n} ({\hat{X}}_{R})$ and $x_{R}^{n} \in T_{δ_{1}}^{n} (X_{R} | {\hat{x}}_{R}^{n} (j))$ , then $(x_{R}^{n}, {\hat{x}}_{R}^{n} (j)) \in T_{δ + δ_{1}}^{n} (X_{R}, {\hat{X}}_{R})$ .

(ii)

j = M_{n}

:

\begin{matrix} d (x_{R}^{n}, {\hat{x}}_{R}^{n} (M_{n})) & = \frac{1}{n} \sum_{i = 1}^{n} d (x_{R, i}, {\hat{x}}_{R, i}) \\ \overset{(f)}{\leq} D_{max}, \end{matrix}

(A16)

where

(f): is due to the definition of $D_{max} ≔ max_{a \in X_{R}, b \in {\hat{X}}_{R}} d (a, b)$ .

We consider

Pr {J_{n} = M_{n}}

. From (A14),

\begin{matrix} Pr {J_{n} = M_{n}} & = Pr {X_{E}^{n} \in A (M_{n})} \\ = Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \\ \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n} . \end{matrix}

(A17)

Therefore, we can confirm

\begin{matrix} lim_{n \to \infty} Pr {J_{n} = M_{n}} = 0 . \end{matrix}

(A18)

From (i) and (ii), we can evaluate utility

u_{n}

as below.

\begin{matrix} u_{n} & ≔ E [d (X_{R}^{n}, {\hat{X}}_{R}^{n})] \\ \leq \sum_{j = 1}^{M_{n} - 1} Pr {J_{n} = j} \cdot (E [d (X_{R}, {\hat{X}}_{R})] \\ + (δ + δ_{1}) | X_{R} | \cdot | {\hat{X}}_{R} | D_{max}) + Pr {J_{n} = M_{n}} \cdot D_{max} \\ \overset{(g)}{\leq} E [d (X_{R}, {\hat{X}}_{R})] + (δ + δ_{1}) | X_{R} | \cdot | {\hat{X}}_{R} | D_{max} + τ \end{matrix}

(A19)

for all sufficiently large n, where

(g): follows from (A18).

Thus, we obtain (58).

We can evaluate the privacy leakage against the encoder as shown below.

\begin{matrix} e_{n} & ≔ \frac{1}{n} I (X_{H}^{n}; X_{E}^{n}) \\ \overset{(h)}{=} \frac{1}{n} \sum_{i = 1}^{n} I (X_{H, j}; X_{E}^{n} | X_{H}^{j - 1}) \\ \overset{(i)}{=} \frac{1}{n} \sum_{i = 1}^{n} I (X_{H, j}; X_{E, j}) \\ \overset{(j)}{=} I (X_{H}; X_{E}), \end{matrix}

(A20)

where

(h): is due to chain rule for mutual information and
(i), (j): follows because $i . i . d . P_{X_{K}^{n}}$ .

Thus, we have (59).

Next, we show that the probability that random vector

X_{K}^{n}

is not included in the set

⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)

is sufficiently small. First, notice that

\begin{matrix} x_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j) ⟹ & x_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j) \\ or \\ x_{E}^{n} \in A (j_{0}), \\ (x_{E}^{n}, x_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0})) \\ for j_{0} = f_{n} (x_{E}^{n}), \end{matrix}

(A21)

where

j_{0}

is the index such that

f_{n} (x_{E}^{n}) = j_{0}

for

1 \leq j_{0} \leq M_{n} - 1

. Therefore, by the union upper bound,

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ \leq Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \\ + Pr {X_{E}^{n} \in A (j_{0}), (X_{E}^{n}, X_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0})) \\ for j_{0} = f_{n} (X_{E}^{n})} . \end{matrix}

(A22)

We evaluate each term in (A22).

(i)

The first term:

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} & \overset{(k)}{\leq} (2 | X_{E} | + 1) e^{- 2 δ^{2} n}, \end{matrix}

(A23)

where

(k): is because of (A14).

(ii)

The second term:

If the event in the second term occurs,

x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j_{0}))

and

(x_{E}^{n}, x_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0}))

. Therefore, from Lemma 5,

x_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j_{0}))

holds. Hence,

\begin{matrix} Pr {X_{E}^{n} \in A (j_{0}), (X_{E}^{n}, X_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0})) \\ for j_{0} = f_{n} (X_{E}^{n})} \\ \leq Pr {X_{E}^{n} \in A (j_{0}), X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | X_{E}^{n}, {\hat{x}}_{R}^{n} (j_{0}))} \\ \leq \sum_{j = 1}^{M_{n} - 1} \sum_{x_{E}^{n} \in A (j)} Pr {X_{E}^{n} = x_{E}^{n}} \cdot \\ Pr {X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) | X_{E}^{n} = x_{E}^{n}} \\ \overset{(l)}{=} \sum_{j = 1}^{M_{n} - 1} \sum_{x_{E}^{n} \in A (j)} Pr {X_{E}^{n} = x_{E}^{n}} \cdot \\ Pr {X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) | X_{E}^{n} = x_{E}^{n}, {\hat{X}}_{R}^{n} = {\hat{x}}_{R}^{n} (j)} \\ \overset{(m)}{\leq} \sum_{j = 1}^{M_{n} - 1} \sum_{x_{E}^{n} \in A (j)} Pr {X_{E}^{n} = x_{E}^{n}} \cdot 2 | X_{E^{c}} | \cdot | X_{E} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} \\ \overset{(n)}{\leq} 2 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n}, \end{matrix}

(A24)

where

(l): is due to the Markov chain $X_{E^{c}}^{n} - X_{E}^{n} - {\hat{X}}_{R}^{n}$ ,
(m): follows since $x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j_{0}))$ and Lemma 6, and
(n): follows because $A (j)$ are disjoint for each j.

From (A22)–(A24),

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \leq 4 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} . \end{matrix}

(A25)

Therefore, for sufficiently large n,

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \leq τ, \end{matrix}

(A26)

and we obtain (61).

From Lemma 1, for sufficiently large n to stochastic matrix

W : {\hat{X}}_{R} \to X_{K}

and

{\hat{x}}_{R}^{n} (j) \in T_{δ}^{n} ({\hat{X}}_{R})

we can show that

\begin{matrix} |\frac{1}{n} \log | T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) | - H (X_{K} | {\hat{X}}_{R})| \leq τ, \\ δ_{2} ≔ \frac{δ}{| X_{E^{c}} |} . \end{matrix}

(A27)

We can also show from (A27) that

\begin{matrix} 2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}} \leq | T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) | \leq 2^{n {H (X_{K} | {\hat{X}}_{R}) + τ}} . \end{matrix}

(A28)

From the definition of

\tilde{A} (j)

and

T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j))

and Lemma 3, for

j = 1, 2, \dots, M_{n} - 1

, we have

\begin{matrix} x_{K}^{n} \in \tilde{A} (j) & ⟺ \{\begin{matrix} x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)) \\ x_{K}^{n} \in T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) \end{matrix} \end{matrix}

(A29)

\begin{matrix} x_{K}^{n} \in T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) & ⟹ \{\begin{matrix} x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)) \\ x_{K}^{n} \in T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) \end{matrix} \end{matrix}

(A30)

This means

\begin{matrix} T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) & \subseteq \tilde{A} (j) \\ ⟹ | T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) | & \leq | \tilde{A} (j) | . \end{matrix}

(A31)

Therefore, from (A28) and (A31),

\begin{matrix} | \tilde{A} (j) | \geq 2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}}, \end{matrix}

(A32)

and we obtain (62).

Appendix D. Derivation of Inequality in Equation (63)

We derive the inequality in (63). To write notation concisely, for every

x_{H}^{n} \in X_{H}^{n}

and each

j = 1, 2, \dots, M_{n}

, we define

P_{n} (j)

,

Q_{n} (j)

,

{\tilde{P}}_{n} (x_{H}^{n}, j)

, and

{\tilde{Q}}_{n} (x_{H}^{n}, j)

as follows:

\begin{matrix} P_{n} (j) ≔ & Pr {X_{K}^{n} \in B (j)}, \end{matrix}

(A33)

\begin{matrix} Q_{n} (j) ≔ & Pr {X_{K}^{n} \in \tilde{A} (j)}, \end{matrix}

(A34)

\begin{matrix} {\tilde{P}}_{n} (x_{H}^{n}, j) ≔ & Pr {X_{H}^{n} = x_{H}^{n}, X_{K}^{n} \in B (j)}, \end{matrix}

(A35)

\begin{matrix} {\tilde{Q}}_{n} (x_{H}^{n}, j) ≔ & Pr {X_{H}^{n} = x_{H}^{n}, X_{K}^{n} \in \tilde{A} (j)} . \end{matrix}

(A36)

Then, using the notation in [5], we can write each entropy as

\begin{matrix} H (X_{K}^{n} \in B (J_{n})) & = H (P_{n}), \end{matrix}

(A37)

\begin{matrix} H (X_{K}^{n} \in \tilde{A} (J_{n})) & = H (Q_{n}), \end{matrix}

(A38)

\begin{matrix} H (X_{H}^{n}, X_{K}^{n} \in B (J_{n})) & = H ({\tilde{P}}_{n}), \end{matrix}

(A39)

\begin{matrix} H (X_{H}^{n}, X_{K}^{n} \in \tilde{A} (J_{n})) & = H ({\tilde{Q}}_{n}) . \end{matrix}

(A40)

The variational distance between distributions

P_{n}

and

Q_{n}

is

\begin{matrix} d_{v} (P_{n}, Q_{n}) & = \sum_{j = 1}^{M_{n}} | P_{n} (j) - Q_{n} (j) | \\ = \sum_{j = 1}^{M_{n} - 1} | P_{n} (j) - Q_{n} (j) | \\ + | P_{n} (M_{n}) - Q_{n} (M_{n}) | . \end{matrix}

(A41)

We evaluate each term in (A41).

(i)

The first term:

\begin{matrix} \sum_{j = 1}^{M_{n} - 1} | P_{n} (j) - Q_{n} (j) | \\ = \sum_{j = 1}^{M_{n} - 1} Pr {X_{K}^{n} \in B (j) ∖ \tilde{A} (j)} \\ \overset{(a)}{=} Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j) ∖ \tilde{A} (j)\} \\ = Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j)\} - Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ = (1 - Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\}) \\ - (1 - Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j)\}) \\ = Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} - Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} B (j)\} \\ \leq Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ \overset{(b)}{\leq} τ, \end{matrix}

(A42)

where

(a): follows because $B (j) ∖ \tilde{A} (j)$ is disjoint for each $j = 1, 2, \dots, M_{n} - 1$ ,
(b): is owing to (61).

(ii)

The second term:

\begin{matrix} | P_{n} (M_{n}) - Q_{n} (M_{n}) | & \overset{(c)}{=} Q_{n} (M_{n}) - P_{n} (M_{n}) \\ \leq Q_{n} (M_{n}) \\ = Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ \overset{(d)}{\leq} τ, \end{matrix}

(A43)

where

(c): follows because $B (M_{n}) \subseteq \tilde{A} (M_{n})$ and
(d): follows from (61).

From (A42) and (A43), the variational distance between

P_{n}

and

Q_{n}

is bounded from above as

\begin{matrix} d_{v} (P_{n}, Q_{n}) & \leq τ + τ \\ = 2 τ . \end{matrix}

(A44)

Next, the variational distance between distributions

{\tilde{P}}_{n}

and

{\tilde{Q}}_{n}

is

\begin{matrix} d_{v} ({\tilde{P}}_{n}, {\tilde{Q}}_{n}) & = \sum_{j = 1}^{M_{n}} \sum_{x_{H}^{n} \in X_{H}^{n}} |{\tilde{P}}_{n} (x_{H}^{n}, j) - {\tilde{Q}}_{n} (x_{H}^{n}, j)| \\ = \sum_{j = 1}^{M_{n} - 1} \sum_{x_{H}^{n} \in X_{H}^{n}} |{\tilde{P}}_{n} (x_{H}^{n}, j) - {\tilde{Q}}_{n} (x_{H}^{n}, j)| \\ + \sum_{x_{H}^{n} \in X_{H}^{n}} |{\tilde{P}}_{n} (x_{H}^{n}, M_{n}) - {\tilde{Q}}_{n} (x_{H}^{n}, M_{n})| . \end{matrix}

(A45)

We evaluate each term in (A45).

(i)

The first term:

\begin{matrix} \sum_{j = 1}^{M_{n} - 1} \sum_{x_{H}^{n} \in X_{H}^{n}} |{\tilde{P}}_{n} (x_{H}^{n}, j) - {\tilde{Q}}_{n} (x_{H}^{n}, j)| \\ = \sum_{j = 1}^{M_{n} - 1} \sum_{x_{H}^{n} \in X_{H}^{n}} Pr {X_{H}^{n} = x_{H}^{n}, X_{K}^{n} \in B (j) ∖ \tilde{A} (j)} \\ \overset{(e)}{=} \sum_{x_{H}^{n} \in X_{H}^{n}} Pr \{X_{H}^{n} = x_{H}^{n}, X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j) ∖ \tilde{A} (j)\} \\ = Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j) ∖ \tilde{A} (j)\} \\ = Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j)\} - Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ = (1 - Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\}) \\ - (1 - Pr \{X_{K}^{n} \in ⋃_{j = 1}^{M_{n} - 1} B (j)\}) \\ = Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} - Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} B (j)\} \\ \leq Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ \overset{(f)}{\leq} τ, \end{matrix}

(A46)

where

(e): follows since $B (j) ∖ \tilde{A} (j)$ is disjoint for each $j = 1, 2, \dots, M_{n} - 1$ ,
(f): is due to (61).

(ii)

The second term:

\begin{matrix} \sum_{x_{H}^{n} \in X_{H}^{n}} | {\tilde{P}}_{n} (x_{H}^{n}, M_{n}) - {\tilde{Q}}_{n} (x_{H}^{n}, M_{n}) | \\ \overset{(g)}{=} \sum_{x_{H}^{n} \in X_{H}^{n}} ({\tilde{Q}}_{n} (x_{H}^{n}, M_{n}) - {\tilde{P}}_{n} (x_{H}^{n}, M_{n})) \\ \leq \sum_{x_{H}^{n} \in X_{H}^{n}} {\tilde{Q}}_{n} (x_{H}^{n}, M_{n}) \\ = Q_{n} (M_{n}) \\ = Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ \overset{(h)}{\leq} τ, \end{matrix}

(A47)

where

(g): follows because $B (M_{n}) \subseteq \tilde{A} (M_{n})$ and
(h): is due to (61).

From (A46) and (A47), the variational distance between

{\tilde{P}}_{n}

and

{\tilde{Q}}_{n}

is bounded from above as

\begin{matrix} d_{v} ({\tilde{P}}_{n}, {\tilde{Q}}_{n}) & \leq τ + τ \\ = 2 τ . \end{matrix}

(A48)

As a result, from Lemma 2 and the relation of each entropy,

\begin{matrix} | H (X_{K}^{n} \in B (J_{n})) - H (X_{K}^{n} \in \tilde{A} (J_{n})) | \leq - 2 τ \log \frac{2 τ}{M_{n}}, \\ | H (X_{H}^{n}, X_{K}^{n} \in B (J_{n})) - H (X_{H}^{n}, X_{K}^{n} \in \tilde{A} (J_{n})) | \end{matrix}

(A49)

\begin{matrix} \leq - 2 τ \log \frac{2 τ}{| X_{H} |^{n} \cdot M_{n}} . \end{matrix}

(A50)

From (A49), (A50), and the chain rule of entropy,

\begin{matrix} | H (X_{H}^{n} | X_{K}^{n} \in B (J_{n})) - H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (J_{n})) | \\ = | {H (X_{H}^{n}, X_{K}^{n} \in B (J_{n})) - H (X_{K}^{n} \in B (J_{n}))} \\ - {H (X_{H}^{n}, X_{K}^{n} \in \tilde{A} (J_{n})) - H (X_{K}^{n} \in \tilde{A} (J_{n}))} | \\ = | {H (X_{H}^{n}, X_{K}^{n} \in B (J_{n})) - H (X_{H}^{n}, X_{K}^{n} \in \tilde{A} (J_{n}))} \\ + {H (X_{K}^{n} \in \tilde{A} (J_{n})) - H (X_{K}^{n} \in B (J_{n}))} | \\ \overset{(i)}{\leq} | H (X_{H}^{n}, X_{K}^{n} \in B (J_{n})) - H (X_{H}^{n}, X_{K}^{n} \in \tilde{A} (J_{n})) | \\ + | H (X_{K}^{n} \in \tilde{A} (J_{n})) - H (X_{K}^{n} \in B (J_{n})) | \\ \leq - 2 τ \log \frac{2 τ}{M_{n}} - 2 τ \log \frac{2 τ}{| X_{H} |^{n} \cdot M_{n}} \\ = - 4 τ \log \frac{2 τ}{| X_{H} |^{n} \cdot M_{n}} \\ = 4 τ \log \frac{| X_{H} |^{n} \cdot M_{n}}{2 τ}, \end{matrix}

(A51)

where

(i): is because of the triangle inequality.

Therefore, we obtain

\begin{matrix} \frac{1}{n} H (X_{H}^{n} | J_{n}) & = \frac{1}{n} H (X_{H}^{n} | X_{K}^{n} \in B (J_{n})) \\ \geq \frac{1}{n} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (J_{n})) - \frac{4 τ}{n} \log \frac{| X_{H} |^{n} \cdot M_{n}}{2 τ} \\ \overset{(j)}{>} \frac{1}{n} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (J_{n})) - \frac{4 τ}{n} \log \frac{| X_{H} |^{n} \cdot 2^{n R}}{{(2 τ)}^{n}} \\ = \frac{1}{n} H (X_{H}^{n} | X_{K}^{n} \in \tilde{A} (J_{n})) - 4 τ \log \frac{| X_{H} | \cdot 2^{R}}{2 τ}, \end{matrix}

(A52)

where

(j): follows from the definition that $M_{n} = 2^{n R}$ and $2 τ < 1$ .

We complete the derivation of (63).

Appendix E. Proof of Equation (65)

First of all, we shall show

\begin{matrix} x_{K}^{n} \in \tilde{A} (j) ⟹ x_{R}^{n} \in T_{δ_{3}}^{n} (X_{R} | x_{H}^{n}, {\hat{x}}_{R}^{n} (j)), \\ δ_{3} ≔ (| X_{H} | + 1) \cdot 2 δ . \end{matrix}

(A53)

By the definition of

\tilde{A} (j)

,

\begin{matrix} \tilde{A} (j) \subseteq T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) for j = 1, 2, \dots, M_{n} - 1 . \end{matrix}

(A54)

Thus, from Lemma 4, any

x_{R}^{n}

such that

(x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)

satisfies

\begin{matrix} x_{R}^{n} \in T_{δ_{3}}^{n} (X_{R} | x_{H}^{n}, {\hat{x}}_{R}^{n} (j)) . \end{matrix}

(A55)

That is, given

x_{H}^{n} \in X_{H}^{n}

and

{\hat{x}}_{R}^{n} (j) \in {\hat{X}}_{R}^{n}

,

x_{R}^{n} \in X_{R}^{n}

and

x_{K}^{n} = (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)

are conditional strongly typical sequences. Then, we obtain (A53), and

\begin{matrix} \sum_{x_{R}^{n} : (x_{R}^{n}, x_{H}^{n}) \in \tilde{A} (j)} Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}} Pr {X_{H}^{n} = x_{H}^{n}} \\ \leq & \sum_{x_{R}^{n} \in T_{δ_{3}}^{n} (X_{R} | x_{H}^{n}, {\hat{x}}_{R}^{n} (j))} Pr {X_{R}^{n} = x_{R}^{n} | X_{H}^{n} = x_{H}^{n}} Pr {X_{H}^{n} = x_{H}^{n}} . \end{matrix}

(A56)

Therefore, we obtain (65).

Appendix F. Proof of the Existence of Code Satisfying Equations (111)–(116)

We first set

M_{n} ≔ 2^{n R}

and

r_{n} ≔ \frac{1}{n} \log M_{n}

. Then, we obviously have (111).

From the union upper bound,

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \\ \leq Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E})} \\ + Pr {X_{E}^{n} \in T_{δ}^{n} (X_{E}), X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1} . \end{matrix}

(A57)

From Lemma 6, the first term in (A57) is bounded as

\begin{matrix} Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E})} \leq 2 | X_{E} | e^{- 2 δ^{2} n} . \end{matrix}

(A58)

We consider the expectation of the second term in (A57) by random coding. Hereafter, we denote the random variable corresponding to the reproduced sequence

{\hat{x}}_{R}^{n} (j)

as

{\hat{X}}_{R}^{n} (j)

. For notational simplicity, we use the abbreviation

\begin{matrix} Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1 | X_{E}^{n} = x_{E}^{n}} \\ = Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1}, \end{matrix}

(A59)

and then

\begin{matrix} E [Pr {X_{E}^{n} \in T_{δ}^{n} (X_{E}), X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1}] \\ = \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) E [Pr {X_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1 | X_{E}^{n} = x_{E}^{n}}] \\ \overset{(a)}{=} \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) E [Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j)) \\ for all j = 1, 2, \dots, M_{n} - 1}] \\ = \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) (\prod_{j = 1}^{M_{n} - 1} E [Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (j))}]) \\ \overset{(b)}{=} \sum_{x_{E}^{n} \in T_{δ}^{n} (X_{E})} p (x_{E}^{n}) {(E [Pr {x_{E}^{n} \notin T_{δ}^{n} (X_{E} | {\hat{X}}_{R}^{n} (1))}])}^{M_{n} - 1} \\ \overset{(c)}{\leq} exp \{- 2^{n (R - I (X_{E}; {\hat{X}}_{R}) - \frac{1}{n} - τ)}\} \\ \overset{(d)}{\leq} exp \{- 2^{2 δ^{2} n}\}, \end{matrix}

(A60)

where

(a): is owing to (A59),
(b): is due to the symmetry about indexes of random coding,
(c): follows from the same way as in ([31], Section 3.6.3), and
(d): because $δ$ is fixed to satisfy (103).

From (A58) and (A60), we obtain

\begin{matrix} E [Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\}] \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n} . \end{matrix}

(A61)

Therefore, there exists at least one codebook satisfying (112) in the ensembles obtained using random coding.

Hereafter, codebook

C

is fixed to satisfy (112). That is, codebook

C

satisfies

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n} . \end{matrix}

(A62)

For a fixed codebook

C

, we divide the sequences

x_{E}^{n} \in X_{E}^{n}

into three categories.

Strongly typical sequences $x_{E}^{n} \in T_{δ}^{n} (X_{E})$ such that there exists a codeword ${\hat{X}}_{R}^{n} (j_{o})$ for some $j_{o} = 1, 2, \dots, M_{n} - 1$ that is conditionally strongly typical with $x_{E}^{n} .$ In this case, from Lemma 3, $(x_{E}, {\hat{x}}_{R}^{n} (j_{o})) \in T_{2 δ}^{n} (X_{E}, {\hat{X}}_{R} (j_{o}))$ . Since the codeword is jointly strongly typical with $x_{E}^{n}$ , the continuity of the distortion as a function of the joint distribution ensures that they are also typical distortions (see [2], Chapters 10.5 and 10.6). Hence, the distortion between these $x_{E}^{n}$ and their codewords is bounded by $D + δ^{'}$ where $δ^{'}$ goes to 0 as $n \to \infty$ . In the first-order analysis, that is, $n \to \infty$ , we can regard $D + δ^{'}$ as D.
Strongly typical sequences $x_{E}^{n} \in T_{δ}^{n} (X_{E})$ such that $f_{n} (x_{E}^{n}) = M_{n}$ .
Non-strongly typical sequences $x_{E}^{n} \notin T_{δ}^{n} (X_{E})$ .

The sequences in the second and third categories are encoded as

f_{n} (x_{E}^{n}) = M_{n} .

The sequences of third categories are the sequences that can be bounded by such the distortion

d_{max}

as in excess of D. Then, the excess-distortion probability is evaluated as

\begin{matrix} Pr \{\frac{1}{n} d (X_{R}^{n}, {\hat{X}}_{R}^{n}) > D\} & < Pr \{X_{E}^{n} \in A (M_{n})\} \end{matrix}

(A63)

\begin{matrix} = Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \end{matrix}

(A64)

\begin{matrix} \leq (2 | X_{E} | + 1) e^{- 2 δ^{2} n} . \end{matrix}

(A65)

Hence, for an appropriate choice of

ϵ

and n, we can ensure the excess-distortion probability of all badly represented sequences are as small as we want. We obtain (113).

We can evaluate privacy leakage against the encoder as below.

\begin{matrix} e_{n} & ≔ \frac{1}{n} I (X_{H}^{n}; X_{E}^{n}) \\ \overset{(e)}{=} \frac{1}{n} \sum_{i = 1}^{n} I (X_{H, j}; X_{E}^{n} | X_{H}^{j - 1}) \\ \overset{(f)}{=} \frac{1}{n} \sum_{i = 1}^{n} I (X_{H, j}; X_{E, j}) \\ \overset{(g)}{=} I (X_{H}; X_{E}), \end{matrix}

(A66)

where

(e): is due to chain rule for mutual information and
(f), (g): follows because $i . i . d . P_{X_{K}^{n}}$ .

Thus, we have (114).

Next, we show that the probability that random vector

X_{K}^{n}

is not included in the set

⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)

and is sufficiently small. First, notice that

\begin{matrix} x_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j) ⟹ & x_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j) \\ or \\ x_{E}^{n} \in A (j_{0}), \\ (x_{E}^{n}, x_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0})) \\ for j_{0} = f_{n} (x_{E}^{n}), \end{matrix}

(A67)

where

j_{0}

is the index such that

f_{n} (x_{E}^{n}) = j_{0}

for

1 \leq j_{0} \leq M_{n} - 1

. Therefore, by the union’s upper bound,

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \\ \leq Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} \\ + Pr {X_{E}^{n} \in A (j_{0}), (X_{E}^{n}, X_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0})) \\ for j_{0} = f_{n} (X_{E}^{n})} . \end{matrix}

(A68)

We evaluate each term in (A68).

(i)

The first term:

\begin{matrix} Pr \{X_{E}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} A (j)\} & \overset{(h)}{\leq} (2 | X_{E} | + 1) e^{- 2 δ^{2} n}, \end{matrix}

(A69)

where

(h): is because of (A62).

(ii)

The second term:

If the event in the second term occurs,

x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j_{0}))

and

(x_{E}^{n}, x_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0}))

. Therefore, from Lemma 5,

x_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j_{0}))

holds. Hence,

\begin{matrix} Pr {X_{E}^{n} \in A (j_{0}), (X_{E}^{n}, X_{E^{c}}^{n}) \notin T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j_{0})) \\ for j_{0} = f_{n} (X_{E}^{n})} \\ \leq Pr {X_{E}^{n} \in A (j_{0}), X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | X_{E}^{n}, {\hat{x}}_{R}^{n} (j_{0}))} \\ \leq \sum_{j = 1}^{M_{n} - 1} \sum_{x_{E}^{n} \in A (j)} Pr {X_{E}^{n} = x_{E}^{n}} \cdot \\ Pr {X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) | X_{E}^{n} = x_{E}^{n}} \\ \overset{(i)}{=} \sum_{j = 1}^{M_{n} - 1} \sum_{x_{E}^{n} \in A (j)} Pr {X_{E}^{n} = x_{E}^{n}} \cdot \\ Pr {X_{E^{c}}^{n} \notin T_{δ}^{n} (X_{E^{c}} | x_{E}^{n}, {\hat{x}}_{R}^{n} (j)) | X_{E}^{n} = x_{E}^{n}, {\hat{X}}_{R}^{n} = {\hat{x}}_{R}^{n} (j)} \\ \overset{(j)}{\leq} \sum_{j = 1}^{M_{n} - 1} \sum_{x_{E}^{n} \in A (j)} Pr {X_{E}^{n} = x_{E}^{n}} \cdot 2 | X_{E^{c}} | \cdot | X_{E} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} \\ \overset{(k)}{\leq} 2 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n}, \end{matrix}

(A70)

where

(i): is due to the Markov chain $X_{E^{c}}^{n} - X_{E}^{n} - {\hat{X}}_{R}^{n}$ ,
(j): follows since $x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j_{0}))$ and Lemma 6,
(k): follows because $A (j)$ is disjoint for each j.

From (A68)–(A70),

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \leq 4 | X_{K} | \cdot | {\hat{X}}_{R} | e^{- 2 δ^{2} n} . \end{matrix}

(A71)

Therefore, for sufficiently large n,

\begin{matrix} Pr \{X_{K}^{n} \notin ⋃_{j = 1}^{M_{n} - 1} \tilde{A} (j)\} \leq τ, \end{matrix}

(A72)

and we obtain (115).

From Lemma 1, for sufficiently large n to stochastic matrix

W : {\hat{X}}_{R} \to X_{K}

and

{\hat{x}}_{R}^{n} (j) \in T_{δ}^{n} ({\hat{X}}_{R})

we can show that

\begin{matrix} |\frac{1}{n} \log | T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) | - H (X_{K} | {\hat{X}}_{R})| \leq τ, \\ δ_{2} ≔ \frac{δ}{| X_{E^{c}} |} . \end{matrix}

(A73)

We can also show from (A73) that

\begin{matrix} 2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}} \leq | T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) | \leq 2^{n {H (X_{K} | {\hat{X}}_{R}) + τ}} . \end{matrix}

(A74)

From the definition of

\tilde{A} (j)

and

T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j))

and Lemma 4, for

j = 1, 2, \dots, M_{n} - 1

, we have

\begin{matrix} x_{K}^{n} \in \tilde{A} (j) & ⟺ \{\begin{matrix} x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)) \\ x_{K}^{n} \in T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) \end{matrix} \end{matrix}

(A75)

\begin{matrix} x_{K}^{n} \in T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) & ⟹ \{\begin{matrix} x_{E}^{n} \in T_{δ}^{n} (X_{E} | {\hat{x}}_{R}^{n} (j)) \\ x_{K}^{n} \in T_{2 δ}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) \end{matrix} \end{matrix}

(A76)

This means

\begin{matrix} T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) & \subseteq \tilde{A} (j) \\ ⟹ | T_{δ_{2}}^{n} (X_{K} | {\hat{x}}_{R}^{n} (j)) | & \leq | \tilde{A} (j) | . \end{matrix}

(A77)

Therefore, from (A74) and (A77),

\begin{matrix} | \tilde{A} (j) | \geq 2^{n {H (X_{K} | {\hat{X}}_{R}) - τ}}, \end{matrix}

(A78)

and we obtain (116).

References

Yamamoto, H. A source coding problem for sources with additional outputs to keep secret from the receiver or wiretappers. IEEE Trans. Inf. Theory 1983, 29, 918–923. [Google Scholar] [CrossRef] [Green Version]
Sankar, L.; Rajagopalan, S.R.; Poor, H.V. Utility–Privacy tradeoff in databases: An information-theoretic approach. IEEE Trans. Inf. Forensics Secur. 2013, 8, 838–852. [Google Scholar] [CrossRef] [Green Version]
Shinohara, N.; Yagi, H. Unified expression of utility–privacy trade-off in privacy-constrained source coding. In Proceedings of the 2022 International Symposium on Information Theory and Its Applications (ISITA2022), Tsukuba, Japan, 17–19 October 2022; pp. 198–202. [Google Scholar]
Ingber, A.; Kochman, Y. The dispersion of lossy source coding. In Proceedings of the 2011 Data Compression Conference, Snowbird, UT, USA, 29–31 March 2011; pp. 53–62. [Google Scholar]
Kostina, V.; Verdú, S. Fixed length lossy compression in the finite blocklength regime: Discrete memoryless sources. IEEE Trans. Inf. Theory 2012, 58, 3309–3338. [Google Scholar] [CrossRef] [Green Version]
Watanabe, S. Second-order region for Gray-Wyner network. IEEE Trans. Inf. Theory 2017, 63, 1006–1018. [Google Scholar] [CrossRef] [Green Version]
Tyagi, H.; Watanabe, S. Strong converse using change of measure arguments. IEEE Trans. Inf. Theory 2020, 66, 689–703. [Google Scholar] [CrossRef] [Green Version]
Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference Theory Cryptograph (TCC), New York, NY, USA, 4–7 March 2006; pp. 265–284. [Google Scholar]
Dwork, C. Differential privacy. In Proceedings of the 33rd International Conference Automata, Languages and Programming (ICALP), Venice, Italy, 10–14 July 2006; pp. 1–12. [Google Scholar]
Soria-Comas, J.; Domingo-Ferrer, J.; Sánchez, D.; Megías, D. Individual differential privacy: A utility-preserving formulation of differential privacy guarantees. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
Kalantari, K.; Sankar, L.; Sarwate, A.D. Robust privacy-utility tradeoffs under differential privacy and hamming distortion. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2816–2830. [Google Scholar] [CrossRef] [Green Version]
Makhdoumi, A.; Fawaz, N. Privacy-utility tradeoff under statistical uncertainty. In Proceedings of the 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–4 October 2013; pp. 1627–1634. [Google Scholar]
Basciftci, Y.O.; Wang, Y.; Ishwar, P. On privacy-utility tradeoffs for constrained data release mechanisms. In Proceedings of the 2016 Information Theory and Applications Workshop (ITA), La Jolla, CA, USA, 31 January–5 February 2016; pp. 1–6. [Google Scholar]
Günlü, O.; Schaefer, R.F.; Boche, H.; Poor, H.V. Secure and private source coding with private key and decoder side information. In Proceedings of the 2022 IEEE Information Theory Workshop (ITW), Mumbai, India, 6–9 November 2022; pp. 226–231. [Google Scholar]
Issa, I.; Wagner, A.B.; Kamath, S. An operational approach to information leakage. IEEE Trans. Inf. Theory 2020, 66, 1625–1657. [Google Scholar] [CrossRef] [Green Version]
Liao, J.; Kosut, O.; Sankar, L.; Calmon, F.P. Privacy under hard distortion constraints. In Proceedings of the 2018 IEEE Information Theory Workshop (ITW2018), Guangzhou, China, 25–29 November 2018; pp. 1–5. [Google Scholar]
Liao, J.; Kosut, O.; Sankar, L.; Calmon, F.P. Tunable measures for information leakage and applications to privacy-utility tradeoffs. IEEE Trans. Inf. Theory 2019, 65, 8043–8066. [Google Scholar] [CrossRef] [Green Version]
Saeidian, S.; Cervia, G.; Oechtering, T.J.; Skoglund, M. Quantifying membership privacy via information leakage. IEEE Trans. Inf. Forensics Secur. 2020, 16, 3096–3108. [Google Scholar] [CrossRef]
Rassouli, B.; Gündüz, D. Optimal utility–privacy trade-off with total variation distance as a privacy measure. IEEE Trans. Inf. Forensics Secur. 2019, 15, 594–603. [Google Scholar] [CrossRef] [Green Version]
Wang, W.; Ying, L.; Zhang, J. On the relation between identifiability, differential privacy, and mutual-information privacy. IEEE Trans. Inf. Theory 2016, 62, 5018–5029. [Google Scholar] [CrossRef] [Green Version]
Liao, J.; Sankar, L.; Kosut, O.; Calmon, F.P. Maximal α-leakage and its properties. In Proceedings of the 2020 IEEE Conference on Communications and Network Security (CNS), Virtual, 29 June–1 July 2020; pp. 1–6. [Google Scholar]
Shinohara, N.; Yagi, H. Strong converse theorem for utility–privacy trade-offs. In Proceedings of the 45th Symposium on Information Theory and Its Applications (SITA2022), Noboribetsu, Japan, 29 November–2 December 2022; pp. 338–343. [Google Scholar]
Guan, Z.; Si, G.; Wu, J.; Zhu, L.; Zhang, Z.; Ma, Y. Utility–privacy tradeoff based on random data obfuscation in internet of energy. IEEE Access 2017, 5, 3250–3262. [Google Scholar] [CrossRef]
Asikis, T.; Pournaras, E. Optimization of privacy-utility trade-offs under informational self-determination. Future Gener. Comput. Syst. 2020, 109, 488–499. [Google Scholar] [CrossRef]
Lu, J.; Xu, Y.; Zhu, Z. On scalable source coding problem with side information privacy. In Proceedings of the 2022 14th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 1–3 November 2022; pp. 415–420. [Google Scholar]
Makhdoumi, A.; Salamatian, S.; Fawaz, N.; Médard, M. From the information bottleneck to the privacy funnel. In Proceedings of the 2014 IEEE Information Theory Workshop (ITW), Hobart, Australia, 2–5 November 2014; pp. 501–505. [Google Scholar]
Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John & Wiley Sons, Inc.: Hoboken, NJ, USA, 2006. [Google Scholar]
Uyematsu, T. Gendai Shannon Riron, 1st ed.; Baihukan: Tokyo, Japan, 1998. (In Japanese) [Google Scholar]
Csizar, L.; Korner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed.; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346. [Google Scholar] [CrossRef] [Green Version]
El Gamal, A.; Kim, Y.H. Network Information Theory, 1st ed.; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]

Figure 1. Privacy-constrained coding system.

Figure 2. Road map for second-order rate analysis [1,3,8].

Figure 3. The region expressed with a tangent plane using the Legendre transformation.

Figure 4. Utility–privacy trade-off region in cases (i), (ii), and (iii).

Figure 5. Utility–coding-rate trade-off region in cases (i), (ii), and (iii). The curves coincide in all cases.

Figure 6. Hasse diagram that represents the inclusion relation for the index sets of attributes.

Figure 7. Hasse diagram for Step 1.

Figure 8. Hasse diagram for Step 2.

Figure 9. Hasse diagram obtained after Step 2.

Table 1. Minimum L and its corresponding R for

D = 0.0500

.

Table 1. Minimum L and its corresponding R for

D = 0.0500

.

Cases	Leakage L	Coding Rate R
case (ii)	0.019512	0.494629
case (iii)	0.008298	0.527700
case (i)	0.005107	0.539478

Table 2. Minimum L and its corresponding R for

D = 0.100

.

Table 2. Minimum L and its corresponding R for

D = 0.100

.

Cases	Leakage L	Coding Rate R
case (ii)	0.015378	0.368062
case (iii)	0.002656	0.418826
case (i)	0.000000	0.429490

Table 3. Minimum L and its corresponding R for

D = 0.1500

.

Table 3. Minimum L and its corresponding R for

D = 0.1500

.

Cases	Leakage L	Coding Rate R
case (ii)	0.011748	0.270436
case (iii)	0.002032	0.294424
case (i)	0.000000	0.382211

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shinohara, N.; Yagi, H. Utility–Privacy Trade-Offs with Limited Leakage for Encoder. Entropy 2023, 25, 921. https://doi.org/10.3390/e25060921

AMA Style

Shinohara N, Yagi H. Utility–Privacy Trade-Offs with Limited Leakage for Encoder. Entropy. 2023; 25(6):921. https://doi.org/10.3390/e25060921

Chicago/Turabian Style

Shinohara, Naruki, and Hideki Yagi. 2023. "Utility–Privacy Trade-Offs with Limited Leakage for Encoder" Entropy 25, no. 6: 921. https://doi.org/10.3390/e25060921

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Utility–Privacy Trade-Offs with Limited Leakage for Encoder

Abstract

1. Introduction

1.1. Background

1.2. Motivation and Contributions

1.3. Related Work

1.4. Organization

2. Notation and System Model

2.1. Information Source

2.2. Encoder and Decoder

3. First-Order Rate Analysis with Expected Distortion

3.1. Performance Measures

3.2. Achievable Region and Theorem

3.3. Proof Preliminaries for First-Order Rate Analysis

3.4. Proof of Converse Part

3.5. Proof of Direct Part

4. First-Order Rate Analysis with Excess-Distortion Probability

4.1. Performance Measures

4.2. Achievable Region and Theorem

4.3. Proof of Converse Part

4.4. Proof of the Direct Part

5. Strong Converse Theorem for Utility–Privacy Trade-Offs

5.1. Another Expression of the Achievable Region

5.2. Proof Preliminaries

5.3. Strong Converse Theorem

5.4. Proof of Lemma 7

5.5. Proof of Strong Converse Theorem

6. Discussion

6.1. Numerical Calculation of Coding Rate, Utility, and Privacy for Decoder

6.2. Significance of Limited Leakage for Encoder

6.3. Discussion on Measures for Privacy Leakage

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Proof of the Markov Chain X E c – X E – X ^ R in Converse Part of Theorem 1

Appendix B. Proof of Equation (56)

Appendix C. Proof of Existence of Code Satisfying Equations (57)–(62)

Appendix D. Derivation of Inequality in Equation (63)

Appendix E. Proof of Equation (65)

Appendix F. Proof of the Existence of Code Satisfying Equations (111)–(116)

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Appendix A. Proof of the Markov Chain $X_{E^{c}}$ – $X_{E}$ – ${\hat{X}}_{R}$ in Converse Part of Theorem 1