Mechanisms for Robust Local Differential Privacy

Lopuhaä-Zwakenberg, Milan; Goseling, Jasper

doi:10.3390/e26030233


Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7522 NB Enschede, The Netherlands

Authors to whom correspondence should be addressed: Milan Lopuhaä-Zwakenberg and Jasper Goseling.

Entropy 2024, 26(3), 233; https://doi.org/10.3390/e26030233

Submission received: 11 January 2024 / Revised: 29 February 2024 / Accepted: 3 March 2024 / Published: 6 March 2024

(This article belongs to the Special Issue Information Theory for Distributed Systems)


Abstract

We consider privacy mechanisms for releasing data $X = (S, U)$, where $S$ is sensitive and $U$ is non-sensitive. We introduce the robust local differential privacy (RLDP) framework, which provides strong privacy guarantees while preserving utility. This is achieved by providing robust privacy: our mechanisms provide privacy not only with respect to a publicly available estimate of the unknown true distribution, but also with respect to similar distributions. Such robustness mitigates the potential privacy leaks that might arise from the difference between the true distribution and the estimated one. At the same time, we mitigate the utility penalties that come with ordinary differential privacy, which involves making worst-case assumptions and dealing with extreme cases. We achieve robustness in privacy by constructing an uncertainty set based on a Rényi divergence. By analyzing the structure of this set and approximating it with a polytope, we can use robust optimization to find mechanisms with high utility. However, this relies on vertex enumeration and becomes computationally infeasible for large input spaces. Therefore, we also introduce two low-complexity algorithms that build on existing LDP mechanisms. We evaluate the utility and robustness of the mechanisms using numerical experiments and demonstrate that our mechanisms provide robust privacy while achieving a utility that is close to optimal.

Keywords:

local differential privacy; Rényi divergence; robust optimization

1. Introduction

We consider the setting in which an aggregator collects data from many users with the purpose of, for instance, computing statistics or training a machine learning model. In particular, the data contain sensitive information and users do not trust the aggregator. Therefore, they employ a privacy mechanism that transforms the data before sending it to the aggregator. Users have data $X = (S, U)$ from a finite alphabet $\mathcal{X} = \mathcal{S} \times \mathcal{U}$, where $s \in \mathcal{S}$ is sensitive information and $u \in \mathcal{U}$ is non-sensitive. Data are distributed i.i.d. across users according to the distribution $P^*$. In order to preserve their privacy, users disclose a sanitized version $Y$ of $X$ by using a privacy mechanism $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$. The aim is that $Y$ contains as much information about $X$ as possible without leaking too much information about $S$. The challenge addressed in this paper is to develop good privacy mechanisms. This scenario and closely related ones were studied in, for instance, [1,2,3,4,5,6,7,8,9,10,11]. In this paper, we use the following version of local differential privacy (LDP), as introduced in [3]:

$$P(Y = y \mid S = s) \leq e^{\varepsilon} P(Y = y \mid S = s'), \tag{1}$$

for all $y \in \mathcal{Y}$, all $s, s' \in \mathcal{S}$, and privacy parameter $\varepsilon > 0$. In addition, we measure the utility of $Y$ through the mutual information $I(X; Y)$. We discuss differences with related work in Section 2.

Note that if all information is sensitive, i.e., if $\mathcal{X} = \mathcal{S}$, (1) reduces to

$$P(Y = y \mid X = x) \leq e^{\varepsilon} P(Y = y \mid X = x'), \tag{2}$$

which is the traditional LDP constraint [1,2,5]. An important property of (2) is that it does not depend on $P^*$, but only on $\mathcal{Q}$. Independence from $P^*$ is a key factor in the success of differential privacy, since it removes the need to make assumptions about the distribution of the data or about the background/side knowledge available to the aggregator. As is clear from (1), however, independence from $P^*$ no longer holds if not all data are sensitive: $P(Y = y \mid S = s)$ is obtained by averaging the mechanism's output probabilities over the conditional distribution of $U$ given $S = s$, and that conditional distribution is determined by $P^*$.
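As a hypothetical illustration of this dependence (not one of the mechanisms studied in this paper), consider the mechanism that releases $U$ unchanged, i.e., $Y = U$. Then $P(Y = y \mid S = s) = P^*(U = y \mid S = s)$, and (1) becomes

$$P^*(U = y \mid S = s) \leq e^{\varepsilon}\, P^*(U = y \mid S = s') \quad \text{for all } y, s, s'.$$

If $S$ and $U$ are independent under $P^*$, both conditional probabilities equal $P^*(U = y)$ and the constraint holds for every $\varepsilon \geq 0$. If instead $S = U$ (so that $\mathcal{S} = \mathcal{U}$), taking $y = s \neq s'$ gives a left-hand side of $1$ and a right-hand side of $0$, so no finite $\varepsilon$ works. The same $\mathcal{Q}$ is thus perfectly private under one distribution and not private at all under another.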

Assuming that $P^*$ is known, one can develop good privacy mechanisms for various settings with partially sensitive information [3,6,12]. In practice, however, $P^*$ has to be modeled using domain knowledge or estimated from data, leading to errors. The prevalent approach in the literature has been to develop privacy mechanisms based on a (point) estimate $\hat{P}$ and analyze sensitivity with respect to errors in this estimate. In this work, we follow the approach proposed in [13,14], which is to construct a set $\mathcal{F}$ of probability distributions that we are confident contains $P^*$. Subsequently, we construct privacy mechanisms that aim to maximize utility while satisfying (1) for all probability distributions in $\mathcal{F}$. We call the resulting privacy framework robust local differential privacy (RLDP).

In a sense, RLDP is a relaxed form of privacy. Enforcing (1) for all possible distributions may seem appealing, but, as we illustrate next, it is often infeasible in practice. To see this, consider two extreme cases. First, consider a joint distribution of $S$ and $U$ under which $S = U$. Intuitively, we cannot disclose much information about $U$, since this directly leaks information about $S$. As such, the utility of $Y$ is low. Next, consider a joint distribution under which $S$ and $U$ are independent. Intuitively, we can disclose $U$ without additional precautions, providing a high utility of $Y$. The point is that we need to design a single privacy mechanism $\mathcal{Q}$ that satisfies (1) for all distributions, including the ‘worst case’ in which $S = U$, leading to low utility of $Y$. In this work, we take the middle ground between, on the one hand, only using a point estimate $\hat{P}$ and, on the other hand, using all possible distributions. We do so by defining a set of ‘reasonable’ distributions $\mathcal{F}$. In particular, we construct $\mathcal{F}$ based on public side information. This public side information consists of $n$ pairs of data $(s_1, u_1), \dots, (s_n, u_n)$, which, like the users' data, are drawn i.i.d. from the unknown distribution $P^*$. Our set $\mathcal{F}$ is constructed as a closed ball under a Rényi divergence around the maximum likelihood point estimate $\hat{P}$ of $P^*$. By doing so, we are (statistically) confident that $\mathcal{F}$ contains $P^*$, with the radius of the ball controlling the confidence level.

The RLDP framework is an instance of the more general Pufferfish framework [15]. In Section 2, we make this connection explicit and use it to describe the semantic privacy guarantees that are offered by RLDP.

The main contributions of this paper are as follows:

We use a Rényi divergence to construct $\mathcal{F}$ and analyze the resulting structure and statistics of $\mathcal{F}$. In particular, we demonstrate that conditional projections of $\mathcal{F}$ are again balls under the same divergence. Moreover, we bound the projected sets in terms of an $\ell_1$ norm.
Using these results, we approximate $\mathcal{F}$ by an enveloping polytope. We then use techniques from robust optimization [16,17,18] to characterize PolyOpt, the mechanism that is optimal over this polytope.
A drawback of this method is that it relies on vertex enumeration and is, therefore, computationally infeasible for large alphabets. For this reason, we introduce two low-complexity privacy mechanisms. The first is independent reporting (IR), in which $S$ and $U$ are reported through separate LDP mechanisms.
We characterize the conditions that the underlying LDP mechanisms have to satisfy in order for IR to ensure RLDP. Furthermore, while IR can incorporate any LDP mechanism, we show that it is optimal to use randomized response [19]. This drastically reduces the search space and allows us to find the optimal IR mechanism using low-dimensional optimization.
The second low-complexity mechanism that we develop is called secret-randomized response (SRR) and is based on randomized response.
We show that SRR maximizes mutual information in the low-privacy regime for the case that $\mathcal{F}$ is the entire probability simplex.
We demonstrate the improved utility of RLDP over LDP with numerical experiments. In particular, we compare the performance of our mechanisms with generalized randomized response [5]. We provide results for both synthetic data sets and real-world census data.

The structure of this paper is as follows. After discussing related work in Section 2, we describe the model in detail in Section 3. In Section 4, we present results on the structure and statistics of projections of $\mathcal{F}$. These results are used in Section 5 to develop the PolyOpt privacy mechanism. Low-complexity privacy mechanisms are presented in Section 6 and Section 7. In Section 8, we evaluate the discussed methods experimentally. Finally, in Section 9, we discuss our results and give an outlook on future work. Most proofs are deferred to Appendix A.

Part of this paper was presented at the IEEE International Symposium on Information Theory 2021 [14]. In this paper, we generalize from a $\chi^2$-divergence to an arbitrary Rényi divergence. Moreover, Section 4, Section 6, most of Section 8, and all proofs are new in the current paper.

2. Related Work

2.1. The Pufferfish Framework

Our RLDP framework is an instance of the more general Pufferfish framework [15]. In this subsection, we make this connection explicit and elaborate on the semantic guarantees offered by RLDP.

A privacy definition following the Pufferfish framework specifies (i) a set of potential secrets, (ii) a set of discriminative pairs of secrets, and (iii) a set of assumptions about how data are generated. In RLDP, the potential secrets are the possible values of $S$, i.e., $\mathcal{S}$. We want to prevent the aggregator from learning anything about $S$. This means that it should not be able to distinguish the case $S = s$ from $S = s'$ for any $s \neq s'$, so all non-identical pairs are discriminative. Note that this relies on $\mathcal{S}$ being finite; extensions to continuous $\mathcal{S}$ are discussed in detail in [15].

The set of assumptions on how data are generated consists, in our setting, of probability distributions over $\mathcal{X}$. A key idea in Pufferfish is that this set explicitly models the information that is available to an attacker, i.e., an entity that is trying to infer information about $S$ by observing $Y$. In our setting, the aggregator is the only attacker, and a probability distribution $P$ over $\mathcal{X}$ captures the beliefs that the attacker has about $S$ prior to seeing $Y$. We can rewrite (1) as

$$\frac{P_{X \sim P}(S = s \mid Y = y)}{P_{X \sim P}(S = s' \mid Y = y)} \leq e^{\varepsilon} \frac{P_{X \sim P}(S = s)}{P_{X \sim P}(S = s')} \tag{3}$$

and see that our local differential privacy constraint (1) can be interpreted as the condition that the posterior distribution of $S$ after seeing $Y$ must be close to the prior distribution: the posterior odds between any two secrets may exceed the prior odds by at most a factor $e^{\varepsilon}$. The relevance of $P$ is that it captures a specific set of beliefs of the attacker. As such, we want (3) to hold for various values of $P$, where each $P$ captures specific background/side knowledge available to the attacker/aggregator. Note that by doing so we are not making any claims about the actual knowledge available to the aggregator, but instead describing the possible scenarios for which we want to protect the privacy of users. In Pufferfish, these possible scenarios are called the set of assumptions on how data are generated, and in RLDP this set is $\mathcal{F}$.
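The step from (1) to (3) is Bayes' rule. Assuming that $P_{X \sim P}(S = s)$, $P_{X \sim P}(S = s')$ and $P_{X \sim P}(Y = y)$ are positive, the factor $P_{X \sim P}(Y = y)$ cancels in the ratio of the two posteriors, and applying the constraint (1), evaluated under $P$, to the remaining likelihood ratio gives

$$\frac{P_{X \sim P}(S = s \mid Y = y)}{P_{X \sim P}(S = s' \mid Y = y)} = \frac{P_{X \sim P}(Y = y \mid S = s)}{P_{X \sim P}(Y = y \mid S = s')} \cdot \frac{P_{X \sim P}(S = s)}{P_{X \sim P}(S = s')} \leq e^{\varepsilon}\, \frac{P_{X \sim P}(S = s)}{P_{X \sim P}(S = s')}.$$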

Often, side information in the form of domain knowledge or existing data is publicly available, i.e., available to both the users and the aggregator. This public side information may suggest, for instance, that there is, at most, limited dependence between $S$ and $U$. In that case, protecting against attackers who have the belief that $S = U$ incurs an enormous penalty in achieved utility. It is true that those attackers gain a lot of information on $S$ by observing $Y$. However, they could have also obtained this information from the public side information directly. Therefore, the approach taken in the Pufferfish framework and in this paper is that we only protect against attackers that have beliefs, i.e., distributions $P$, that are in line with publicly available side information.

A challenge in working with the Pufferfish framework is that it is often difficult to find good mechanisms. A general mechanism is proposed in [20], but it relies on enumerating all distributions in $\mathcal{F}$; in our setting, $\mathcal{F}$ is an uncountable set, so this mechanism cannot be used here. A constrained version of Pufferfish that facilitates analysis, together with a methodology for finding good mechanisms, is proposed in [21]. Another interesting line of work is to model correlations between users in the non-local differential privacy setting [22]. Finally, [23] proposes a modeling framework for capturing domain knowledge about the data. In contrast, in the current work, we impose constraints that are learned from data. Our setting does not fit any of the frameworks for which good mechanisms are known in the literature. One of the main contributions of this paper is to develop such mechanisms.

2.2. Other Privacy Frameworks

Disclosing $X$ through a privacy mechanism that protects sensitive information $S$ has been studied extensively. One line of work starts from differential privacy [24] and imposes the additional challenge that the aggregator cannot be trusted, leading to the concept of local differential privacy [1,2,5]. For this setting, several privacy mechanisms exist, including randomized response [19] and unary encoding [25]. Optimal LDP mechanisms under a variety of utility metrics, including mutual information, are found in [5]. In [1,2,5], all data are sensitive, i.e., $X = S$. The variation of LDP for the case of disclosing $X = (S, U)$, where only $S$ is sensitive, was proposed in [3] and is the setting that we study in this paper. Another line of work connects this setting to the information bottleneck [26], leading to a privacy constraint in terms of mutual information [6,8,9,10]. In these works, it is shown that approaches to optimizing the information bottleneck also work for finding good privacy mechanisms.

Next to differential privacy and mutual information as privacy measures, a multitude of other privacy frameworks and leakage measures exist [27]. Some of these have been studied in the context of privacy mechanisms. In [7,11], privacy leakage is measured through the improved potential of statistical inference by an attacker after seeing the disclosed information. This measure is formulated through a general cost function, with mutual information resulting as a special case. Perfect privacy, which demands the output to be independent of the sensitive data, was studied in [28], and methods were given to find optimal mechanisms in this setting. An estimation-theoretic framework was studied in [29,30]. Our use of a Rényi divergence in the construction of $\mathcal{F}$ may suggest considering a generalization of our privacy definition. This could be achieved by considering, for instance, a Rényi divergence in the privacy constraint, as done in [31]. Along a different line, [32] defines the maximal leakage measure, which has a clear operational interpretation. In [33], this measure is generalized to a parametrized measure, enabling interpolation between maximal leakage and mutual information. A stronger, pointwise version of the maximal leakage measure is proposed in [34]. These are interesting research directions, but they are not pursued in this paper.

Our setting $X = (S, U)$ is a special case of a Markov chain $S - X - Y$, where only $X$ is observed. This Markov chain is typically studied in the information bottleneck and privacy funnel settings [6,26]. We do not generalize to this setting, because we need observations of $S$ for the estimate of $P_{U \mid S}$. Without direct observations of $S$, we can only make worst-case assumptions on $P_{U \mid S}$, leading to very poor utility. A different type of model, in which only part of the information in $X$ is sensitive, is proposed in [12]. This is a block-structured model in which $\mathcal{X}$ is partitioned, and the block containing an element is sensitive, but its index within that block is not. Our setting of $\mathcal{X} = \mathcal{S} \times \mathcal{U}$ does not fit this model. One can partition $\mathcal{X}$ according to $\mathcal{U}$, but our privacy constraints are different from those in [12]. We elaborate on this in Section 6.

2.3. Robustness

The distribution $P^*_{S,U}$ is not available in practice. The approach taken in most works is to estimate $P^*_{S,U}$ from data and analyze sensitivity with respect to this estimate $\hat{P}_{S,U}$. One of the contributions in [7] is to quantify the impact of mismatched priors, i.e., the impact of not knowing $P^*_{S,U}$ exactly. A bound on the resulting level of privacy is derived in terms of the total variation distance between the actual $P^*_{S,U}$ and the estimated $\hat{P}_{S,U}$. The setting in [35] is similar to ours: a ball of probability distributions, centered around a point estimate, is defined that contains $P^*_{S,U}$ with high probability. It is then shown that a privacy mechanism designed based on the empirical distribution is valid for the entire set under a looser privacy constraint. The privacy slack is quantified and shown to approach zero as the size of the data set increases. An important difference with the current work is that we explicitly optimize the privacy mechanism over the uncertainty set. Another difference is that we base our ball on a Rényi divergence, whereas [35] uses an $\ell_1$ norm. The main technical tool used in [35] is large deviations theory, whereas we rely on convex analysis and robust optimization. We also mention [36,37]. In [36], it is assumed that nothing is known about $P^*_S$ and $P^*_{U \mid S}$. It is shown that good privacy mechanisms can be found through a connection to maximal correlation; see also [38]. In [37], sets of probability distributions are not derived from data but carefully modeled such that optimal mechanisms can be derived analytically.

Using robust optimization [16] to find a good mechanism that satisfies privacy constraints for all $P_{S,U}$ in an uncertainty set $\mathcal{F}$ was proposed in [13,14]. In this work, we generalize and extend results from [14]. The idea of robust optimization is that constraints in an optimization problem contain uncertain parameters that are known to come from an (a priori defined) uncertainty set. The constraints must hold for all possible values of the uncertain parameters. A key result is that, using Fenchel duality, the problem can be expressed in terms of the support function of the uncertainty set and the convex conjugate of the constraint [16,17]. The case where the uncertain parameters are probabilities is known as distributionally robust optimization. Using results from [39], it was shown in [40] how an uncertainty set can be constructed from data using an f-divergence, providing an approximate confidence set. Confidence sets for parameters that are not necessarily probabilities were constructed in [18] under a $\chi^2$-divergence. Convergence of robust optimization based on f-divergences was studied in [41] and, for the case of a KL-divergence, in [42]. In [43], it is shown how distributionally robust optimization problems over Wasserstein balls can be reformulated as convex problems. For the regular differential privacy setting, distributionally robust optimization was used in [44] to find optimal additive privacy mechanisms for a general perturbation cost function. In this paper, we show how robust optimization can be applied to the setting of partially sensitive information with local differential privacy.

2.4. Miscellaneous

Another line of work on privacy mechanisms builds on recent advances in generative adversarial networks [45]. In [46,47], a generative adversarial framework is used to provide privacy mechanisms that do not use explicit expressions for $P_X$. Even though this is not explicitly addressed in [46,47], it is expected that the generalization properties of networks will provide a form of robustness. Closely related approaches are used in the field of face recognition [48,49], with the aim of preventing biometric profiling [50]. The leakage measures that are used in [48,49], however, do not seem to have an operational interpretation.

Disclosing information in a privacy-preserving way is one of the main challenges in official statistics [51,52]. The setting considered in the current paper is closely connected to disclosing a table with microdata, where each record in the table is released independently of the other records. This approach to disclosing microdata was studied in [4] by considering expected error as the utility measure and mutual information as the privacy measure. The resulting optimization problem corresponds to the traditional rate-distortion problem.

3. Model and Preliminaries

In this section, we give an overview of the setting and objectives of this paper. The notation used in this section, as well as the rest of the paper, is summarized in Table 1.

The data space is $\mathcal{X} = \mathcal{S} \times \mathcal{U}$, where $\mathcal{S}$ and $\mathcal{U}$ are finite sets. We write $|\mathcal{S}| =: a_1$, $|\mathcal{U}| =: a_2$, and $|\mathcal{X}| = a_1 a_2 =: a$. Data items $X = (S, U)$ are drawn from a probability distribution $P^*$ in $\mathcal{P}_{\mathcal{X}}$, the space of probability distributions on $\mathcal{X}$; here, $S$ represents sensitive data, while $U$ represents non-sensitive data. The aggregator's aim is to create a privacy mechanism $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ such that $Y = \mathcal{Q}(X)$ contains as much information about $X$ as possible, while not leaking too much information about $S$.

The mechanism $\mathcal{Q}$ is a probabilistic map, which we represent by a left stochastic matrix $(Q_{y \mid x})_{y \in \mathcal{Y}, x \in \mathcal{X}}$ (each column $(Q_{y \mid x})_{y \in \mathcal{Y}}$ is a probability distribution on $\mathcal{Y}$), and we write $|\mathcal{Y}| = b$. Often, we identify $\mathcal{Y} = \{1, \dots, b\}$, and likewise for other sets.

The distribution $P^*$ is not known exactly. Instead, there is a set of possible distributions $\mathcal{F} \subset \mathcal{P}_{\mathcal{X}}$, where $\mathcal{P}_{\mathcal{X}}$ denotes the probability simplex over $\mathcal{X}$. We choose $\mathcal{F}$ in such a way that it is likely that $P^* \in \mathcal{F}$. The uncertainty set $\mathcal{F}$ captures our uncertainty about $P^*$, and we guarantee privacy for all $P \in \mathcal{F}$. We call this robust local differential privacy (RLDP).

Definition 1 (Robust Local Differential Privacy). Let $\varepsilon \geq 0$ and $\mathcal{F} \subset \mathcal{P}_{\mathcal{X}}$. We say that $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{F})$-RLDP if for all $s, s' \in \mathcal{S}$, all $y \in \mathcal{Y}$, and all $P \in \mathcal{F}$ we have

$$P_{X \sim P}(Y = y \mid S = s) \leq e^{\varepsilon} P_{X \sim P}(Y = y \mid S = s'). \tag{4}$$

Note that we use the notation $P_{X \sim P}(\cdot)$ to emphasize that $X$ is distributed according to $P$. If no confusion can arise, we often leave out the subscript $X \sim P$ to improve readability. Note that we can also write

$$P_{X \sim P}(Y = y \mid S = s) = \sum_{u \in \mathcal{U}} Q_{y \mid s,u}\, P_{X \sim P}(U = u \mid S = s), \tag{5}$$

so Definition 1 depends on the conditional probabilities of $U$ given $S = s$ and $S = s'$. It does not, however, depend on the realization of $U$.
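To make (4) and (5) concrete, the sketch below evaluates (5) for a mechanism stored as nested dictionaries and checks (4) over a finite list of candidate distributions. It is an illustration under our own assumptions: the data layout and the function names are hypothetical, and since $\mathcal{F}$ is in general uncountable, passing the check on finitely many distributions is a necessary, not a sufficient, condition for $(\varepsilon, \mathcal{F})$-RLDP. The check uses that (4) holds for all $s, s'$ exactly when, for each $y$, the largest value of $P(Y = y \mid S = s)$ over $s$ is at most $e^{\varepsilon}$ times the smallest. The usage lines take the estimate $\hat{P}$ of Example 1 below and the extreme ‘$S = U$’ distribution.

```python
import math

def output_given_secret(Q, P, y, s, U):
    """Eq. (5): P_{X~P}(Y=y | S=s) = sum_u Q[y][(s, u)] * P(U=u | S=s)."""
    p_s = sum(P[(s, u)] for u in U)
    if p_s == 0:
        raise ValueError("P(U | S=s) is undefined when P_s = 0")
    return sum(Q[y][(s, u)] * P[(s, u)] / p_s for u in U)

def violates_rldp(Q, candidates, S, U, Y, eps, tol=1e-12):
    """True if (4) fails for some P in the finite list `candidates`."""
    for P in candidates:
        for y in Y:
            probs = [output_given_secret(Q, P, y, s, U) for s in S]
            # (4) for all s, s'  <=>  max_s P(y|s) <= e^eps * min_s P(y|s)
            if max(probs) > math.exp(eps) * min(probs) + tol:
                return True
    return False

# Hypothetical mechanism: binary randomized response applied to U only.
S, U, Y = [0, 1], [0, 1], [0, 1]
Q = {y: {(s, u): (0.75 if y == u else 0.25) for s in S for u in U} for y in Y}

# Example 1 estimate (s1, s2 -> 0, 1; u1, u2 -> 0, 1) and the 'S = U' extreme.
P_hat = {(0, 0): 0.07, (0, 1): 0.10, (1, 0): 0.26, (1, 1): 0.57}
P_equal = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

print(violates_rldp(Q, [P_hat], S, U, Y, eps=0.5))           # False
print(violates_rldp(Q, [P_hat, P_equal], S, U, Y, eps=0.5))  # True
```

Under $\hat{P}$ the worst likelihood ratio is about $1.12$, well below $e^{0.5}$, whereas under the ‘$S = U$’ distribution it is $0.75/0.25 = 3$, which mirrors the utility/privacy tension discussed in the Introduction.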

For clarity and use in future sections, we give the definition of regular LDP [1], which is used when the goal is to obfuscate all of X, rather than just S.

Definition 2 (Local Differential Privacy). Let $\varepsilon \geq 0$. We say that $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ satisfies $\varepsilon$-LDP if for all $x, x' \in \mathcal{X}$ and all $y \in \mathcal{Y}$ we have

$$P(Y = y \mid X = x) \leq e^{\varepsilon} P(Y = y \mid X = x'). \tag{6}$$
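As a sketch of a mechanism satisfying Definition 2, the code below builds the $a \times a$ matrix of generalized randomized response (GRR) [5] on an alphabet identified with $\{0, \dots, a-1\}$, stored as `Q[y][x]` so that each column is a distribution on outputs, and computes the smallest $\varepsilon$ for which (6) holds. The function names are our own. Because (5) expresses $P(Y = y \mid S = s)$ as a convex combination of the entries $Q_{y \mid s,u}$, any $\varepsilon$-LDP mechanism also satisfies $(\varepsilon, \mathcal{F})$-RLDP for every $\mathcal{F}$, which is what makes GRR a natural baseline for comparison.

```python
import math

def grr_matrix(a, eps):
    """Generalized randomized response on {0, ..., a-1}, stored as Q[y][x].

    The true symbol is kept with probability e^eps / (e^eps + a - 1); each of
    the other a - 1 symbols is output with probability 1 / (e^eps + a - 1).
    """
    denom = math.exp(eps) + a - 1
    return [[(math.exp(eps) if y == x else 1.0) / denom for x in range(a)]
            for y in range(a)]

def ldp_level(Q):
    """Smallest eps with Q[y][x] <= e^eps * Q[y][x'] for all y, x, x', cf. (6)."""
    worst = 0.0
    for row in Q:                      # row = (Q_{y|x})_x for a fixed output y
        lo, hi = min(row), max(row)
        if lo == 0.0:
            if hi > 0.0:
                return math.inf        # some input can produce y, another cannot
            continue
        worst = max(worst, math.log(hi / lo))
    return worst

Q = grr_matrix(4, 1.0)
print(round(ldp_level(Q), 12))                                        # 1.0
print([round(sum(Q[y][x] for y in range(4)), 12) for x in range(4)])  # [1.0, 1.0, 1.0, 1.0]
```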

To model the uncertainty about $P^*$ captured by $\mathcal{F}$, we suppose there is a database $\vec{x} = (x_1, \dots, x_n)$ accessible to the user, where each $x_i = (s_i, u_i)$ is drawn independently from $P^*$. Based on this, the user produces an estimate $\hat{P}$ of $P^*$. In the experiments, we consider the maximum likelihood estimator, i.e., the empirical distribution $\hat{P}_x = \frac{1}{n} |\{i \leq n : x_i = x\}|$. We construct the uncertainty set $\mathcal{F}$ as a closed ball around $\hat{P}$. In particular, let $D_{\alpha}$ be the Rényi divergence of order $\alpha$ on $\mathcal{P}_{\mathcal{X}}$, i.e., for $\alpha \in (0, \infty)$,

$$D_{\alpha}(\hat{P} \,\|\, P) = \begin{cases} \dfrac{1}{\alpha - 1} \log \left( \sum_{x \in \mathcal{X}} \dfrac{\hat{P}_x^{\alpha}}{P_x^{\alpha - 1}} \right), & \text{if } \alpha \neq 1, \\[2ex] \sum_{x \in \mathcal{X}} \hat{P}_x \log \dfrac{\hat{P}_x}{P_x}, & \text{if } \alpha = 1. \end{cases} \tag{7}$$

The case $\alpha = 1$ follows, in fact, as a limit from the $\alpha \neq 1$ case. Similarly, the definition can be extended to $\alpha \in \{0, \infty\}$ by taking the corresponding limits, but in this paper we restrict our attention to $\alpha \in (0, \infty)$ to keep the presentation clear. Note that $D_1 = D_{\mathrm{KL}}$, the Kullback–Leibler divergence, and $D_2 = \log(1 + \chi^2)$, where the $\chi^2$-divergence is $\chi^2(P_1 \,\|\, P_2) = \sum_x (P_{1,x} - P_{2,x})^2 P_{2,x}^{-1}$; indeed, $\sum_x (\hat{P}_x - P_x)^2 / P_x = \sum_x \hat{P}_x^2 / P_x - 1$. In general, a Rényi divergence is a continuous increasing function of a power divergence (a.k.a. Hellinger divergence) [39,53,54], which is an example of an f-divergence. We omit $\alpha$ from the notation when it is clear from the context.
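A direct transcription of (7) is sketched below; the function names are hypothetical and the code is an illustration, not an implementation from this paper. Zero entries follow the usual conventions (terms with $\hat{P}_x = 0$ vanish; for $\alpha \geq 1$ the divergence is infinite if $P_x = 0 < \hat{P}_x$), and natural logarithms are used, which is consistent with the value of $B$ in Example 1. The usage lines check $D_2 = \log(1 + \chi^2)$ and the limit $D_\alpha \to D_1$ as $\alpha \to 1$ numerically.

```python
import math

def renyi_divergence(p_hat, p, alpha):
    """D_alpha(p_hat || p) following (7), natural log, alpha in (0, inf)."""
    if alpha <= 0:
        raise ValueError("alpha must lie in (0, inf)")
    if alpha == 1:                                   # Kullback-Leibler case
        total = 0.0
        for ph, q in zip(p_hat, p):
            if ph > 0:
                if q == 0:
                    return math.inf
                total += ph * math.log(ph / q)
        return total
    total = 0.0
    for ph, q in zip(p_hat, p):
        if ph == 0:
            continue                                 # term vanishes
        if q == 0:
            if alpha > 1:
                return math.inf                      # hat P_x^alpha / 0
            continue                                 # hat P_x^alpha * 0^(1 - alpha) = 0
        total += ph ** alpha * q ** (1 - alpha)      # = hat P_x^alpha / P_x^(alpha - 1)
    if total == 0.0:
        return math.inf                              # disjoint supports (alpha < 1)
    return math.log(total) / (alpha - 1)

def chi2_divergence(p1, p2):
    """chi^2(P1 || P2) = sum_x (P1_x - P2_x)^2 / P2_x, assuming P2_x > 0."""
    return sum((a - b) ** 2 / b for a, b in zip(p1, p2))

p_hat = [0.07, 0.10, 0.26, 0.57]
p = [0.10, 0.10, 0.20, 0.60]
print(renyi_divergence(p_hat, p, 2), math.log1p(chi2_divergence(p_hat, p)))
print(renyi_divergence(p_hat, p, 1 + 1e-6), renyi_divergence(p_hat, p, 1))
```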

We define $\mathcal{F}$ by fixing a bound $B \in [0, \infty]$ and letting

$$\mathcal{F} = \{P \in \mathcal{P}_{\mathcal{X}} : D_{\alpha}(\hat{P} \,\|\, P) \leq B\}. \tag{8}$$

Since a Rényi divergence is a continuous increasing function of an f-divergence, it follows from [39,40] that $\mathcal{F}$ is an (approximate) confidence set for $P^*$. In particular, for the case $\alpha = 2$, which will be used in our numerical experiments in Section 8, choosing $B = \log\big(1 + F^{-1}_{\chi^2, a-1}(1 - \beta)/n\big)$ gives

$$\mathcal{F} = \left\{P \in \mathcal{P}_{\mathcal{X}} : \sum_x \frac{(\hat{P}_x - P_x)^2}{P_x} \leq \frac{F^{-1}_{\chi^2, a-1}(1 - \beta)}{n}\right\}, \tag{9}$$

with $\beta \in (0, 1)$, where $F_{\chi^2, a-1}$ is the cumulative distribution function of the $\chi^2$-distribution with $a - 1$ degrees of freedom. This results in a set $\mathcal{F}$ with significance level $\beta$, meaning that the probability of $P^* \in \mathcal{F}$ is, asymptotically in $n$, at least $1 - \beta$.
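The radius in (9) requires a $\chi^2$ quantile. In practice one would typically call a statistics library (for example SciPy's `scipy.stats.chi2.ppf`); the self-contained sketch below instead finds the quantile by bisection on the $\chi^2$ CDF, written as the regularized lower incomplete gamma function $P(k/2, x/2)$ and evaluated with its power series. The function names are hypothetical, and the series is meant for the moderate arguments arising here (its leading term can underflow for very large degrees of freedom).

```python
import math

def reg_lower_gamma(s, x, tol=1e-15, max_iter=100_000):
    """Regularized lower incomplete gamma P(s, x) via its power series:
    P(s, x) = x^s e^{-x} * sum_k x^k / Gamma(s + k + 1)."""
    if x <= 0:
        return 0.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1))  # k = 0 term
    total = term
    k = 0
    while term > tol * total and k < max_iter:
        k += 1
        term *= x / (s + k)          # ratio of consecutive series terms
        total += term
    return min(total, 1.0)

def chi2_ppf(q, dof, tol=1e-12):
    """Quantile F^{-1}_{chi^2, dof}(q) by bisection on the CDF P(dof/2, x/2)."""
    cdf = lambda x: reg_lower_gamma(dof / 2, x / 2)
    lo, hi = 0.0, max(1.0, float(dof))
    while cdf(hi) < q:               # grow the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi2_radius(n, a, beta):
    """B = log(1 + F^{-1}_{chi^2, a-1}(1 - beta) / n), the alpha = 2 radius behind (9)."""
    return math.log1p(chi2_ppf(1 - beta, a - 1) / n)

print(chi2_ppf(0.95, 3))          # approx. 7.8147
print(chi2_radius(100, 4, 0.05))  # approx. 0.0752, as in Example 1
```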

Hence, by designing $\mathcal{Q}$ based on $\mathcal{F}$, we are confident in satisfying (1) for all attackers that have beliefs based on the public side information, as well as for attackers that have beliefs that are closer to $P^*$.

As a special case of the above, we will study the case that nothing is known about $P^*$. In this case, $B \to \infty$ and $\mathcal{F} = \mathcal{P}_{\mathcal{X}}$. Regarding privacy, this is the ‘safest’ choice, as we do not make assumptions about $P^*$. Another special case is where $\mathcal{F}$ is a singleton, which reflects a situation where $B = 0$ and $P^*$ is assumed to be known. This setting was studied in [3].

Given $\mathcal{F}$ and $\varepsilon$, the goal is now to create a mechanism $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ to be used on new/future data; our setting is depicted in Figure 1. The aim of this paper is to find a satisfactory answer to the following problem:

Problem 1.

Given $\mathcal{F}$ and $\varepsilon$, find a $\mathcal{Q}$ satisfying $(\varepsilon, \mathcal{F})$-RLDP, while maximizing a given utility function.

Throughout this paper, we follow the original privacy funnel [6] and its LDP counterpart [3] in taking mutual information $I(X; Y)$ as a utility measure. As is argued in [6], mutual information arises naturally when minimizing log loss distortion in the privacy funnel scenario. As a utility measure of $\mathcal{Q}$, we take $I_{X \sim P}(X; Y)$ (abbreviated to $I_P(X; Y)$), since the aim is to create $Y$ that reflects $X$ as faithfully as possible. This utility measure depends on the distribution $P$ of $X$ that we choose to evaluate. Ideally, one would like to use $P = P^*$, but in practice this is not possible, as $P^*$ is unknown. In the theoretical part of this paper, we circumvent this issue by proving our results for general $P$. In the experiments of Section 8, we take $P = \hat{P}$ as the best available alternative to $P = P^*$. We investigate the effect of this choice by comparing $I_{P^*}(X; Y)$ to $I_{\hat{P}}(X; Y)$.
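For a mechanism stored as a matrix, the utility $I_P(X; Y) = \sum_{x, y} P_x Q_{y \mid x} \log \big(Q_{y \mid x} / \sum_{x'} P_{x'} Q_{y \mid x'}\big)$ can be evaluated directly. The sketch below (hypothetical names, natural logarithms, so the result is in nats) computes it for an illustrative GRR mechanism under both the true distribution and the estimate of Example 1 below, mirroring the comparison of $I_{P^*}(X; Y)$ and $I_{\hat{P}}(X; Y)$; the choice of mechanism and of $\varepsilon = 1$ is ours.

```python
import math

def mutual_information(P, Q):
    """I_P(X; Y) in nats for input distribution P[x] and mechanism Q[y][x]."""
    b, a = len(Q), len(P)
    p_y = [sum(Q[y][x] * P[x] for x in range(a)) for y in range(b)]  # output marginal
    mi = 0.0
    for x in range(a):
        for y in range(b):
            joint = P[x] * Q[y][x]
            if joint > 0:
                mi += joint * math.log(Q[y][x] / p_y[y])
    return mi

def grr_matrix(a, eps):
    """Generalized randomized response on {0, ..., a-1}, stored as Q[y][x]."""
    denom = math.exp(eps) + a - 1
    return [[(math.exp(eps) if y == x else 1.0) / denom for x in range(a)]
            for y in range(a)]

P_star = [0.1, 0.1, 0.2, 0.6]     # (10)
P_hat = [0.07, 0.10, 0.26, 0.57]  # (11)
Q = grr_matrix(4, 1.0)            # illustrative mechanism and epsilon
print(mutual_information(P_star, Q), mutual_information(P_hat, Q))
```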

Another option is to use the robust utility measure $\min_{P \in \mathcal{F}} I_P(X; Y)$ to ensure good utility for every ‘reasonable’ $P$; see [13]. We do not explicitly study this measure in this paper, but since our results hold for general $P$, they can also be applied to robust utility.

Example 1.

We set up an example to illustrate the concepts of this paper. Take $\mathcal{S} = \{s_1, s_2\}$ and $\mathcal{U} = \{u_1, u_2\}$, and suppose

$$P^* = \begin{pmatrix} P^*_{s_1,u_1} \\ P^*_{s_1,u_2} \\ P^*_{s_2,u_1} \\ P^*_{s_2,u_2} \end{pmatrix} = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.2 \\ 0.6 \end{pmatrix}. \tag{10}$$

Moreover, suppose we have a publicly known database of $n = 100$ entries, from which we estimate

$$\hat{P} = \begin{pmatrix} \hat{P}_{s_1,u_1} \\ \hat{P}_{s_1,u_2} \\ \hat{P}_{s_2,u_1} \\ \hat{P}_{s_2,u_2} \end{pmatrix} = \begin{pmatrix} 0.07 \\ 0.10 \\ 0.26 \\ 0.57 \end{pmatrix}. \tag{11}$$

To obtain a $95\%$ confidence set $\mathcal{F}$ for $P^*$ according to a $\chi^2$-distribution, we take $\alpha = 2$ and $B = \log\big(1 + F^{-1}_{\chi^2,3}(0.95)/100\big) = 0.0752$. In this way, we obtain

$$
\begin{align}
\mathcal{F} &= \{P \in \mathcal{P}_{\mathcal{X}} : D_{\alpha}(\hat{P} \,\|\, P) \leq B\} \tag{12}\\
&= \left\{P \in \mathcal{P}_{\mathcal{X}} : \log\left(\sum_x \frac{\hat{P}_x^2}{P_x}\right) \leq \log\left(1 + \frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right)\right\} \tag{13}\\
&= \left\{P \in \mathcal{P}_{\mathcal{X}} : \sum_x \frac{(\hat{P}_x - P_x)^2}{P_x} \leq \frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right\}, \tag{14}
\end{align}
$$

which is the desired confidence set (note that the $\chi^2$-distribution has $|\mathcal{X}| - 1 = 3$ degrees of freedom). In this case, we have $D_2(\hat{P} \,\|\, P^*) = 0.0281 < B$, so $P^* \in \mathcal{F}$.
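The numbers in Example 1 can be reproduced with a few lines of arithmetic. In the sketch below, the quantile $F^{-1}_{\chi^2,3}(0.95) \approx 7.8147$ is hard-coded from standard $\chi^2$ tables rather than computed, and the entries are ordered as in (10) and (11); the variable names are our own.

```python
import math

P_star = [0.1, 0.1, 0.2, 0.6]       # (10), order (s1,u1), (s1,u2), (s2,u1), (s2,u2)
P_hat = [0.07, 0.10, 0.26, 0.57]    # (11)
n = 100
CHI2_3_Q95 = 7.8147279              # F^{-1}_{chi^2,3}(0.95), standard table value

B = math.log1p(CHI2_3_Q95 / n)                                   # radius in (12)
D2 = math.log(sum(ph ** 2 / p for ph, p in zip(P_hat, P_star)))  # D_2(P_hat || P*)
chi2 = sum((ph - p) ** 2 / p for ph, p in zip(P_hat, P_star))    # left side of (14)

print(f"B = {B:.4f}, D_2 = {D2:.4f}, P* in F: {D2 <= B}")
# prints: B = 0.0752, D_2 = 0.0281, P* in F: True
print(chi2 <= CHI2_3_Q95 / n)       # the same membership test in the form (14)
```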

4. Conditional Projection of $\mathcal{F}$

In Section 5 and Section 7 below, we will introduce privacy mechanisms that provide $(\varepsilon, \mathcal{F})$-RLDP. These mechanisms depend on the conditional projections of $\mathcal{F}$ on $\mathcal{P}_{\mathcal{U}}$ given $S = s$, denoted as $\mathcal{F}_{U \mid s}$. In this section, we analyze the structure and statistics of these sets. To do so, we introduce, for $s \in \mathcal{S}$, $u \in \mathcal{U}$, and $P \in \mathcal{P}_{\mathcal{X}}$:

$$
\begin{align}
P_s &= \sum_{u \in \mathcal{U}} P_{s,u}, \tag{15}\\
P_{u \mid s} &= \frac{P_{s,u}}{P_s}, \tag{16}\\
P_{U \mid s} &= (P_{u \mid s})_{u \in \mathcal{U}} \in \mathcal{P}_{\mathcal{U}}, \tag{17}\\
\mathcal{F}_{U \mid s} &= \{P_{U \mid s} : P \in \mathcal{F}\} \subset \mathcal{P}_{\mathcal{U}}. \tag{18}
\end{align}
$$
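Definitions (15)–(17) amount to normalizing each slice $s$ of the joint distribution. A minimal sketch, with a dictionary layout and function name of our own choosing, applied to the estimate $\hat{P}$ of Example 1:

```python
def conditional_slices(P, S, U):
    """Return {s: (P_s, P_{U|s})} following (15)-(17); P_{U|s} is None if P_s = 0."""
    out = {}
    for s in S:
        p_s = sum(P[(s, u)] for u in U)                            # (15)
        cond = [P[(s, u)] / p_s for u in U] if p_s > 0 else None   # (16)-(17)
        out[s] = (p_s, cond)
    return out

S, U = ["s1", "s2"], ["u1", "u2"]
P_hat = {("s1", "u1"): 0.07, ("s1", "u2"): 0.10,
         ("s2", "u1"): 0.26, ("s2", "u2"): 0.57}
for s, (p_s, cond) in conditional_slices(P_hat, S, U).items():
    print(s, round(p_s, 2), [round(c, 4) for c in cond])
# s1 0.17 [0.4118, 0.5882]
# s2 0.83 [0.3133, 0.6867]
```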

We are interested in the following statistics:

$$
\begin{align}
L_{u \mid s}(\mathcal{F}) &= \min_{R \in \mathcal{F}_{U \mid s}} R_u \quad \text{for a given } u \in \mathcal{U}, \tag{19}\\
\mathrm{rad}_s(\mathcal{F}) &= \max_{R \in \mathcal{F}_{U \mid s}} \|R - \hat{P}_{U \mid s}\|_1. \tag{20}
\end{align}
$$

In (19), $R_u$ is the $u$-coefficient of $R \in \mathcal{P}_{\mathcal{U}}$. It turns out that these statistics give us the information required to construct $(\varepsilon, \mathcal{F})$-RLDP mechanisms efficiently: in Section 5, we use $L_{u \mid s}(\mathcal{F})$ to approximate $\mathcal{F}_{U \mid s}$ by a polytope, to make computation easier, while in Section 7, we use $\mathrm{rad}_s(\mathcal{F})$ as a measure for the size of $\mathcal{F}_{U \mid s}$. While these statistics (or bounds for them) are relatively easy to find for $\mathcal{F}$ itself, the difficulty lies in the fact that we have to give bounds for the projection $\mathcal{F}_{U \mid s}$. The extent to which these bounds can be found explicitly depends heavily on the divergence measure that is used to construct $\mathcal{F}$. In this section, we show how these bounds can be obtained when $\mathcal{F}$ is constructed using a Rényi divergence. As we will see below, this is possible because, in that case, we can give an explicit description of $\mathcal{F}_{U \mid s}$.

4.1. Structure of $\mathcal{F}_{U \mid s}$

Recall that, for a given $\alpha \in (0, \infty)$ and fixed $\hat{P}$, the Rényi divergence $D_{\alpha}(\hat{P} \,\|\, \cdot) : \mathcal{P}_{\mathcal{X}} \to [0, \infty]$ is defined by

$$D_{\alpha}(\hat{P} \,\|\, P) = \begin{cases} \dfrac{1}{\alpha - 1} \log \left( \sum_{x \in \mathcal{X}} \dfrac{\hat{P}_x^{\alpha}}{P_x^{\alpha - 1}} \right), & \text{if } \alpha \neq 1, \\[2ex] \sum_{x \in \mathcal{X}} \hat{P}_x \log \dfrac{\hat{P}_x}{P_x}, & \text{if } \alpha = 1. \end{cases} \tag{21}$$

The following theorem states that the conditional projections of balls defined by Rényi divergence are themselves Rényi divergence balls:

Theorem 1.

Let $s \in S$ be such that $\hat{P}_{s} > 0$. Let $F$ be defined by Rényi divergence, i.e.,

F = \{P \in P_{X} : D_{α}(\hat{P} \| P) \leq B\}

(22)

for a given $α \in (0, \infty)$ and $B \in R_{\geq 0}$. Define the constant $B_{s}$ by

B_{s} = \begin{cases} \frac{α}{α - 1} \log \left( \frac{e^{(α - 1) B / α} - (1 - \hat{P}_{s})}{\hat{P}_{s}} \right), & \text{if } α \neq 1, \\ \frac{B}{\hat{P}_{s}}, & \text{if } α = 1. \end{cases}

(23)

Then,

F_{U | s} = \{R \in P_{U} : D_{α}(\hat{P}_{U | s} \| R) \leq B_{s}\}.

(24)
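Both (21) and (23) are cheap to evaluate numerically. The following Python sketch assumes natural logarithms and uses hypothetical function names of our own; it computes $D_{α}$ and the projected radius $B_{s}$ of Theorem 1. The guard for a nonpositive argument of the logarithm in (23), which can only occur when $α < 1$, is our own reading and not taken from the text.

```python
import numpy as np


def renyi_divergence(p_hat, p, alpha):
    """D_alpha(p_hat || p) as in (21), using natural logarithms."""
    p_hat = np.asarray(p_hat, dtype=float)
    p = np.asarray(p, dtype=float)
    support = p_hat > 0  # terms with p_hat_x = 0 contribute nothing to the sum
    if alpha >= 1 and np.any(p[support] == 0):
        return np.inf  # p_hat puts mass where p has none
    ph, pp = p_hat[support], p[support]
    if alpha == 1:
        return float(np.sum(ph * np.log(ph / pp)))
    return float(np.log(np.sum(ph**alpha * pp ** (1 - alpha))) / (alpha - 1))


def projected_radius(B, alpha, p_s):
    """B_s from (23): the Renyi radius of F_{U|s}, assuming hat P_s = p_s > 0."""
    if alpha == 1:
        return B / p_s
    inner = (np.exp((alpha - 1) * B / alpha) - (1 - p_s)) / p_s
    if inner <= 0:
        return np.inf  # only possible for alpha < 1 (assumption: then every R is allowed)
    return float(alpha / (alpha - 1) * np.log(inner))
```

For instance, `projected_radius(B, alpha, 1.0)` returns `B` up to rounding for any `alpha`, since conditioning on an event of probability one leaves the ball unchanged.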

This theorem gives us a direct description of $F_{U | s}$, which is useful because $L_{u | s}(F)$ in (19) and $\mathrm{rad}_{s}(F)$ in (20) are defined in terms of these projection sets. A similar result could also be derived for the limit cases $α = 0$ and $α = \infty$, but we do not pursue this here, as it does not provide additional insight.

A key property of the Rényi divergence that allows us to prove Theorem 1 is that, for $x = (s, u)$, we can write

\frac{\hat{P}_{x}^{α}}{P_{x}^{α - 1}} = \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}} \cdot \frac{\hat{P}_{s}^{α}}{P_{s}^{α - 1}}.

(25)

This allows us to express the divergence $D_{α}(\hat{P}_{U | s} \| P_{U | s})$ in terms of $D_{α}(\hat{P} \| P)$. For other divergences, which may depend on $\hat{P}$ and $P$ in a more complicated way, this is typically not possible. Therefore, we cannot generalize our results to uncertainty sets constructed from, for instance, arbitrary f-divergences.

In light of this theorem, and since in the following sections we care more about the statistics of $F_{U | s}$ than about those of $F$ itself, one might be inclined to think that it is more straightforward to estimate the $\hat{P}_{U | s}$ from the data and define uncertainty sets $F_{U | s}$ around them directly, without going through the intermediate stage $F$. However, the preimage of these sets in $P_{X}$ is a larger set. In other words, there are distributions $P$ such that each $P_{U | s}$ is an element of $F_{U | s}$, while $P \notin F$. That is, we have $F \subsetneq F' := \{P \in P_{X} : \forall s,\ P_{U | s} \in F_{U | s}\}$. The reason for this is that, as the proof of Theorem 1 makes clear, the $P \in F$ that project to the boundary points of $F_{U | s}$ satisfy $P_{U | s'} = \hat{P}_{U | s'}$ for $s' \neq s$. In other words, an element of $F$ can be extremal in at most one $F_{U | s}$. By contrast, $F'$ also includes $P$ that are extremal in multiple $F_{U | s}$. We conclude that constructing the $F_{U | s}$ directly results in the larger set $F'$, which in turn leads to lower utility. We give an example of this phenomenon in Example 2.

4.2. Statistics of $F_{U | s}$

In this section, we analyze statistics of $F_{U | s}$. More concretely, to find $L_{u | s}(F)$ and $\mathrm{rad}_{s}(F)$, fix $s$, $α$ and $B$, and define, for $ρ \in [0, 1]$ and $ξ \in R_{\geq 0}$ such that $ρ ξ \leq 1$,

\begin{aligned} φ_{B_{s}}(ρ, ξ) &= \begin{cases} \frac{1}{α - 1} \log \left( ρ ξ^{1 - α} + (1 - ρ) \left( \frac{1 - ρ ξ}{1 - ρ} \right)^{1 - α} \right) - B_{s}, & \text{if } α \neq 1 \text{ and } ρ \neq 1, \\ ρ \log \frac{1}{ξ} + (1 - ρ) \log \frac{1 - ρ}{1 - ρ ξ} - B_{s}, & \text{if } α = 1 \text{ and } ρ \neq 1, \\ \log \frac{1}{ξ} - B_{s}, & \text{if } ρ = 1, \end{cases} && (26) \\ ξ_{-}(ρ) &= \inf \{ξ \in (0, 1] : φ_{B_{s}}(ρ, ξ) \leq 0\}, && (27) \\ ξ_{+}(ρ) &= \sup \{ξ \in [1, ρ^{-1}] : φ_{B_{s}}(ρ, ξ) \leq 0\}. && (28) \end{aligned}

Note that the case $ρ = 1$ can be obtained by taking the limit $ρ \to 1$. The expressions for $ξ_{-}$ and $ξ_{+}$ are a bit complicated, but note that, given $ρ < 1$, the function $φ_{B_{s}}(ρ, ξ)$ is convex in $ξ$. Thus, $φ_{B_{s}}(ρ, ξ) = 0$ has at most two solutions. Furthermore, $φ_{B_{s}}(ρ, 1) = -B_{s}$ and, for $α \geq 1$, $φ_{B_{s}}(ρ, ξ) \to \infty$ as $ξ$ approaches $0$ or $\frac{1}{ρ}$, so for $ρ < 1$ the values $ξ_{-}(ρ)$ and $ξ_{+}(ρ)$ are the two solutions to $φ_{B_{s}}(ρ, ξ) = 0$.

The following proposition expresses our desired statistics in terms of $ξ_{-}$ and $ξ_{+}$.

Proposition 1.

Let $u \in U$. Then, writing $\hat{P}_{U_{1} | s} = \sum_{u' \in U_{1}} \hat{P}_{u' | s}$ for $U_{1} \subset U$,

\begin{aligned} L_{u | s}(F) &= \hat{P}_{u | s}\, ξ_{-}(\hat{P}_{u | s}), && (29) \\ \mathrm{rad}_{s}(F) &= 2 \max_{U_{1} \subset U,\ U_{1} \neq \emptyset} \hat{P}_{U_{1} | s} \left( ξ_{+}(\hat{P}_{U_{1} | s}) - 1 \right). && (30) \end{aligned}

As discussed above, $ξ_{\pm}(ρ)$ can be found quickly numerically; however, the calculation of $\mathrm{rad}_{s}(F)$ still involves taking the maximum over an exponentially large set, namely the $2^{|U|} - 1$ nonempty subsets of $U$.
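Because $φ_{B_{s}}(ρ, \cdot)$ is convex, negative at $ξ = 1$ and (for $α \geq 1$) unbounded at both ends of its domain, plain bisection is one simple way to compute $ξ_{\pm}$, after which (29) and (30) can be evaluated by enumerating subsets. The Python sketch below assumes $α \geq 1$, $B_{s} > 0$ and $0 < \hat{P}_{u | s} < 1$ for all $u$; the helper names are hypothetical, and the subset loop is exponential in $|U|$, as noted above.

```python
import itertools
import math


def phi(rho, xi, alpha, B_s):
    """phi_{B_s}(rho, xi) from (26), for 0 < rho < 1 and 0 < xi < 1 / rho."""
    rest = (1 - rho * xi) / (1 - rho)  # common scaling of the mass outside u
    if alpha == 1:
        return rho * math.log(1 / xi) + (1 - rho) * math.log(1 / rest) - B_s
    val = rho * xi ** (1 - alpha) + (1 - rho) * rest ** (1 - alpha)
    return math.log(val) / (alpha - 1) - B_s


def bisect(f, neg, pos, iters=200):
    """Bisection between a point with f(neg) <= 0 and a point with f(pos) > 0."""
    for _ in range(iters):
        mid = 0.5 * (neg + pos)
        if f(mid) <= 0:
            neg = mid
        else:
            pos = mid
    return 0.5 * (neg + pos)


def xi_minus(rho, alpha, B_s, tiny=1e-9):
    # phi(rho, 1) = -B_s <= 0, and (for alpha >= 1) phi blows up as xi -> 0
    return bisect(lambda x: phi(rho, x, alpha, B_s), 1.0, tiny)


def xi_plus(rho, alpha, B_s, tiny=1e-9):
    # (for alpha >= 1) phi blows up as xi -> 1 / rho
    return bisect(lambda x: phi(rho, x, alpha, B_s), 1.0, (1 - tiny) / rho)


def proposition1(p_cond, alpha, B_s):
    """L_{u|s}(F) via (29) and rad_s(F) via (30), given p_cond = hat P_{U|s}."""
    lower = [r * xi_minus(r, alpha, B_s) for r in p_cond]
    rad, n = 0.0, len(p_cond)
    for k in range(1, n):  # U_1 = U has hat P_{U_1|s} = 1 and contributes 0
        for subset in itertools.combinations(range(n), k):
            rho = sum(p_cond[i] for i in subset)
            rad = max(rad, 2 * rho * (xi_plus(rho, alpha, B_s) - 1))
    return lower, rad


lower, rad = proposition1([0.4118, 0.5882], alpha=2.0, B_s=0.3782)
```

With $α = 2$, $\hat{P}_{U | s} = (0.4118, 0.5882)$ and $B_{s} = 0.3782$, this gives lower bounds of about 0.162 and 0.283 and a radius of about 0.611.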

4.3. Special Case $α = 2$

In this section, we show that when $α = 2$, we can find explicit expressions for $ξ_{\pm}$ and consequently for $L_{u | s}$ and $\mathrm{rad}_{s}$. As discussed in (9), for this $α$, the set $F$ is a confidence set for a $χ^{2}$-test. To find $ξ_{-}(ρ)$ and $ξ_{+}(ρ)$, we need to solve $φ_{B_{s}}(ρ, ξ) = 0$. For $α = 2$, this can be rewritten as the quadratic equation $e^{B_{s}} ρ ξ^{2} - (e^{B_{s}} + 2ρ - 1) ξ + ρ = 0$, and solving it leads to the following expressions:

Lemma 1.

Suppose $α = 2$. Then,

ξ_{-}(ρ) = \frac{e^{B_{s}} + 2 ρ - 1 - \sqrt{(e^{B_{s}} - 1)(e^{B_{s}} - (2 ρ - 1)^{2})}}{2 e^{B_{s}} ρ},

(31)

ξ_{+}(ρ) = \frac{e^{B_{s}} + 2 ρ - 1 + \sqrt{(e^{B_{s}} - 1)(e^{B_{s}} - (2 ρ - 1)^{2})}}{2 e^{B_{s}} ρ}.

(32)
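The closed forms (31) and (32) can be checked against (26) directly, which for $α = 2$ reads $φ_{B_{s}}(ρ, ξ) = \log(ρ/ξ + (1 - ρ)^{2}/(1 - ρ ξ)) - B_{s}$. The short Python sketch below (function names are illustrative) evaluates both roots and the residual of (26) at them, which should vanish up to rounding:

```python
import math


def xi_pm_alpha2(rho, B_s):
    """Closed-form (xi_-(rho), xi_+(rho)) for alpha = 2, Eqs. (31)-(32)."""
    E = math.exp(B_s)
    root = math.sqrt((E - 1) * (E - (2 * rho - 1) ** 2))
    base = E + 2 * rho - 1
    return (base - root) / (2 * E * rho), (base + root) / (2 * E * rho)


def phi_alpha2(rho, xi, B_s):
    """phi_{B_s}(rho, xi) from (26) specialized to alpha = 2."""
    return math.log(rho / xi + (1 - rho) ** 2 / (1 - rho * xi)) - B_s


lo, hi = xi_pm_alpha2(0.4118, 0.3782)
residuals = (phi_alpha2(0.4118, lo, 0.3782), phi_alpha2(0.4118, hi, 0.3782))
```

Multiplying `lo` by $ρ = 0.4118$ gives the lower bound $L_{u | s}(F)$ of (29) for these parameters.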

Now, we can determine $L_{u | s}(F)$ and $\mathrm{rad}_{s}(F)$ using Lemma 1 and Proposition 1. For $L_{u | s}(F)$, we immediately obtain an expression; for $\mathrm{rad}_{s}(F)$, a careful analysis of $ξ_{+}$ shows that the optimal $U_{1}$ in (30) can be found. For large enough $B_{s}$, the optimum is at $U_{1} = \{u_{\min}\}$, where $u_{\min}$ is the $u$ that minimizes $\hat{P}_{u | s}$. Thus, we obtain a concrete expression for $\mathrm{rad}_{s}(F)$ without the need for optimization. For smaller $B_{s}$, we do not find an exact expression, but we can still derive an upper bound. The results are summarized in the following proposition.

Proposition 2.

Let $α = 2$. Then, the following hold:

1. One has

L_{u | s}(F) = \frac{e^{B_{s}} + 2 \hat{P}_{u | s} - 1 - \sqrt{(e^{B_{s}} - 1)(e^{B_{s}} - (2 \hat{P}_{u | s} - 1)^{2})}}{2 e^{B_{s}}}.

(33)

2. Let $u_{\min} = \arg\min_{u \in U} \hat{P}_{u | s}$. If $B_{s} \geq \log(1 + (1 - \hat{P}_{u_{\min} | s})^{2})$, then

\mathrm{rad}_{s}(F) = \frac{-e^{B_{s}} (2 \hat{P}_{u_{\min} | s} - 1) + 2 \hat{P}_{u_{\min} | s} - 1 + \sqrt{(e^{B_{s}} - 1)(e^{B_{s}} - (2 \hat{P}_{u_{\min} | s} - 1)^{2})}}{e^{B_{s}}}.

(34)

3. If $B_{s} < \log(1 + (1 - \hat{P}_{u_{\min} | s})^{2})$, one has $\mathrm{rad}_{s}(F) \leq \sqrt{e^{B_{s}} - 1}$.
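Proposition 2 can be sanity-checked against a brute-force evaluation of (30) that uses the closed-form $ξ_{+}$ of Lemma 1. The Python sketch below (illustrative names; it assumes $0 < \hat{P}_{u | s} < 1$ for all $u$) does this for the conditional distributions $(0.4118, 0.5882)$ with $B_{s} = 0.3782$, which falls under item 2, and $(0.3133, 0.6867)$ with $B_{s} = 0.0900$, which falls under item 3:

```python
import itertools
import math


def xi_plus2(rho, B_s):
    """xi_+(rho) for alpha = 2, Eq. (32)."""
    E = math.exp(B_s)
    return (E + 2 * rho - 1 + math.sqrt((E - 1) * (E - (2 * rho - 1) ** 2))) / (2 * E * rho)


def lower2(p, B_s):
    """L_{u|s}(F) for alpha = 2, Eq. (33), with p = hat P_{u|s}."""
    E = math.exp(B_s)
    return (E + 2 * p - 1 - math.sqrt((E - 1) * (E - (2 * p - 1) ** 2))) / (2 * E)


def rad_bruteforce2(p_cond, B_s):
    """rad_s(F) from (30), maximizing over nonempty proper subsets U_1 of U."""
    n, best = len(p_cond), 0.0
    for k in range(1, n):  # U_1 = U contributes 0
        for sub in itertools.combinations(range(n), k):
            rho = sum(p_cond[i] for i in sub)
            best = max(best, 2 * rho * (xi_plus2(rho, B_s) - 1))
    return best


def rad_closed_form2(p_cond, B_s):
    """rad_s(F) from (34); only claimed valid if B_s >= log(1 + (1 - p_min)^2)."""
    p, E = min(p_cond), math.exp(B_s)
    root = math.sqrt((E - 1) * (E - (2 * p - 1) ** 2))
    return (-E * (2 * p - 1) + 2 * p - 1 + root) / E


s1, s2 = [0.4118, 0.5882], [0.3133, 0.6867]
lower_s1 = [lower2(p, 0.3782) for p in s1]
rad_s1 = rad_bruteforce2(s1, 0.3782)       # item 2: closed form applies
rad_s2 = rad_bruteforce2(s2, 0.0900)       # item 3: only the bound applies
bound_s2 = math.sqrt(math.exp(0.0900) - 1)
```

For the first distribution the brute-force maximum coincides with (34), and for the second it stays below the bound of item 3.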

We note that $α = 2$ is not the only value of $α$ for which one can bound $L_{u | s}$ and $\mathrm{rad}_{s}$. For instance, for $α \leq 1$, one can use Pinsker's inequality [55,56] and its generalizations [57] to bound $\| \hat{P}_{U | s} - R \|_{1}$ for $R \in F_{U | s}$ in terms of $B_{s}$; this yields a bound on $\mathrm{rad}_{s}(F)$, which in turn can be used to bound $L_{u | s}(F)$. However, unlike for $α = 2$, these bounds are not exact.

Example 2.

We continue Example 1. We have

\begin{aligned} \hat{P}_{s_{1}} &= 0.17, & \hat{P}_{u_{1} | s_{1}} &= 0.4118, & \hat{P}_{u_{2} | s_{1}} &= 0.5882, \\ \hat{P}_{s_{2}} &= 0.83, & \hat{P}_{u_{1} | s_{2}} &= 0.3133, & \hat{P}_{u_{2} | s_{2}} &= 0.6867. \end{aligned}

Inserting our values of $B$ and $\hat{P}_{s}$ into Theorem 1, we find $B_{s_{1}} = 0.3782$ and $B_{s_{2}} = 0.0900$. In other words,

F_{U | s_{1}} = \left\{ R = \begin{pmatrix} R_{u_{1}} \\ R_{u_{2}} \end{pmatrix} \in P_{U} : D_{2}\left( \begin{pmatrix} 0.4118 \\ 0.5882 \end{pmatrix} \middle\| \begin{pmatrix} R_{u_{1}} \\ R_{u_{2}} \end{pmatrix} \right) \leq 0.3782 \right\},

(35)

F_{U | s_{2}} = \left\{ R = \begin{pmatrix} R_{u_{1}} \\ R_{u_{2}} \end{pmatrix} \in P_{U} : D_{2}\left( \begin{pmatrix} 0.3133 \\ 0.6867 \end{pmatrix} \middle\| \begin{pmatrix} R_{u_{1}} \\ R_{u_{2}} \end{pmatrix} \right) \leq 0.0900 \right\}.

(36)

To determine the lower bounds on each $R_{u_{i}}$, we use Proposition 2 to obtain

\begin{aligned} L_{u_{1} | s_{1}}(F) &= 0.1620, & L_{u_{2} | s_{1}}(F) &= 0.2829, \\ L_{u_{1} | s_{2}}(F) &= 0.1923, & L_{u_{2} | s_{2}}(F) &= 0.5337. \end{aligned}

In principle, we can also use Proposition 2 to determine $\mathrm{rad}_{s}(F)$. However, in this case, there is a more straightforward approach. Since $|U| = 2$, every element of $F_{U | s}$ is a vector of length two whose coefficients sum to 1; thus $P_{U | s}$ is determined by $P_{u_{1} | s}$. Since $L_{u_{1} | s}(F) \leq P_{u_{1} | s} \leq 1 - L_{u_{2} | s}(F)$, it follows that

\begin{aligned} F_{U | s_{1}} &\cong [L_{u_{1} | s_{1}}(F), 1 - L_{u_{2} | s_{1}}(F)] = [0.1620, 0.7171], \\ F_{U | s_{2}} &\cong [L_{u_{1} | s_{2}}(F), 1 - L_{u_{2} | s_{2}}(F)] = [0.1923, 0.4663]. \end{aligned}

Under this identification, $\mathrm{rad}_{s}(F)$ is simply twice the maximal distance from $\hat{P}_{u_{1} | s}$ to an endpoint of this interval (the factor two comes from the fact that $\| P_{U | s} - \hat{P}_{U | s} \|_{1} = | P_{u_{1} | s} - \hat{P}_{u_{1} | s} | + | P_{u_{2} | s} - \hat{P}_{u_{2} | s} | = 2 | P_{u_{1} | s} - \hat{P}_{u_{1} | s} |$). Hence,

\begin{aligned} \mathrm{rad}_{s_{1}}(F) &= 2 \max\{0.4118 - 0.1620,\ 0.7171 - 0.4118\} = 0.6107, \\ \mathrm{rad}_{s_{2}}(F) &= 2 \max\{0.3133 - 0.1923,\ 0.4663 - 0.3133\} = 0.3061. \end{aligned}

We can also construct the set $F' = \{P \in P_{X} : \forall s,\ P_{U | s} \in F_{U | s}\}$ of Section 4.1. We can write this as

F' = \left\{ \begin{pmatrix} P_{s_{1}, u_{1}} \\ P_{s_{1}, u_{2}} \\ P_{s_{2}, u_{1}} \\ P_{s_{2}, u_{2}} \end{pmatrix} \in P_{X} : \begin{array}{l} 0.1620 \leq P_{u_{1} | s_{1}} \leq 0.7171, \\ 0.1923 \leq P_{u_{1} | s_{2}} \leq 0.4663 \end{array} \right\}.

(37)

The inequality $0.1620 \leq P_{u_{1} | s_{1}}$ can be written as $0.1620 \leq \frac{P_{s_{1}, u_{1}}}{P_{s_{1}, u_{1}} + P_{s_{1}, u_{2}}}$, or $0.1620\, P_{s_{1}, u_{2}} \leq 0.8380\, P_{s_{1}, u_{1}}$; in other words, it becomes a linear constraint. We can do the same for the other constraints, and these, together with the inequality constraints $P_{s, u} \geq 0$ and the equality constraint $\sum_{s, u} P_{s, u} = 1$, define the polytope $F' \subset R^{4}$. One can calculate that this polytope is a simplex, spanned by the vertices

\begin{pmatrix} 0.7171 \\ 0.2829 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0.1620 \\ 0.8380 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0.4663 \\ 0.5337 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0.1923 \\ 0.8077 \end{pmatrix}.

(38)

The resulting $F'$ is considerably larger than $F$: one way to see this is that, for each of these vertices $P$, one has $D_{2}(\hat{P} \| P) = \infty$. This example shows the importance of working with the set $F$, rather than with just its projections $F_{U | s}$.
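The linearization of (37) and the claim about the vertices (38) are easy to verify numerically. In the Python sketch below, $\hat{P}$ is assembled as $\hat{P}_{s} \hat{P}_{u | s}$ from the rounded values listed in this example, the columns are ordered $(s_{1}, u_{1}), (s_{1}, u_{2}), (s_{2}, u_{1}), (s_{2}, u_{2})$, and the function names are hypothetical:

```python
import numpy as np

# hat P assembled as hat P_s * hat P_{u|s} from the rounded values of Example 2
p_hat = np.array([0.17 * 0.4118, 0.17 * 0.5882, 0.83 * 0.3133, 0.83 * 0.6867])

# Interval constraints of (37): block start index -> (lower, upper) bound on P_{u1|s}
BOUNDS = {0: (0.1620, 0.7171), 2: (0.1923, 0.4663)}


def in_f_prime(P, tol=1e-9):
    """Membership in F' via the linear form of the constraints in (37)."""
    P = np.asarray(P, dtype=float)
    if np.any(P < -tol) or abs(P.sum() - 1) > tol:
        return False
    for i, (lo, hi) in BOUNDS.items():
        a, b = P[i], P[i + 1]              # P_{s,u1}, P_{s,u2}
        if lo * b > (1 - lo) * a + tol:    # lo <= a / (a + b)
            return False
        if (1 - hi) * a > hi * b + tol:    # a / (a + b) <= hi
            return False
    return True


def renyi2(p_hat, P):
    """D_2(p_hat || P) from (21); +inf if P vanishes where p_hat does not."""
    if np.any((P == 0) & (p_hat > 0)):
        return np.inf
    return float(np.log(np.sum(p_hat**2 / P)))


vertices = np.array([
    [0.7171, 0.2829, 0, 0],
    [0.1620, 0.8380, 0, 0],
    [0, 0, 0.4663, 0.5337],
    [0, 0, 0.1923, 0.8077],
])
checks = [(in_f_prime(v), renyi2(p_hat, v)) for v in vertices]
```

Each vertex passes the membership test while having infinite $D_{2}$-divergence from $\hat{P}$, so none of them lies in $F$.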

5. Polyhedral Approximation: PolyOpt

In this section, we introduce PolyOpt, a family of mechanisms $Q$ with good utility, obtained by enclosing the projections $F_{U | s}$ of $F$ in polyhedra and then using robust optimization for polyhedra [16] to describe the space of possible $Q$ as a polyhedron; we then maximize the mutual information over this polyhedron. This approach is related to the polyhedral approach of [3], which finds the optimum for this problem in a non-robust setting.

For a mechanism $Q$ and $y \in Y$, we define $Q_{y} = (Q_{y | x})_{x \in X} \in R^{X}$ to be the $y$-th row of the stochastic matrix $Q$ corresponding to $Q$, but transposed (i.e., viewed as a column vector). Likewise, we define the column vector $Q_{y | s} = (Q_{y | s, u})_{u \in U} \in R^{U}$. In this notation, the condition for $(ε, F)$-RLDP can be formulated as

\forall y \in Y\ \forall s_{1}, s_{2} \in S : \max_{P \in F} \left[ P_{U | s_{1}}^{T} Q_{y | s_{1}} - e^{ε} P_{U | s_{2}}^{T} Q_{y | s_{2}} \right] \leq 0.

(39)

Equation (39) boils down to a set of linear constraints on $Q_{y}$. What makes these difficult to satisfy is that every $P \in F$ provides a linear constraint, and each $Q_{y}$ has to satisfy all of these infinitely many constraints. In this section, we address this difficulty by making the set $F$ slightly larger, so that robust optimization [16] becomes a convenient tool for optimizing over the allowed $Q$. More precisely, for every $s \in S$, let $D_{s} \subset P_{U}$ be such that $F_{U | s} \subset D_{s}$. Then, certainly

\max_{P \in F} \left[ P_{U | s_{1}}^{T} Q_{y | s_{1}} - e^{ε} P_{U | s_{2}}^{T} Q_{y | s_{2}} \right] \leq \max_{R_{1} \in D_{s_{1}},\ R_{2} \in D_{s_{2}}} \left[ R_{1}^{T} Q_{y | s_{1}} - e^{ε} R_{2}^{T} Q_{y | s_{2}} \right].

(40)

Thus, we can conclude that $Q$ is $(ε, F)$-RLDP whenever

\forall y \in Y\ \forall s_{1}, s_{2} \in S : \max_{R_{1} \in D_{s_{1}},\ R_{2} \in D_{s_{2}}} \left[ R_{1}^{T} Q_{y | s_{1}} - e^{ε} R_{2}^{T} Q_{y | s_{2}} \right] \leq 0.

(41)

The trick is now to choose the $D_{s}$ in such a way that the set of $Q$ satisfying (41) has a closed-form description. To this end, we let each $D_{s}$ be a polyhedron; that way, we can use robust optimization for polyhedra [16] to give such a description.

There are multiple ways to create a polyhedron $D_{s}$ that envelops $F_{U | s}$. Writing $L_{u | s} = L_{u | s}(F)$ for convenience, we take

D_{s} = \{R \in P_{U} : \forall u,\ R_{u} \geq L_{u | s}\}.

(42)

Since $D_{s}$ is described by linear inequalities, it is a polyhedron, and by the definition (19) of $L_{u | s}(F)$, certainly $F_{U | s} \subset D_{s}$ for all $s$. Robust optimization for polytopes [16] then allows us to describe the set of mechanisms satisfying (41). To formulate this, we first need the following definition:

Definition 3.

Let $ε > 0$. Then, define $Γ_{ε}$ to be the convex cone consisting of all $v \in R_{\geq 0}^{X}$ that satisfy, for all $s_{1}, s_{2} \in S$ and all $u_{1}, u_{2} \in U$:

v_{s_{1}, u_{1}} - e^{ε} v_{s_{2}, u_{2}} + \sum_{u} L_{u | s_{1}} (v_{s_{1}, u} - v_{s_{1}, u_{1}}) - e^{ε} \sum_{u} L_{u | s_{2}} (v_{s_{2}, u} - v_{s_{2}, u_{2}}) \leq 0.

(43)

Note that, for every choice of $s_{1}, s_{2}, u_{1}, u_{2}$, (43) is a linear inequality in $v$ and thus defines a half-space in $R^{X}$. The intersection of these half-spaces with $R_{\geq 0}^{X}$ defines the convex cone $Γ_{ε}$. This definition allows us to formulate the following result:

Theorem 2.

Let $Q$ be a privacy mechanism, and for $y \in Y$, let $Q_{y}$ be the $y$-th row of the associated matrix $Q = (Q_{y | x})_{y \in Y, x \in X}$. Suppose that for all $y$ we have $Q_{y} \in Γ_{ε}$. Then, $Q$ satisfies $(ε, F)$-RLDP.
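The constraints (43) are the robust counterpart of (41) for the choice (42): a linear function $R \mapsto R^{T} q$ attains its maximum over $D_{s}$ at a vertex, i.e., at an $R$ with $R_{u} = L_{u | s}$ for all $u \neq u_{1}$, where it equals $q_{u_{1}} + \sum_{u} L_{u | s}(q_{u} - q_{u_{1}})$. The Python sketch below builds the rows of (43) as a matrix `A`, so that $v \in Γ_{ε}$ if and only if $v \geq 0$ and $A v \leq 0$. The column order $x = (s, u) \mapsto s a_{2} + u$ and all names are our own assumptions.

```python
import itertools

import numpy as np


def support_D(L_s, q):
    """max over R in D_s (42) of R^T q: mass L_{u|s} on every u, the rest on argmax q."""
    return float(L_s @ q + (1.0 - L_s.sum()) * q.max())


def gamma_constraints(L, eps):
    """Matrix A with one row per (s1, s2, u1, u2), encoding (43) as A @ v <= 0.

    L has shape (a1, a2) with L[s, u] = L_{u|s}; column x = s * a2 + u.
    """
    a1, a2 = L.shape
    rows = []
    for s1, s2, u1, u2 in itertools.product(range(a1), range(a1), range(a2), range(a2)):
        row = np.zeros(a1 * a2)
        # + v_{s1,u1} + sum_u L_{u|s1} (v_{s1,u} - v_{s1,u1})
        row[s1 * a2:(s1 + 1) * a2] += L[s1]
        row[s1 * a2 + u1] += 1.0 - L[s1].sum()
        # - e^eps [v_{s2,u2} + sum_u L_{u|s2} (v_{s2,u} - v_{s2,u2})]
        row[s2 * a2:(s2 + 1) * a2] -= np.exp(eps) * L[s2]
        row[s2 * a2 + u2] -= np.exp(eps) * (1.0 - L[s2].sum())
        rows.append(row)
    return np.array(rows)


# Lower bounds L_{u|s}(F) of Example 2 and eps = log 2
L = np.array([[0.1620, 0.2829], [0.1923, 0.5337]])
A = gamma_constraints(L, np.log(2))
```

Here `A` has $2^{4} = 16$ rows, and for any $v$ the maximum of `A @ v` equals the largest left-hand side of (41) over $(s_{1}, s_{2})$, obtained from `support_D` applied to $v$ and $-v$.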

The upshot of this theorem is that we have translated the infinitely many constraints of (39) and (41) into the finitely many linear constraints of (43). This makes optimizing utility considerably easier. We perform this optimization by translating it into a linear programming problem. The key inspiration for this optimization is Theorem 4 of [5], where optimal LDP mechanisms are found by translating the problem of optimizing mutual information into linear programming; we use an analogous approach adapted to RLDP. This approach can be sketched as follows. Let $\hat{Γ} = \{v \in Γ_{ε} : \sum_{x} v_{x} = 1\}$, i.e., the intersection of $Γ_{ε}$ with the hyperplane $\sum_{x} v_{x} = 1$. This is a polyhedron, and every $Q$ satisfying the conditions of Theorem 2 has $Q_{y} = θ_{y} v_{y}$ for some $θ_{y} \in R_{\geq 0}$ and $v_{y} \in \hat{Γ}$. The authors of [5] made a number of key observations that also apply to our situation. The first is that, in this case, we can write

I_{\hat{P}}(X; Y) = \sum_{y} θ_{y}\, μ(v_{y}),

(44)

where

μ(v) = \sum_{x \in X} v_{x} \hat{P}_{x} \log \frac{v_{x}}{\sum_{x'} v_{x'} \hat{P}_{x'}}.

(45)

The second observation is that, in order to maximize (44), one can prove from the convexity of $μ$ that it is optimal to have each $v_{y}$ be a vertex of $\hat{Γ}$. Thus, once we know the set $V$ of vertices of $\hat{Γ}$, we find the optimal $Q$ by assigning a weight $θ_{v}$ to each $v \in V$ in such a way that the resulting $Q_{y}$ form a stochastic matrix and (44) is maximized. Since (44) is linear in $θ$, this is a linear programming problem. This discussion is summarized in the following theorem:

Theorem 3.

Let $\hat{Γ}$ be the polyhedron given by $\{v \in Γ_{ε} : \sum_{x} v_{x} = 1\}$. Let $V$ be the set of vertices of $\hat{Γ}$. Define $μ$ as in (45). Let $1_{X} \in R^{X}$ be the constant vector of ones. Let $\hat{θ} \in R_{\geq 0}^{V}$ be the solution to the optimization problem

\begin{aligned} \text{maximize}_{θ} \quad & \sum_{v \in V} θ_{v}\, μ(v) \\ \text{subject to} \quad & θ \in R_{\geq 0}^{V}, \\ & \sum_{v \in V} θ_{v} v = 1_{X}. \end{aligned}

(46)

Let the privacy mechanism $Q$ be given by $Y = \{v \in V : \hat{θ}_{v} > 0\}$ and $Q_{v | x} = \hat{θ}_{v} v_{x}$. Then, the mechanism $Q$ maximizes $I_{\hat{P}}(X; Y)$ among all mechanisms satisfying the condition of Theorem 2. One has $|Y| \leq a$.
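For toy instances, the linear program (46) can be solved without an LP library by enumerating basic feasible solutions: since (46) has $a$ equality constraints, an optimum is attained at a $θ$ with at most $a$ nonzero entries, which is also the reason for $|Y| \leq a$. The Python sketch below (hypothetical names; it assumes the rows of `V` are the vertices and span $R^{X}$) evaluates $μ$ from (45) in nats and assembles the mechanism of Theorem 3. Its cost grows combinatorially in $|V|$; in practice one would hand (46) to a dedicated LP solver instead.

```python
import itertools

import numpy as np


def mu(v, p_hat):
    """mu(v) from (45) in nats, with the convention 0 log 0 = 0."""
    v = np.asarray(v, dtype=float)
    denom = float(v @ p_hat)
    m = v > 0
    return float(np.sum(v[m] * p_hat[m] * np.log(v[m] / denom)))


def solve_lp46_bruteforce(V, p_hat, tol=1e-10):
    """Solve (46) by enumerating basic feasible solutions; V holds one vertex per row."""
    n_vert, a = V.shape
    mus = np.array([mu(v, p_hat) for v in V])
    best_val, best_theta = -np.inf, None
    for basis in itertools.combinations(range(n_vert), a):
        idx = list(basis)
        B = V[idx].T                               # a x a, columns are chosen vertices
        if abs(np.linalg.det(B)) < tol:
            continue
        theta_b = np.linalg.solve(B, np.ones(a))   # sum_v theta_v v = 1_X
        if np.any(theta_b < -tol):
            continue                               # infeasible: theta must be >= 0
        val = float(mus[idx] @ theta_b)
        if val > best_val:
            best_val = val
            best_theta = np.zeros(n_vert)
            best_theta[idx] = np.clip(theta_b, 0.0, None)
    return best_theta, best_val


def mechanism_from_theta(V, theta, tol=1e-12):
    """Rows Q_y = theta_v * v for every v with theta_v > 0, as in Theorem 3."""
    keep = theta > tol
    return theta[keep][:, None] * V[keep]
```

As a toy check (not a $\hat{Γ}$ from the text), taking `V` to be the identity matrix forces $θ = 1_{X}$, which yields the identity mechanism and $I_{\hat{P}}(X; Y) = H(\hat{P})$.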

Together, Theorems 2 and 3 show that if we can solve a vertex enumeration problem, we can find a mechanism $Q$ that maximizes $I_{\hat{P}}(X; Y)$ among a subset of all $(ε, F)$-RLDP mechanisms; furthermore, the output space $Y$ is at most the size of the input space $X$. The proof of Theorem 3 is analogous to the proof of Theorem 4 of [5] and is given in Appendix A.5. Note that the results of [5] do not run into the vertex enumeration problem, because the relevant polyhedron there is $[1, e^{ε}]^{X}$, whose vertices are known.

We remark that the simplex (42) is not the only possible choice for $D_{s}$. In general, we can bring $D_{s}$ closer to $F_{U | s}$ by adding more defining hyperplanes. Doing this allows more $Q$ to satisfy Theorem 2 and in turn increases the utility of the $Q$ we find via Theorem 3. However, since $Γ$ is related to the $D_{s}$ via duality, adding extra constraints to the $D_{s}$ increases the dimension of $Γ$ through the addition of auxiliary variables. This makes the vertex enumeration problem of Theorem 3 more computationally involved. Thus, we have a trade-off between utility and computational complexity. Even with the given, ‘simple’ choice of $D_{s}$, the computational complexity is quite high: recall that we defined $a = |X|$. The polytope $\hat{Γ}$ is $(a - 1)$-dimensional and is defined by $a^{2} + a$ inequalities, so it has $O((a^{2} + a)^{\frac{a - 1}{2}}) = O(a^{a})$ vertices [58]. Since each vertex corresponds to a variable of the linear programming problem (46), we find that the total complexity of finding $Q$ is $O(a^{ω a} \log(a^{a + 1} / δ))$, where $ω \approx 2.38$ is the exponent of matrix multiplication and $δ$ the relative accuracy [58]. Clearly, this becomes infeasible rather quickly for large $a$.

It should be noted that, in general, the increase in utility obtained by shrinking $D_{s}$ does not approach the optimal utility over all $(ε, F)$-RLDP mechanisms. This is because, as we take increasingly finer $D_{s}$, we approach the set of $Q$ that satisfy (4) for all $P$ in $F' := \{P : \forall s,\ P_{U | s} \in F_{U | s}\}$. As discussed in Section 4.1, one has $F \subsetneq F'$. As a result, the set of $(ε, F')$-RLDP mechanisms is contained in, and in general strictly smaller than, the set of $(ε, F)$-RLDP mechanisms.

Example 3.

We continue Example 2 by taking $ε = \log 2$. To obtain $\hat{Γ}$ in Theorem 3, we need to combine the defining inequalities of $Γ_{ε}$ in Definition 3 with the defining equality $\sum_{x} v_{x} = 1$. Regarding the inequalities, we have $2^{4} = 16$ inequalities of the form (43), as well as 4 inequalities of the form $v_{x} \geq 0$. Together with the equality constraint, we obtain a 3-dimensional polytope in $R^{X} = R^{4}$. Using a vertex enumeration algorithm, one finds that $V$ consists of the rows of the matrix $V$ below, where the order of the columns is the order of the rows of Example 1. For each row $v$, we can calculate $μ(v)$, resulting in the vector $μ$ below. Solving (46), we obtain the vector $\hat{θ}$ below:

V = \begin{pmatrix} 0.0744 & 0.3227 & 0.5603 & 0.0426 \\ 0.2426 & 0.2426 & 0.4783 & 0.0364 \\ 0.3333 & 0.3333 & 0.1667 & 0.1667 \\ 0.1091 & 0.4737 & 0.2086 & 0.2086 \\ 0.0993 & 0.4310 & 0 & 0.4697 \\ 0.1121 & 0.4864 & 0 & 0.4015 \\ 0.3404 & 0.3404 & 0 & 0.3191 \\ 0.0770 & 0.3343 & 0.2944 & 0.2944 \\ 0.2234 & 0.2234 & 0 & 0.5531 \\ 0.4875 & 0.1434 & 0 & 0.3690 \\ 0.4360 & 0.1283 & 0 & 0.4358 \\ 0.4758 & 0.1400 & 0.1921 & 0.1921 \\ 0.3437 & 0.1011 & 0.2776 & 0.2776 \\ 0.1602 & 0.1602 & 0.6316 & 0.0481 \\ 0.1667 & 0.1667 & 0.3333 & 0.3333 \\ 0.3325 & 0.0978 & 0.5294 & 0.0403 \end{pmatrix}, \quad μ = \begin{pmatrix} 0.1152 \\ 0.0942 \\ 0.0087 \\ 0.0135 \\ 0.1097 \\ 0.0968 \\ 0.0723 \\ 0.0080 \\ 0.1240 \\ 0.0878 \\ 0.1014 \\ 0.0106 \\ 0.0076 \\ 0.1240 \\ 0.0075 \\ 0.1083 \end{pmatrix}, \quad \hat{θ} = \begin{pmatrix} 1.1899 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0.7670 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1.4134 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0.6297 \end{pmatrix}.

(47)

We now obtain the privacy mechanism $Q_{\mathrm{PolyOpt}}$ as follows: each row of $Q_{\mathrm{PolyOpt}}$ corresponds to a nonzero coefficient of $\hat{θ}$, multiplied by the corresponding row of $V$. Thus, we obtain

Q_{\mathrm{PolyOpt}} = \begin{pmatrix} Q_{y_{1} | s_{1}, u_{1}} & Q_{y_{1} | s_{1}, u_{2}} & Q_{y_{1} | s_{2}, u_{1}} & Q_{y_{1} | s_{2}, u_{2}} \\ Q_{y_{2} | s_{1}, u_{1}} & Q_{y_{2} | s_{1}, u_{2}} & Q_{y_{2} | s_{2}, u_{1}} & Q_{y_{2} | s_{2}, u_{2}} \\ Q_{y_{3} | s_{1}, u_{1}} & Q_{y_{3} | s_{1}, u_{2}} & Q_{y_{3} | s_{2}, u_{1}} & Q_{y_{3} | s_{2}, u_{2}} \\ Q_{y_{4} | s_{1}, u_{1}} & Q_{y_{4} | s_{1}, u_{2}} & Q_{y_{4} | s_{2}, u_{1}} & Q_{y_{4} | s_{2}, u_{2}} \end{pmatrix}

(48)

= \begin{pmatrix} 0.0885 & 0.3840 & 0.6667 & 0.0507 \\ 0.0860 & 0.3731 & 0 & 0.3080 \\ 0.6162 & 0.1813 & 0 & 0.6159 \\ 0.2094 & 0.0616 & 0.3333 & 0.0254 \end{pmatrix}.

(49)

Note that indeed we have $4 = b \leq a = 4$, as guaranteed by Theorem 3. As for the utility, we have $I_{\hat{P}}(X; Y) = μ \cdot \hat{θ} = 0.4228$. However, the true utility is significantly lower, namely $I_{P^{*}}(X; Y) = 0.2804$.
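The utility reported above can be recomputed from (49). The Python sketch below assumes that mutual information is measured in nats and assembles $\hat{P}$ as $\hat{P}_{s} \hat{P}_{u | s}$ from the rounded values of Example 2. It also evaluates the ratio $P(Y = y \mid S = s_{1}) / P(Y = y \mid S = s_{2})$ at $P = \hat{P}$ only, which is a sanity check rather than a verification over all of $F$; the names are hypothetical.

```python
import numpy as np


def mutual_information(p_x, Q):
    """I(X; Y) in nats for input distribution p_x and Q[y, x] = Q_{y|x}."""
    p_x = np.asarray(p_x, dtype=float)
    joint = Q * p_x[None, :]             # joint[y, x] = P(X = x, Y = y)
    p_y = joint.sum(axis=1)
    m = joint > 0
    cond_over_marg = Q / p_y[:, None]    # Q_{y|x} / P(Y = y)
    return float(np.sum(joint[m] * np.log(cond_over_marg[m])))


# hat P in the column order (s1,u1), (s1,u2), (s2,u1), (s2,u2), from Example 2
p_cond = np.array([[0.4118, 0.5882], [0.3133, 0.6867]])
p_hat = (np.array([0.17, 0.83])[:, None] * p_cond).ravel()

Q_polyopt = np.array([
    [0.0885, 0.3840, 0.6667, 0.0507],
    [0.0860, 0.3731, 0.0000, 0.3080],
    [0.6162, 0.1813, 0.0000, 0.6159],
    [0.2094, 0.0616, 0.3333, 0.0254],
])

utility = mutual_information(p_hat, Q_polyopt)
# P(Y = y | S = s) under hat P (rows s, columns y) and the worst ratio between secrets
p_y_given_s = np.stack([Q_polyopt[:, 2 * s:2 * s + 2] @ p_cond[s] for s in range(2)])
worst_ratio = float(np.max(p_y_given_s.max(axis=0) / p_y_given_s.min(axis=0)))
```

Up to the rounding of the printed entries, `utility` is close to $μ \cdot \hat{θ} = 0.4228$, and `worst_ratio` stays below $e^{ε} = 2$.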

6. An Optimal Policy for $F = P_{X}$

Since PolyOpt mechanisms are obtained via vertex enumeration in $a$-dimensional space, finding them can be computationally infeasible for larger $a$. Thus, there is a need for methods that, given $\hat{P}$ and $F$, can find $(ε, F)$-RLDP mechanisms with reasonable computational complexity.

In this section, we consider the case where $F$ is maximal, i.e., $F = P_{X}$. By itself, this represents a situation where we want privacy for every possible probability distribution on $X$. This scenario may not be very relevant in practice, but any protocol that we find in this way is also $(ε, F)$-RLDP for any $F$. As we will see below, this allows us to find $(ε, F)$-RLDP protocols in a computationally efficient manner.

We show that $(ε, P_{X})$-RLDP is almost equivalent to LDP. We exploit this to create SRR, the RLDP analogue of GRR [5], the LDP mechanism that is optimal for $ε \gg 0$. SRR depends only on $ε$ and $X$, not on $\hat{P}$, and as such does not require an optimization procedure to be found; this makes it a good choice when vertex enumeration is computationally infeasible. The downside is that SRR satisfies a stricter privacy requirement than PolyOpt, as it takes $F$ to be maximal; in Section 8, we investigate numerically to what extent this results in a lower utility.

We start by giving a characterization of $(ε, P_{X})$-RLDP. Like LDP, this can be defined by an inequality constraint on the matrix $Q$.

Proposition 3.

$Q$ satisfies $(ε, P_{X})$-RLDP if and only if for all $y \in Y$ and $(s, u), (s', u') \in X$ with $s \neq s'$ one has

\frac{Q_{y | s, u}}{Q_{y | s', u'}} \leq e^{ε}.

(50)

Proof.

Suppose that $Q$ satisfies $(ε, P_{X})$-RLDP. Let $(s, u), (s', u') \in X$ with $s \neq s'$, and let $P$ be given by

P_{x} = \begin{cases} \frac{1}{2}, & \text{if } x \in \{(s, u), (s', u')\}, \\ 0, & \text{otherwise}. \end{cases}

(51)

Then, $P_{u | s} = 1$ and $P_{u'' | s} = 0$ for all $u'' \neq u$; an analogous statement holds for $P_{U | s'}$. It follows that

\begin{aligned} \frac{Q_{y | s, u}}{Q_{y | s', u'}} &= \frac{Q_{y | s, u} P_{u | s}}{Q_{y | s', u'} P_{u' | s'}} && (52) \\ &= \frac{\sum_{u''} Q_{y | s, u''} P_{u'' | s}}{\sum_{u''} Q_{y | s', u''} P_{u'' | s'}} && (53) \\ &= \frac{P_{X \sim P}(Q(X) = y \mid S = s)}{P_{X \sim P}(Q(X) = y \mid S = s')} \leq e^{ε}. && (54) \end{aligned}

This proves “⇒”. Conversely, suppose that $\frac{Q_{y | s, u}}{Q_{y | s', u'}} \leq e^{ε}$ for all $s \neq s'$ and all $u, u'$. Then, for all $s \neq s'$ and all $P$, the numerator in (55) below is a weighted average of the $Q_{y | s, u}$, hence at most $\max_{u} Q_{y | s, u} \leq e^{ε} \min_{u'} Q_{y | s', u'}$, while the denominator is a weighted average of the $Q_{y | s', u'}$, hence at least $\min_{u'} Q_{y | s', u'}$. Therefore,

\frac{P_{X \sim P}(Q(X) = y \mid S = s)}{P_{X \sim P}(Q(X) = y \mid S = s')} = \frac{\sum_{u} Q_{y | s, u} P_{u | s}}{\sum_{u'} Q_{y | s', u'} P_{u' | s'}} \leq e^{ε}.

(55)

Hence, $Q$ satisfies $(ε, P_{X})$-RLDP. □

The proposition demonstrates that $(ε, P_{X})$-RLDP is very similar to LDP. The difference is that the condition “for all $x, x' \in X$” from Definition 2 is relaxed to only those $x = (s, u)$ and $x' = (s', u')$ for which $s \neq s'$.
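Proposition 3 turns $(ε, P_{X})$-RLDP into a finite set of entrywise inequalities, so it can be checked directly on a matrix. The following Python sketch (hypothetical names; the column order $x = (s, u) \mapsto s a_{2} + u$ is our assumption) compares condition (50) with ordinary $ε$-LDP, writing both as products to avoid dividing by zero:

```python
import numpy as np


def satisfies_rldp_px(Q, a1, a2, eps, tol=1e-12):
    """Condition (50): Q[y, x] <= e^eps Q[y, x'] whenever x, x' have different secrets.

    Q[y, x] = Q_{y|x}; columns are assumed ordered as x = s * a2 + u.
    """
    s_of = np.repeat(np.arange(a1), a2)
    diff_s = s_of[:, None] != s_of[None, :]
    ok = Q[:, :, None] <= np.exp(eps) * Q[:, None, :] + tol   # ok[y, x, x']
    return bool(np.all(ok[:, diff_s]))


def satisfies_ldp(Q, eps, tol=1e-12):
    """Ordinary eps-LDP: the same inequality for all pairs x, x'."""
    return bool(np.all(Q[:, :, None] <= np.exp(eps) * Q[:, None, :] + tol))
```

For example, the matrix with rows $(4, 1, 2, 2)/9$, $(1, 4, 2, 2)/9$, $(2, 2, 4, 1)/9$ and $(2, 2, 1, 4)/9$ passes `satisfies_rldp_px` with $a_{1} = a_{2} = 2$ and $ε = \log 2$, but fails `satisfies_ldp`, since within each secret it uses a ratio of 4.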

Before moving on and introducing a new mechanism, note that Proposition 3 clearly illustrates why the setting in this paper cannot be modeled using the block-structured approach from [12]. We see that if $u \neq u'$, we still have a privacy constraint, whereas in [12] this is not the case.

Next, we introduce a mechanism that exploits the difference between LDP and RLDP. Recall that $a = |X|$; then generalized randomized response [19] is the privacy mechanism $\mathrm{GRR}^{ε} : X \to X$ given by

\mathrm{GRR}_{y | x}^{ε} = \begin{cases} \frac{e^{ε}}{e^{ε} + a - 1}, & \text{if } x = y, \\ \frac{1}{e^{ε} + a - 1}, & \text{otherwise}. \end{cases}

(56)

This mechanism has been designed such that, for $x \neq x'$, the ratio $\frac{\mathrm{GRR}_{y | x}^{ε}}{\mathrm{GRR}_{y | x'}^{ε}}$ equals $e^{\pm ε}$ whenever $y \in \{x, x'\}$ (and 1 otherwise), which is the maximal ratio that $ε$-LDP allows. We will see that for RLDP we can go up to a ratio of $e^{\pm 2ε}$ if $x = (s, u)$ and $x' = (s, u')$, as we typically only need to satisfy

Q_{y | s, u} \leq e^{ε} Q_{y | s', u'} \leq e^{2ε} Q_{y | s, u'}.

(57)

We capture the intuition from the necessary condition (57) in a new mechanism called secret randomized response (SRR). Recall that $a_{1} = |S|$ and $a_{2} = |U|$.

Definition 4.

(Secret randomized response (SRR)). Let $ε > 0$. Then, the privacy mechanism $\mathrm{SRR}^{ε} : X \to X$ is given by

\mathrm{SRR}_{s', u' | s, u}^{ε} = \begin{cases} \frac{e^{ε}}{e^{ε} + e^{-ε}(a_{2} - 1) + a - a_{2}}, & \text{if } (s', u') = (s, u), \\ \frac{e^{-ε}}{e^{ε} + e^{-ε}(a_{2} - 1) + a - a_{2}}, & \text{if } s' = s \text{ and } u' \neq u, \\ \frac{1}{e^{ε} + e^{-ε}(a_{2} - 1) + a - a_{2}}, & \text{if } s' \neq s. \end{cases}

(58)

It is clear that $\frac{\mathrm{SRR}_{y | s, u}^{ε}}{\mathrm{SRR}_{y | s', u'}^{ε}} \in \{e^{-2ε}, e^{-ε}, 1, e^{ε}, e^{2ε}\}$, and the two extreme cases are only possible when $s = s'$. By Proposition 3, we can conclude:

Lemma 2.

SRR satisfies $(ε, P_{X})$-RLDP.
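Both mechanisms are simple to construct explicitly. The Python sketch below (function names are our own; inputs and outputs are ordered as $x = (s, u) \mapsto s a_{2} + u$) builds $\mathrm{GRR}^{ε}$ from (56) and $\mathrm{SRR}^{ε}$ from (58), storing entry $[y, x]$ as the probability of output $y$ given input $x$:

```python
import numpy as np


def grr(a, eps):
    """GRR^eps from (56); entry [y, x] = GRR_{y|x}."""
    Q = np.ones((a, a))
    np.fill_diagonal(Q, np.exp(eps))
    return Q / (np.exp(eps) + a - 1)


def srr(a1, a2, eps):
    """SRR^eps from (58); inputs and outputs ordered as x = s * a2 + u."""
    a = a1 * a2
    s_of = np.repeat(np.arange(a1), a2)
    same_s = s_of[:, None] == s_of[None, :]
    Q = np.where(same_s, np.exp(-eps), 1.0)  # e^{-eps} within the same secret, 1 otherwise
    np.fill_diagonal(Q, np.exp(eps))          # e^{eps} for an exact match
    return Q / (np.exp(eps) + np.exp(-eps) * (a2 - 1) + a - a2)


Q_srr = srr(2, 2, np.log(2))
Q_grr = grr(4, np.log(2))
```

With $a_{1} = a_{2} = 2$ and $ε = \log 2$, the normalizing constant of (58) is $2 + 1/2 + 2 = 9/2$, so the entries of `Q_srr` are $4/9$, $1/9$ and $2/9$.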

Example 4.

We continue Example 3. Although SRR is closely related to GRR, adopting it can still have a significant impact on utility. For instance, in the setting of Example 3, we obtain

\mathrm{GRR}^{ε} = \begin{pmatrix} 0.4 & 0.2 & 0.2 & 0.2 \\ 0.2 & 0.4 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.2 & 0.4 \end{pmatrix}, \quad \mathrm{SRR}^{ε} = \begin{pmatrix} 0.444 & 0.111 & 0.222 & 0.222 \\ 0.111 & 0.444 & 0.222 & 0.222 \\ 0.222 & 0.222 & 0.444 & 0.111 \\ 0.222 & 0.222 & 0.111 & 0.444 \end{pmatrix}.

(59)

Then,

I_{\hat{P}}(X; \mathrm{GRR}^{ε}(X)) = 0.0419, \quad I_{\hat{P}}(X; \mathrm{SRR}^{ε}(X)) = 0.1005,

(60)

I_{P^{*}}(X; \mathrm{GRR}^{ε}(X)) = 0.0412, \quad I_{P^{*}}(X; \mathrm{SRR}^{ε}(X)) = 0.0942.

(61)

We see that adopting SRR more than doubles the utility. Compared to Example 3, the utility is still significantly lower than that of PolyOpt, but the advantage is that we obtain SRR directly from $ε$, without having to take $\hat{P}$ or $F$ into account; this makes it significantly faster to compute.
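The utilities in (60) can be recomputed in a few lines. The Python sketch below assumes natural logarithms and reconstructs $\hat{P}$ as $\hat{P}_{s} \hat{P}_{u | s}$ from the rounded values of Example 2; the function names are hypothetical, and since $P^{*}$ is not available here, (61) is not reproduced:

```python
import numpy as np


def mutual_information(p_x, Q):
    """I(X; Y) in nats for input distribution p_x and Q[y, x] = Q_{y|x}."""
    joint = Q * np.asarray(p_x, dtype=float)[None, :]
    p_y = joint.sum(axis=1)
    m = joint > 0
    return float(np.sum(joint[m] * np.log((Q / p_y[:, None])[m])))


def grr(a, eps):
    """GRR^eps from (56); entry [y, x] = GRR_{y|x}."""
    Q = np.ones((a, a))
    np.fill_diagonal(Q, np.exp(eps))
    return Q / (np.exp(eps) + a - 1)


def srr(a1, a2, eps):
    """SRR^eps from (58), with x = s * a2 + u."""
    s_of = np.repeat(np.arange(a1), a2)
    Q = np.where(s_of[:, None] == s_of[None, :], np.exp(-eps), 1.0)
    np.fill_diagonal(Q, np.exp(eps))
    return Q / (np.exp(eps) + np.exp(-eps) * (a2 - 1) + a1 * a2 - a2)


# hat P assembled as hat P_s * hat P_{u|s} from Example 2
p_hat = np.array([0.17 * 0.4118, 0.17 * 0.5882, 0.83 * 0.3133, 0.83 * 0.6867])
eps = np.log(2)
I_grr = mutual_information(p_hat, grr(4, eps))
I_srr = mutual_information(p_hat, srr(2, 2, eps))
```

Under these assumptions, `I_grr` and `I_srr` agree with the values in (60) up to rounding.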

The power of SRR, beyond improving on GRR, is that we can prove that it maximizes $I_{P}(X; Y)$ among $(ε, P_{X})$-RLDP mechanisms for sufficiently large $ε$; the cutoff point depends on $P$. This is proven analogously to the result of [5] that GRR is the optimal LDP mechanism for sufficiently large $ε$.

Theorem 4.

For every $P$, there is an $ε_{0} \geq 0$ such that for all $ε \geq ε_{0}$, SRR is the $(ε, P_{X})$-RLDP mechanism maximizing $I_{P}(X; Y)$.

The proof of this theorem follows the same lines as the proof of Theorem 14 of [5], in which it is proven that GRR is the optimal LDP mechanism for sufficiently large $ε$; it is presented in Appendix A.6. This solves the problem of finding the optimal $(ε, P_{X})$-RLDP mechanism for sufficiently large $ε$. The proof strategy is similar to that of Theorem 3: one can show that the rows $Q_{y}$ of the optimal $(ε, P_{X})$-RLDP mechanism $Q$ correspond to vertices of a polyhedron, and the optimal weights assigned to these vertices are found by solving a linear programming problem. Unlike in the case of Theorem 3, however, we can give an explicit description of the set of vertices, and we can solve the linear programming problem analytically.

Our result shows that if one wishes to satisfy $(ε, P_{X})$-RLDP, then SRR is a solid choice, especially for larger $ε$, since it maximizes $I_{P^{*}}(X; Y)$ for sufficiently large $ε$. Thus, we can optimize $I_{P^{*}}(X; Y)$ without having to know $P^{*}$, with the caveat that the cutoff point for ‘large enough’ depends on $P^{*}$.

In [5], the optimal LDP mechanism in the high-privacy regime (i.e., $ε \ll 1$) was also found. In principle, we could also do this for $(ε, P_{X})$-RLDP, but this would not be of much use, as the optimal mechanism would depend on $P^{*}$, which we assume to be unknown.

7. Independent Reporting

Section 5 demonstrated the need for efficiently computable $(ε, F)$-RLDP mechanisms with decent utility. In Section 6, we approached this problem by considering $(ε, P_{X})$-RLDP instead, which allowed us to obtain the optimal mechanism analytically for sufficiently large $ε$. However, when $F$ is small, this overapproximation might result in a large loss of utility. In this section, we describe independent reporting (IR), a different heuristic that takes the size of $F$ into account, while still being significantly less computationally complex than PolyOpt.

The basis of IR is to apply two separate LDP mechanisms $R^{1}$ and $R^{2}$ to $S$ and $U$, respectively, and to report both outputs.

Definition 5.

Let $Y^{1}, Y^{2}$ be sets, and let $Y = Y^{1} \times Y^{2}$. Let $R^{1} : S \to Y^{1}$ and $R^{2} : U \to Y^{2}$ be probabilistic maps. Then, the independent reporting of $R^{1}$ and $R^{2}$ is the probabilistic map $\mathrm{IR}_{R^{1}, R^{2}} : X \to Y$ given by $\mathrm{IR}_{R^{1}, R^{2}}(s, u) = (R^{1}(s), R^{2}(u))$.
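In matrix form, independent reporting is a Kronecker product of the two component matrices. The Python sketch below assumes the orderings $x = (s, u) \mapsto s a_{2} + u$ and $y = (y^{1}, y^{2}) \mapsto y^{1} |Y^{2}| + y^{2}$, and uses GRR from (56) as an illustrative choice of $R^{1}$ and $R^{2}$; the names are hypothetical:

```python
import numpy as np


def grr(k, eps):
    """GRR^eps on k symbols, Eq. (56); entry [y, x] = P(output y | input x)."""
    Q = np.ones((k, k))
    np.fill_diagonal(Q, np.exp(eps))
    return Q / (np.exp(eps) + k - 1)


def independent_reporting(R1, R2):
    """Matrix of IR_{R1,R2}: entry [(y1, y2), (s, u)] = R1[y1, s] * R2[y2, u].

    With rows indexed y1 * |Y2| + y2 and columns s * a2 + u, this is exactly np.kron.
    """
    return np.kron(R1, R2)


# GRR with eps_1 = 0.5 on |S| = 2 secrets and GRR with eps_2 = 1.0 on |U| = 3 values
Q_ir = independent_reporting(grr(2, 0.5), grr(3, 1.0))
```

Within each row of `Q_ir`, the largest and smallest entries differ by a factor $e^{0.5} e^{1.0} = e^{1.5}$, in line with the $(ε_{1} + ε_{2})$-LDP guarantee of the composition theorem [59].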

Suppose that $R^{i}$ satisfies $ε_{i}$-LDP. The composition theorem for differential privacy [59] tells us that $\mathrm{IR}_{R^{1}, R^{2}}$ satisfies $(ε_{1} + ε_{2})$-LDP. However, in the RLDP setting, $U$ only indirectly leaks information about $S$; therefore, we can get away with a higher $ε_{2}$ than in the LDP setting. How much higher depends on the degree of relatedness of $S$ and $U$, which is captured by the possible values of $P$ in $F$. The precise statement is given in the following result:

Theorem 5.

Let $ε_{1}, ε_{2} \in R_{\geq 0}$. For each $s$, let $d_{s} \in [0, \infty)$ be such that $d_{s} \geq \mathrm{rad}_{s}(F)$. Furthermore, define

d = \min\left\{2,\ \max_{s}(2 d_{s}) + \max_{s, s'} \| \hat{P}_{U | s} - \hat{P}_{U | s'} \|_{1}\right\}.

(62)

Let $δ_{2} = \log\left(1 + \frac{2 (e^{ε_{2}} - 1)}{d}\right)$. Suppose that $R^{1}$ is $ε_{1}$-LDP and that $R^{2}$ is $δ_{2}$-LDP. Then, IR is $(ε_{1} + ε_{2}, F)$-RLDP.

If $S = U$, then $\| \hat{P}_{U | s} - \hat{P}_{U | s^{'}} \|_{1} = 2$ for $s \neq s^{'}$, so $d = 2$ and $δ_{2} = ε_{2}$. In this case, Theorem 5 is the RLDP analogue of the well-known composition theorem for local differential privacy [59]. In general, $d \leq 2$ and hence $δ_{2} \geq ε_{2}$. This reflects the fact that the privacy requirement on $R^{2}$ is less strict when $S$ and $U$ are only partially related. At the other extreme, if $S$ and $U$ are independent under the estimate $\hat{P}$, we have $\| \hat{P}_{U | s} - \hat{P}_{U | s^{'}} \|_{1} = 0$ for all $s, s^{'}$. Still, we cannot fully disclose $U$, since $S$ and $U$ might be dependent under $P^{*}$. The term $d_{s}$ is present in the definition of $d$ to account for this possibility.
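The quantities in Theorem 5 are straightforward to compute once the bounds $d_{s}$ and the estimated conditionals $\hat{P}_{U | s}$ are available. The sketch below evaluates (62) and $δ_{2}$ and reproduces the two extreme cases discussed above. The helper names `ir_distance_bound` and `delta2` are hypothetical, and the $d_{s}$ are taken as given inputs rather than computed from $F$.

```python
import numpy as np


def ir_distance_bound(d_s, P_hat_U_given_S) -> float:
    """d from (62).

    d_s: upper bounds d_s >= rad_s(F), one per value of s (assumed given).
    P_hat_U_given_S: array of shape (|S|, |U|) whose row s is P_hat_{U|s}.
    """
    d_s = np.asarray(d_s, dtype=float)
    P = np.asarray(P_hat_U_given_S, dtype=float)
    # All pairwise L1 distances ||P_hat_{U|s} - P_hat_{U|s'}||_1.
    pairwise = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2)
    return float(min(2.0, 2.0 * d_s.max() + pairwise.max()))


def delta2(eps2: float, d: float) -> float:
    """Privacy parameter allowed for R2 by Theorem 5: log(1 + 2 (e^{eps2} - 1) / d)."""
    if d <= 0:
        raise ValueError("this sketch assumes d > 0")
    return float(np.log1p(2.0 * np.expm1(eps2) / d))


# S = U: each P_hat_{U|s} is a point mass, so d = 2 and delta2 equals eps2 (up to rounding).
print(ir_distance_bound([0.0, 0.0], np.eye(2)), delta2(0.7, 2.0))
# Partially related S and U: d < 2, so R2 may use a larger parameter than eps2.
d = ir_distance_bound([0.1, 0.05], [[0.6, 0.4], [0.3, 0.7]])
print(d, delta2(0.7, d))
```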

In order to prove Theorem 5, we need the following lemma:

Lemma 3.

Let $Q : X \to Y$ be an $ε$-LDP mechanism. Then, for all $y \in Y$ and all $P, P^{'} \in P_{X}$, we have

$$ \frac{P_{X \sim P} (Q (X) = y)}{P_{X \sim P^{'}} (Q (X) = y)} \leq 1 + \frac{e^{ε} - 1}{2} \| P - P^{'} \|_{1} . \tag{63} $$

Proof.

Fix $y$, and let $Q_{y}^{\max} = \max_{x} Q_{y | x}$ and $Q_{y}^{\min} = \min_{x} Q_{y | x}$. By the $ε$-LDP property, it holds that $Q_{y}^{\max} \leq e^{ε} Q_{y}^{\min}$. We hence find

$$
\begin{aligned}
P_{X \sim P} (Q (X) = y) - P_{X \sim P^{'}} (Q (X) = y) &= \sum_{x \in X} Q_{y | x} (P_{x} - P_{x}^{'}) && (64)\\
&= \sum_{x : P_{x} \geq P_{x}^{'}} Q_{y | x} (P_{x} - P_{x}^{'}) - \sum_{x : P_{x}^{'} > P_{x}} Q_{y | x} (P_{x}^{'} - P_{x}) && (65)\\
&\leq \frac{Q_{y}^{\max}}{2} \| P - P^{'} \|_{1} - \frac{Q_{y}^{\min}}{2} \| P - P^{'} \|_{1} && (66)\\
&\leq \frac{(e^{ε} - 1) Q_{y}^{\min}}{2} \| P - P^{'} \|_{1} && (67)\\
&\leq \frac{(e^{ε} - 1) P_{X \sim P^{'}} (Q (X) = y)}{2} \| P - P^{'} \|_{1}, && (68)
\end{aligned}
$$

Step (66) uses $Q_{y}^{\min} \leq Q_{y | x} \leq Q_{y}^{\max}$. It also uses the fact that both sums $\sum_{x : P_{x} \geq P_{x}^{'}} (P_{x} - P_{x}^{'})$ and $\sum_{x : P_{x}^{'} > P_{x}} (P_{x}^{'} - P_{x})$ equal $\frac{1}{2} \| P - P^{'} \|_{1}$, since $P$ and $P^{'}$ both sum to 1. Step (68) uses $Q_{y}^{\min} \leq P_{X \sim P^{'}} (Q (X) = y)$. Dividing by $P_{X \sim P^{'}} (Q (X) = y)$ yields (63). □
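Lemma 3 can be sanity-checked numerically. The sketch below is an illustration, not part of the proof. It uses GRR (assumed in its standard form) as the $ε$-LDP mechanism $Q$ and draws random pairs $P, P^{'}$ from a flat Dirichlet distribution. It then verifies that the smallest gap between the two sides of (63) is non-negative. The helper `lemma3_gap` is hypothetical.

```python
import numpy as np


def grr_matrix(eps: float, k: int) -> np.ndarray:
    """Generalized randomized response (assumed standard form); it is eps-LDP."""
    e = np.exp(eps)
    Q = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(Q, e / (e + k - 1))
    return Q


def lemma3_gap(Q: np.ndarray, eps: float, P: np.ndarray, P_prime: np.ndarray) -> float:
    """Minimum over y of [RHS of (63) - LHS of (63)]; Lemma 3 says this is >= 0."""
    lhs = (P @ Q) / (P_prime @ Q)          # ratio of output probabilities, one entry per y
    rhs = 1.0 + 0.5 * np.expm1(eps) * np.abs(P - P_prime).sum()
    return float(np.min(rhs - lhs))


rng = np.random.default_rng(1)
k, eps = 5, 1.2
Q = grr_matrix(eps, k)
gaps = [lemma3_gap(Q, eps, rng.dirichlet(np.ones(k)), rng.dirichlet(np.ones(k)))
        for _ in range(1000)]
print(min(gaps) >= -1e-12)                  # True
```

Taking $P$ and $P^{'}$ to be point masses on two different inputs makes the gap zero for GRR. In that instance (63) holds with equality, so its constant cannot be improved in general.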

Proof of Theorem 5.

We start by showing that, for all $P \in F$ and all $s, s^{'} \in S$, $d$ is an upper bound for $\| P_{U | s} - P_{U | s^{'}} \|_{1}$. If $d = 2$, this is certainly the case, since the $\ell_{1}$ distance between two probability distributions is at most 2. Suppose $d = \max_{s} (2 d_{s}) + \max_{s, s^{'}} \| \hat{P}_{U | s} - \hat{P}_{U | s^{'}} \|_{1}$. Then, for all $s, s^{'} \in S$ and $P \in F$, we have

$$
\begin{aligned}
\| P_{U | s} - P_{U | s^{'}} \|_{1} &\leq \| P_{U | s} - \hat{P}_{U | s} \|_{1} + \| \hat{P}_{U | s} - \hat{P}_{U | s^{'}} \|_{1} + \| \hat{P}_{U | s^{'}} - P_{U | s^{'}} \|_{1} && (69)\\
&\leq d_{s} + d_{s^{'}} + \| \hat{P}_{U | s} - \hat{P}_{U | s^{'}} \|_{1} && (70)\\
&\leq d . && (71)
\end{aligned}
$$

We apply Lemma 3 to $Q = R^{2}$, with $P_{U | s}$ and $P_{U | s^{'}}$ in the roles of $P$ and $P^{'}$, and combine it with the fact that $ε_{2} = \log \left(1 + \frac{d (e^{δ_{2}} - 1)}{2}\right)$. It follows that for every $y_{2} \in Y^{2}$, we have

$$
\begin{aligned}
\frac{P_{X \sim P} (R^{2} (U) = y_{2} \mid S = s)}{P_{X \sim P} (R^{2} (U) = y_{2} \mid S = s^{'})} &\leq 1 + \frac{e^{δ_{2}} - 1}{2} \| P_{U | s} - P_{U | s^{'}} \|_{1} && (72)\\
&\leq 1 + \frac{d (e^{δ_{2}} - 1)}{2} && (73)\\
&= e^{ε_{2}} . && (74)
\end{aligned}
$$

Given $S$, the random variables $R^{1} (S)$ and $R^{2} (U)$ are independent. It follows that for every $y_{1} \in Y^{1}$ and every $y_{2} \in Y^{2}$, we have

$$
\begin{aligned}
\frac{P (R^{1} (S) = y_{1}, R^{2} (U) = y_{2} \mid S = s)}{P (R^{1} (S) = y_{1}, R^{2} (U) = y_{2} \mid S = s^{'})} &= \frac{P (R^{1} (S) = y_{1} \mid S = s)}{P (R^{1} (S) = y_{1} \mid S = s^{'})} \cdot \frac{P (R^{2} (U) = y_{2} \mid S = s)}{P (R^{2} (U) = y_{2} \mid S = s^{'})} && (75)\\
&\leq e^{ε_{1} + ε_{2}}, && (76)
\end{aligned}
$$

where the last inequality holds because of (74) and because $R^{1}$ is $ε_{1}$-LDP. This shows that $\mathrm{IR}_{R^{1}, R^{2}}$ is $(ε_{1} + ε_{2}, F)$-RLDP. □

Theorem 5 establishes the privacy of independent reporting. To maximize the utility, we need to determine how to divide the privacy budget $ε$ between $ε_{1}$ and $ε_{2}$, and which LDP mechanisms to use for $R^{1}$ and $R^{2}$. To answer both these questions, we first need an expression for the utility of IR, which is given by the following theorem:

Theorem 6.

For any $P \in P_{X}$, one has

$$ I_{P} (\mathrm{IR}_{R^{1}, R^{2}} (X); X) = I_{P} (R^{1} (S); S) + I_{P} (R^{2} (U); U \mid R^{1} (S)) . \tag{77} $$

Proof.

Since $R^{1} (S)$ and $U$ are independent given $S$, and $R^{2} (U)$ and $S$ are independent given $U$ and $R^{1} (S)$, we have

$$
\begin{aligned}
I_{P} (\mathrm{IR}_{R^{1}, R^{2}} (X); X) &= I_{P} (R^{1} (S), R^{2} (U); U, S) && (78)\\
&= I_{P} (R^{1} (S); U, S) + I_{P} (R^{2} (U); U, S \mid R^{1} (S)) && (79)\\
&= I_{P} (R^{1} (S); S) + I_{P} (R^{2} (U); U \mid R^{1} (S)) . && (80)
\end{aligned}
$$

□
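The identity (77) can be checked on a small random instance by computing every term from the joint distribution of $(S, U, R^{1}(S), R^{2}(U))$. The sketch below assumes GRR in its standard form for both $R^{1}$ and $R^{2}$, although any channels would do. It measures information in nats and uses the standard expansion $I(A; B \mid C) = H(A, C) + H(B, C) - H(C) - H(A, B, C)$.

```python
import numpy as np


def H(p) -> float:
    """Shannon entropy in nats of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def grr_matrix(eps: float, k: int) -> np.ndarray:
    e = np.exp(eps)
    Q = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(Q, e / (e + k - 1))
    return Q


rng = np.random.default_rng(2)
nS, nU = 3, 4
P = rng.dirichlet(np.ones(nS * nU)).reshape(nS, nU)   # joint P(s, u)
Q1, Q2 = grr_matrix(0.8, nS), grr_matrix(1.5, nU)

# J[s, u, y1, y2] = P(s, u) * Q1(y1 | s) * Q2(y2 | u)
J = np.einsum("su,sa,ub->suab", P, Q1, Q2)

# Left-hand side of (77): I(Y1, Y2; S, U) = H(S, U) + H(Y1, Y2) - H(S, U, Y1, Y2)
lhs = H(P) + H(J.sum(axis=(0, 1))) - H(J)
# I(Y1; S) = H(S) + H(Y1) - H(S, Y1)
i1 = H(J.sum(axis=(1, 2, 3))) + H(J.sum(axis=(0, 1, 3))) - H(J.sum(axis=(1, 3)))
# I(Y2; U | Y1) = H(U, Y1) + H(Y1, Y2) - H(Y1) - H(U, Y1, Y2)
i2 = (H(J.sum(axis=(0, 3))) + H(J.sum(axis=(0, 1)))
      - H(J.sum(axis=(0, 1, 3))) - H(J.sum(axis=0)))
print(np.isclose(lhs, i1 + i2))   # True
```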

We use Theorems 5 and 6 to find high-utility IR protocols that satisfy $(ε, F)$-RLDP, given $ε$ and $F$. To do so, we need to choose $R^{1}$ and $R^{2}$, and split the privacy budget between them. The expression for the utility of IR in Theorem 6 contains the term $I_{P} (R^{1} (S); S)$, and the $R^{1}$ that maximizes this term is GRR when $ε$ is large enough. Thus, we choose $R^{1} = \mathrm{GRR}$. The second term in the utility expression is

$$ I_{P} (R^{2} (U); U \mid R^{1} (S)) = \mathbb{E}_{r} \left[ I_{U \sim P_{U | R^{1} (S) = r}} (R^{2} (U); U) \right] . \tag{81} $$

This is the expected value of an expression that is maximized for $R^{2} = \mathrm{GRR}$. The caveat is that the maximization only holds when $ε$ is large enough, and what ‘large enough’ means depends on the distribution of $U$. Since this gives us a choice of $R^{2}$ that is independent of the distribution, we ignore this caveat and take $R^{2} = \mathrm{GRR}$ as well.

Having chosen $R^{1}$ and $R^{2}$, we are only left with the division of the privacy budget. If we choose $ε_{2}$, then by Theorem 5 the privacy parameters of $R^{1}$ and $R^{2}$ are $ε_{1} = ε - ε_{2}$ and $δ_{2} = \log \left(1 + \frac{2 (e^{ε_{2}} - 1)}{d}\right)$, respectively. It follows that to find a high-utility IR protocol, we have to solve the following optimization problem:

$$
\begin{aligned}
\underset{ε_{2}}{\text{maximize}} \quad & I_{P} \left(\mathrm{GRR}_{ε - ε_{2}} (S), \mathrm{GRR}_{\log \left(1 + \frac{2 (e^{ε_{2}} - 1)}{d}\right)} (U); S, U\right) \\
\text{subject to} \quad & ε_{2} \in [0, ε] .
\end{aligned} \tag{82}
$$

This optimization problem is only 1-dimensional. While it is not straightforward to express the complexity of solving it in $O$-notation, our experiments in Section 8 show that it can be solved quickly numerically, and significantly faster than PolyOpt.
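Problem (82) can be solved by any 1-dimensional method. The sketch below uses one simple choice, which is not necessarily the procedure used for the experiments: a grid search over $ε_{2} \in [0, ε]$. A grid that includes the endpoints also covers optima on the boundary, such as the one in Example 5. The sketch assumes:

- GRR in its standard form;
- the row-major indexing $x = (s, u) \mapsto s|U| + u$;
- that $d > 0$ has already been computed from (62).

The names `mutual_information` and `best_ir_split` are hypothetical.

```python
import numpy as np


def grr_matrix(eps: float, k: int) -> np.ndarray:
    e = np.exp(eps)
    Q = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(Q, e / (e + k - 1))
    return Q


def mutual_information(P_x: np.ndarray, Q: np.ndarray) -> float:
    """I(X; Y) in nats for input distribution P_x and channel Q[x, y]."""
    joint = P_x[:, None] * Q
    P_y = joint.sum(axis=0)
    m = joint > 0
    return float((joint[m] * np.log(joint[m] / (P_x[:, None] * P_y[None, :])[m])).sum())


def best_ir_split(P_hat: np.ndarray, eps: float, d: float, num: int = 1001):
    """Grid search for problem (82).

    P_hat: estimated joint distribution of shape (|S|, |U|).
    Returns (eps2, utility) for the best grid point in [0, eps].
    """
    nS, nU = P_hat.shape
    P_x = P_hat.ravel()                              # x = (s, u) -> s * |U| + u
    best_eps2, best_val = 0.0, -np.inf
    for eps2 in np.linspace(0.0, eps, num):
        delta2 = np.log1p(2.0 * np.expm1(eps2) / d)  # parameter of R2 from Theorem 5
        Q = np.kron(grr_matrix(eps - eps2, nS), grr_matrix(delta2, nU))
        val = mutual_information(P_x, Q)
        if val > best_val:
            best_eps2, best_val = float(eps2), val
    return best_eps2, best_val


rng = np.random.default_rng(3)
P_hat = rng.dirichlet(np.ones(6)).reshape(2, 3)
eps2, util = best_ir_split(P_hat, eps=1.0, d=1.2)
print(eps2, util)
```

A bounded scalar optimizer such as `scipy.optimize.minimize_scalar(..., method="bounded")` would need fewer evaluations. However, it implicitly assumes the objective is unimodal on $[0, ε]$, which has not been established here. The grid avoids that assumption at the cost of more evaluations.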

Example 5.

We continue Example 4. Having found $\mathrm{rad}_{s} (F)$ and $\hat{P}_{U | s_{1}}, \hat{P}_{U | s_{2}}$ in Example 2, we conclude that, in Theorem 5, we have

$$ d = \min \left\{2, 2 \cdot \max \{0.6107, 0.3061\} + \left\| \begin{pmatrix} 0.4118 \\ 0.5882 \end{pmatrix} - \begin{pmatrix} 0.3133 \\ 0.6867 \end{pmatrix} \right\|_{1} \right\} = 1.4591 . \tag{83} $$

It follows that $δ_{2} = \log \left(1 + \frac{2}{1.4591} (e^{ε_{2}} - 1)\right) = \log (1.3707 e^{ε_{2}} - 0.3707)$. For a given value of $ε_{2}$, the matrix corresponding to $\mathrm{IR}(\mathrm{GRR}^{\log (2) - ε_{2}}, \mathrm{GRR}^{δ_{2}})$ is the Kronecker product

$$
\begin{aligned}
& \begin{pmatrix} \frac{2 e^{- ε_{2}}}{2 e^{- ε_{2}} + 1} & \frac{1}{2 e^{- ε_{2}} + 1} \\ \frac{1}{2 e^{- ε_{2}} + 1} & \frac{2 e^{- ε_{2}}}{2 e^{- ε_{2}} + 1} \end{pmatrix} \otimes \begin{pmatrix} \frac{1.3707 e^{ε_{2}} - 0.3707}{1.3707 e^{ε_{2}} + 0.6293} & \frac{1}{1.3707 e^{ε_{2}} + 0.6293} \\ \frac{1}{1.3707 e^{ε_{2}} + 0.6293} & \frac{1.3707 e^{ε_{2}} - 0.3707}{1.3707 e^{ε_{2}} + 0.6293} \end{pmatrix} \\
&\quad = \frac{1}{C} \begin{pmatrix} 2.7414 - 0.7414 e^{- ε_{2}} & 2 e^{- ε_{2}} & 1.3707 e^{ε_{2}} - 0.3707 & 1 \\ 2 e^{- ε_{2}} & 2.7414 - 0.7414 e^{- ε_{2}} & 1 & 1.3707 e^{ε_{2}} - 0.3707 \\ 1.3707 e^{ε_{2}} - 0.3707 & 1 & 2.7414 - 0.7414 e^{- ε_{2}} & 2 e^{- ε_{2}} \\ 1 & 1.3707 e^{ε_{2}} - 0.3707 & 2 e^{- ε_{2}} & 2.7414 - 0.7414 e^{- ε_{2}} \end{pmatrix},
\end{aligned} \tag{84}
$$

where $C = (2 e^{- ε_{2}} + 1) (1.3707 e^{ε_{2}} + 0.6293)$. We now wish to optimize its utility, i.e., find the $ε_{2} \in [0, \log 2]$ that maximizes $I_{\hat{P}} (X; Y)$. The optimum occurs at the boundary $ε_{2} = \log (2)$, for which $I_{\hat{P}} (X; Y) = 0.0755$. Notice that now $ε_{1} = 0$, so $R^{1} = \mathrm{GRR}^{0}$ is completely random: its output does not depend on the input. In other words, the optimal IR protocol in this case does not transmit any direct information about $S$ at all. It conveys information about $S$ only indirectly, through $\mathrm{GRR}^{δ_{2}} (U)$. In this case, we have

$$ Q_{\mathrm{IR}} = \begin{pmatrix} 0.3517 & 0.1483 & 0.3517 & 0.1483 \\ 0.1483 & 0.3517 & 0.1483 & 0.3517 \\ 0.3517 & 0.1483 & 0.3517 & 0.1483 \\ 0.1483 & 0.3517 & 0.1483 & 0.3517 \end{pmatrix} . \tag{85} $$
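The matrix (85) can be reproduced from (84) with a few lines of code. The sketch below assumes the standard form of GRR and uses $ε = \log 2$ and $d = 1.4591$, as in this example. It evaluates the Kronecker product at the reported optimum $ε_{2} = \log 2$.

```python
import numpy as np


def grr_matrix(eps: float, k: int) -> np.ndarray:
    e = np.exp(eps)
    Q = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(Q, e / (e + k - 1))
    return Q


eps, d = np.log(2.0), 1.4591                   # values used in Examples 4 and 5
eps2 = np.log(2.0)                             # optimal split reported above
delta2 = np.log1p(2.0 * np.expm1(eps2) / d)    # = log(1.3707 e^{eps2} - 0.3707)
Q_IR = np.kron(grr_matrix(eps - eps2, 2), grr_matrix(delta2, 2))
print(np.round(Q_IR, 4))                       # agrees with (85) up to rounding
```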

Regarding the ‘true’ utility, we have $I_{P^{*}} (X; Y) = 0.0718$. Interestingly, $Q_{\mathrm{IR}}$ yields less utility than SRR. As we will see in Section 8, this is typical for small $|S|$ and $|U|$.

8. Experiments

In order to gain insight into the behavior of the different mechanisms, we performed several experiments, both on synthetic and real data. We compared the three mechanisms introduced in this paper (PolyOpt, SRR, and IR). Throughout, we let $F$ be a confidence set for a $χ^{2}$-test, i.e., for a Rényi divergence with $α = 2$. We used the results of Section 4 to find explicit expressions for $L_{u | s} (F)$ and (an upper bound for) $\mathrm{rad}_{s} (F)$. Recall from Section 3 that

$$ F = \left\{P \in P_{X} : D_{2} (\hat{P} \| P) \leq \log \left(1 + \frac{F_{χ^{2}, a - 1}^{- 1} (1 - β)}{n}\right)\right\}, \tag{86} $$

where $F_{χ^{2}, a - 1}$ is the cumulative distribution function of the $χ^{2}$-distribution with $a - 1$ degrees of freedom, and $β \in (0, 1)$ is a chosen significance level. Throughout the experiments, we took $β = 0.05$, unless otherwise specified.
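For $α = 2$, one has $e^{D_{2}(\hat{P} \| P)} - 1 = \sum_{x} (\hat{P}_{x} - P_{x})^{2} / P_{x}$. Hence $n (e^{D_{2}(\hat{P} \| P)} - 1)$ is Pearson's $χ^{2}$ statistic, which is why (86) is the acceptance region of a $χ^{2}$-test. The sketch below computes the radius in (86) with SciPy's `scipy.stats.chi2.ppf` and tests membership in $F$. It assumes that $a$ denotes $|X|$, the number of cells. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def renyi2(P_hat, P) -> float:
    """D_2(P_hat || P) = log sum_x P_hat_x^2 / P_x (infinite if some P_x = 0 < P_hat_x)."""
    P_hat, P = np.asarray(P_hat, dtype=float), np.asarray(P, dtype=float)
    if np.any((P == 0) & (P_hat > 0)):
        return float("inf")
    m = P_hat > 0
    return float(np.log(np.sum(P_hat[m] ** 2 / P[m])))


def radius(n: int, a: int, beta: float = 0.05) -> float:
    """Right-hand side of (86): log(1 + F^{-1}_{chi2, a-1}(1 - beta) / n)."""
    return float(np.log1p(chi2.ppf(1.0 - beta, df=a - 1) / n))


def in_confidence_set(P_hat, P, n: int, beta: float = 0.05) -> bool:
    """Membership test for F in (86), assuming a = |X| (number of cells)."""
    return renyi2(P_hat, P) <= radius(n, len(P_hat), beta)


n = 32_561
P_hat = np.array([0.1, 0.2, 0.3, 0.4])
print(radius(n, 4))
print(in_confidence_set(P_hat, np.array([0.101, 0.199, 0.3, 0.4]), n))  # True
print(in_confidence_set(P_hat, np.array([0.11, 0.19, 0.3, 0.4]), n))    # False
```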

We used $I_{\hat{P}} (X; Y)$ as a utility metric, divided by $H (X)$ to obtain the normalized mutual information (NMI). We used this rather than $I_{P^{*}} (X; Y)$, as the aggregator only has access to the former. In fact, while $P^{*}$ is known for the synthetic data, this is not the case for real data, so we cannot even use $I_{P^{*}} (X; Y)$ as a utility metric there.
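The NMI metric can be computed directly from a channel matrix. The sketch below assumes natural logarithms; the normalization makes the base irrelevant. It also assumes that $H(X)$ is evaluated under $\hat{P}$, like the numerator. The function names are hypothetical.

```python
import numpy as np


def entropy(P_x) -> float:
    """Shannon entropy in nats."""
    p = np.asarray(P_x, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def mutual_information(P_x, Q) -> float:
    """I(X; Y) in nats for input distribution P_x and channel Q[x, y]."""
    P_x = np.asarray(P_x, dtype=float)
    joint = P_x[:, None] * Q
    P_y = joint.sum(axis=0)
    m = joint > 0
    return float((joint[m] * np.log(joint[m] / (P_x[:, None] * P_y[None, :])[m])).sum())


def nmi(P_hat, Q) -> float:
    """Normalized mutual information I_{P_hat}(X; Y) / H(X), with H(X) under P_hat."""
    return mutual_information(P_hat, Q) / entropy(P_hat)


P_hat = np.array([0.5, 0.25, 0.25])
print(nmi(P_hat, np.eye(3)))               # about 1: Y = X reveals everything
print(nmi(P_hat, np.full((3, 2), 0.5)))    # 0: Y is independent of X
```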

We compared our methods to two existing approaches, each with a slightly different privacy model. First, we compared to an LDP mechanism, to see to what extent the RLDP framework offers a utility improvement over regular LDP. As the LDP mechanism, we chose GRR, because it optimizes $I_{P} (X; Y)$, our utility metric, in the low-privacy regime [5]. Second, we compared to the non-robust optimal mechanism of [3]. This mechanism is obtained in a manner similar to PolyOpt, and is the optimal mechanism that satisfies (in our notation) $(ε, \{\hat{P}\})$-RLDP. In other words, it is optimal in the scenario where one knows $P^{*}$ precisely. We shall refer to this mechanism as NR (non-robust). Typically, we would expect NR to have a higher utility than our RLDP mechanisms, because it only needs to satisfy privacy with respect to one distribution. We would expect GRR to have a worse utility, because LDP is stricter than RLDP.

8.1. Adult Data Set

We performed numerical experiments on the adult data set (n = 32,561) [60], which contains demographic data from the 1994 US census. Some examples, where we used different categorical attributes from the data set as $S$ and $U$, are depicted in Figure 2. We omitted PolyOpt from the larger two experiments, as its space complexity became infeasible. For occupation vs. education, the polyhedron $\hat{Γ}$ was 240-dimensional and was defined by 57,840 inequality constraints. To find its set of vertices, Matlab needed to operate on a 57,840 × 57,840 matrix, whose size (24.7 GB) exceeded Matlab’s maximum array size.

We can see that PolyOpt clearly outperformed IR and SRR in the first two experiments, especially in the high-privacy regime (low $ε$). Similarly, IR outperformed SRR in the high-privacy regime, but was slightly overtaken for high $ε$. This is interesting: SRR satisfies a stronger privacy guarantee, as it provides privacy for all adversary assumptions, so we expected it to offer less utility than IR. An explanation for this is that IR is forced to transmit $S$ and $U$ separately, and so it can be less efficient than SRR, which does not have this restriction. At any rate, the difference between IR and SRR in the low-privacy regime was only marginal compared to the advantage of PolyOpt over both. In the second two experiments, where PolyOpt was infeasible, we can see that IR clearly outperformed SRR. Overall, we see that, especially in the low-privacy regime, PolyOpt was the preferable RLDP mechanism, followed by IR and SRR. Furthermore, in all experiments, GRR performed the worst, and the best RLDP mechanism significantly outperformed GRR. This shows that adopting RLDP as a privacy metric results in significantly better utility than LDP. On the other hand, NR outperformed the RLDP methods, although the difference between NR and PolyOpt was marginal for higher $ε$. As with PolyOpt, NR was computationally out of reach for larger $|X|$.

8.2. Synthetic Data

To study the robustness of our method with respect to privacy (Section 8.3) and utility (Section 8.4), we also needed experiments in which $P^{*}$ was known. For this, we considered experiments on synthetic data. We first randomly created a probability distribution $P^{*}$ on $X$, where $X$ was the same as in the experiments on the adult data set. The distribution $P^{*}$ was drawn from the Jeffreys prior on $P_{X}$, i.e., the symmetric Dirichlet distribution with parameter $\frac{1}{2}$. From $P^{*}$, we then drew $n = 32{,}561$ elements of $X$, which we used to obtain the estimate $\hat{P}$. This estimate was then used to create the privacy mechanisms. We carried this out 100 times and averaged the NMI over these 100 distributions. The results are shown in Figure 3. They were similar to those of the experiments on the adult data set: PolyOpt outperformed IR, which outperformed SRR, although for small $|X|$, SRR could overtake IR in the low-privacy regime. Furthermore, GRR was the worst overall, while NR was the best overall, but only by a small margin.
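The synthetic-data protocol can be sketched with NumPy's random generator. This is an illustration under stated assumptions, not the code used for the experiments. `synthetic_instance` is a hypothetical helper, $X$ is represented as an $|S| \times |U|$ array, and $\hat{P}$ is taken to be the empirical distribution of the $n$ samples.

```python
import numpy as np


def synthetic_instance(shape, n: int = 32_561, rng=None):
    """One synthetic run: P* ~ Dirichlet(1/2, ..., 1/2) on X = S x U, then n i.i.d. samples.

    Returns (P_star, P_hat), both of the given shape (|S|, |U|).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(np.prod(shape))
    P_star = rng.dirichlet(np.full(k, 0.5))     # Jeffreys prior on the simplex
    counts = rng.multinomial(n, P_star)          # n draws from P*
    P_hat = counts / n                           # empirical estimate
    return P_star.reshape(shape), P_hat.reshape(shape)


rng = np.random.default_rng(4)
runs = [synthetic_instance((2, 3), rng=rng) for _ in range(100)]
# Each P_hat would then be used to build the mechanisms, and the NMI averaged over the runs.
print(max(np.abs(Ps - Ph).sum() for Ps, Ph in runs))
```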

8.3. Realized Privacy Parameter

In the previous subsections, we saw that NR had a (marginally) better utility than PolyOpt. However, this is not a completely fair comparison, since NR was only designed to give privacy for $X \sim \hat{P}$ and might result in a larger privacy leakage for $X \sim P^{*}$. For the synthetic data, $P^{*}$ was known, and we could measure the true privacy leakage. For a protocol $Q$, we defined the realized privacy parameter $ε^{*}$ as

$$
\begin{aligned}
ε^{*} &= \max_{y \in Y,\, s_{1}, s_{2} \in S} \log \frac{P_{X \sim P^{*}} (Y = y \mid S = s_{1})}{P_{X \sim P^{*}} (Y = y \mid S = s_{2})} \\
&= \max_{y \in Y,\, s_{1}, s_{2} \in S} \log \frac{\sum_{u} Q_{y | s_{1}, u} P_{u | s_{1}}^{*}}{\sum_{u} Q_{y | s_{2}, u} P_{u | s_{2}}^{*}} .
\end{aligned}
$$

Note that this becomes $\infty$ when there exist $s, y$ such that $P_{X \sim P^{*}} (Y = y \mid S = s) = 0$ while this probability is positive for some other $s$. We compared $ε^{*}$ for NR and PolyOpt. The results are shown in Figure 4, where we give the 25% and 75% quantiles for both protocols, out of 100 considered distributions. As one can see, NR’s $ε^{*}$ was consistently greater than $ε$, while PolyOpt’s $ε^{*}$ was consistently smaller. This is what we expected, as NR does not give privacy guarantees for $P^{*}$, but PolyOpt does when $P^{*} \in F$, which happens with 95% probability. Note that the privacy leakage was especially bad for low $ε$. At $ε = 0.075$, the lowest value of $ε$ we tested, the 75%-quantile of $ε^{*}$ of NR was 0.3897, which is more than 5 times the desired privacy parameter. Overall, we can conclude that NR gave marginally better utility, but this came at quite a privacy cost.
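The realized privacy parameter can be computed from a channel matrix and $P^{*}$. The sketch below makes three assumptions: the row-major indexing $x = (s, u) \mapsto s|U| + u$, that every $s$ has positive probability under $P^{*}$, and that outputs $y$ with probability zero for all $s$ are ignored. `realized_epsilon` is a hypothetical name. Two sanity checks are included. A mechanism that is $ε$-LDP on all of $X$ has $ε^{*} \leq ε$ under any $P^{*}$, and a mechanism that outputs $S$ itself has $ε^{*} = \infty$.

```python
import numpy as np


def grr_matrix(eps: float, k: int) -> np.ndarray:
    e = np.exp(eps)
    Q = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(Q, e / (e + k - 1))
    return Q


def realized_epsilon(Q: np.ndarray, P_star: np.ndarray) -> float:
    """Realized privacy parameter eps* of channel Q under the true distribution P*.

    Q: shape (|S| * |U|, |Y|), row x = (s, u) -> s * |U| + u.
    P_star: true joint distribution of shape (|S|, |U|), with P*_s > 0 for all s.
    """
    nS, nU = P_star.shape
    P_u_given_s = P_star / P_star.sum(axis=1, keepdims=True)
    Q3 = Q.reshape(nS, nU, -1)                                # Q3[s, u, y] = Q_{y|s,u}
    P_y_given_s = np.einsum("su,suy->sy", P_u_given_s, Q3)
    hi, lo = P_y_given_s.max(axis=0), P_y_given_s.min(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Outputs with hi == 0 never occur and impose no constraint;
        # hi > 0 with lo == 0 gives an infinite ratio.
        ratio = np.where(hi > 0, hi / lo, 1.0)
    return float(np.log(ratio.max()))


rng = np.random.default_rng(5)
P_star = rng.dirichlet(np.ones(6)).reshape(2, 3)
print(realized_epsilon(grr_matrix(0.9, 6), P_star) <= 0.9)   # True
reveal_s = np.kron(np.eye(2), np.ones((3, 1)))                 # outputs Y = S exactly
print(realized_epsilon(reveal_s, P_star))                      # inf
```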

8.4. Utility Robustness

For the synthetic data sets (where we knew $P^{*}$), we also investigated the normalized difference in mutual information $\frac{I_{\hat{P}} (X; Y) - I_{P^{*}} (X; Y)}{I_{\hat{P}} (X; Y)}$. The aim was to see to what extent we could use $I_{\hat{P}} (X; Y)$ as a utility metric in lieu of the true utility $I_{P^{*}} (X; Y)$. This is shown for the three methods in Figure 5, at $ε = 1.5$. Overall, we can see that the difference was quite minor: for all three methods, the difference in NMI, even at its most extreme, was less than 3% of the NMI value. Furthermore, the differences were very symmetric, with the difference being positive and negative approximately equally often. We conclude that we were justified in using $I_{\hat{P}} (X; Y)$ as a utility metric in the other experiments.

8.5. Impact of $β$

We also considered the impact of $β$ on utility for synthetic data (fixing $ε = 1.5$). The results are shown in Table 2; they are averages over 100 runs. Note that SRR does not depend on $β$, since it assumes $F = P_{X}$. Interestingly, the impact of $β$ was quite limited: changing $β$ by a factor of 100 changed the NMI by at most about 4%. This impact was smaller for PolyOpt than for IR, and smaller for larger $|X|$. Overall, we can conclude that by choosing $β$ closer to 0, we can significantly increase the robustness of privacy without a considerable impact on utility.

9. Conclusions and Future Work

In this paper, we presented a number of algorithms that take as input a desired privacy level $ε$, an estimated distribution $\hat{P}$, and a bound on the Rényi divergence $D_{α} (\hat{P} \| P)$. They return privacy mechanisms that satisfy a differential privacy-like constraint for the part of the data that is considered sensitive, for all distributions $P$ within the divergence bound. The first class of privacy mechanisms, PolyOpt, offers high utility, but is computationally complex, as it relies on vertex enumeration. The second class, SRR, satisfies a stronger privacy requirement and is optimal in the low-privacy regime with reference to this requirement. As a result, however, it has less utility than mechanisms that do not satisfy this stronger privacy requirement. The third class, IR, is a general framework for releasing the sensitive and non-sensitive parts of the data independently. The optimal division of the privacy budget between these parts can be found via 1-dimensional optimization, so the optimal IR mechanism can be found quickly while still offering decent utility. Furthermore, taking RLDP rather than LDP as a privacy constraint, i.e., protecting only the part of the data that is sensitive, significantly improves utility. In particular, we showed that the utility of PolyOpt is close to the utility of the optimal non-robust privacy mechanism. In other words, asking for robustness in privacy comes at only a small cost in utility. At the same time, we showed that not asking for robustness comes at a substantial privacy cost.

There are various interesting directions for future research to build upon the results in this paper. One direction is to find analytical bounds on the performance gap between PolyOpt and optimal mechanisms. This could be the gap with reference to either the non-robust optimal mechanism from [3] or an optimal robust mechanism. Note, however, that for the moment we do not have any results on optimal robust mechanisms. Another direction is to improve the performance of the low-complexity algorithms that have been proposed. For instance, in independent reporting, one could change the underlying LDP mechanism from GRR to an optimal mechanism. Since GRR is only optimal in the low-privacy regime, we expect that there would be room for improvement in the high-privacy regime. A significant challenge is incorporating optimal mechanisms along the lines of [5], because these mechanisms depend on $P^{*}$, which is inaccessible in the RLDP framework. Yet another interesting direction would be to incorporate robustness in utility in addition to robustness in privacy. This would require finding a mechanism that maximizes $\min_{P \in F} I_{P} (X; Y)$. The challenge in this is that $I_{P} (X; Y)$ is concave in $P$, which makes minimizing it over $F$ difficult. Finally, it would be interesting to apply the RLDP framework to other models. In this work, we studied the model where $X$ splits into a sensitive part $S$ and a non-sensitive part $U$. It would be interesting to also study the more general case where $X$ is correlated with the sensitive data $S$, or to apply RLDP to the models that are studied in [12].

Author Contributions

Conceptualization, M.L.-Z. and J.G.; Formal analysis, M.L.-Z. and J.G.; Investigation, M.L.-Z. and J.G.; Methodology, M.L.-Z. and J.G.; Software, M.L.-Z.; Writing, M.L.-Z. and J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Netherlands Organisation for Scientific Research (NWO) grant 628.001.026, ERC Consolidator grant 864075 CAESAR and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101008233.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs

Appendix A.1. Proof of Theorem 1

This follows from the following four lemmas, where the RHS of (24) is denoted $\bar{F}_{U | s}$:

Lemma A1.

If $α \neq 1$, then $F_{U | s} \subset \bar{F}_{U | s}$.

Proof.

Assume $α < 1$; the case $α > 1$ is handled analogously. Then, we rewrite $D_{α} (\hat{P} \| P) \leq B$ as

$$ \sum_{x} \frac{\hat{P}_{x}^{α}}{P_{x}^{α - 1}} \geq e^{B (α - 1)} . \tag{A1} $$

Let $C = e^{B (α - 1)}$. Then,

$$
\begin{aligned}
\frac{\hat{P}_{s}^{α}}{P_{s}^{α - 1}} \sum_{u} \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}} &= \sum_{u} \frac{\hat{P}_{s, u}^{α}}{P_{s, u}^{α - 1}} && (A2)\\
&\geq C - \sum_{s^{'} \neq s} \sum_{u} \frac{\hat{P}_{s^{'}, u}^{α}}{P_{s^{'}, u}^{α - 1}} . && (A3)
\end{aligned}
$$

For $s^{'} \in S \setminus \{s\}$ and $u \in U$, define $P_{s^{'}, u | \neg s} = \frac{P_{u, s^{'}}}{1 - P_{s}}$ and $\hat{P}_{s^{'}, u | \neg s} = \frac{\hat{P}_{u, s^{'}}}{1 - \hat{P}_{s}}$. Then, (A3) can be written as

$$ \frac{\hat{P}_{s}^{α}}{P_{s}^{α - 1}} \sum_{u} \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}} \geq C - \frac{(1 - \hat{P}_{s})^{α}}{(1 - P_{s})^{α - 1}} \sum_{s^{'} \neq s} \sum_{u} \frac{\hat{P}_{s^{'}, u | \neg s}^{α}}{P_{s^{'}, u | \neg s}^{α - 1}} . \tag{A4} $$

Furthermore, $P_{\bullet | \neg s} = (P_{s^{'}, u | \neg s})_{s^{'} \in S \setminus \{s\}, u \in U}$ and $\hat{P}_{\bullet | \neg s} = (\hat{P}_{s^{'}, u | \neg s})_{s^{'} \in S \setminus \{s\}, u \in U}$ form probability distributions on $(S \setminus \{s\}) \times U$. As such, since $α < 1$ and Rényi divergences are non-negative, we have

$$ \sum_{s^{'} \neq s} \sum_{u} \frac{\hat{P}_{s^{'}, u | \neg s}^{α}}{P_{s^{'}, u | \neg s}^{α - 1}} = e^{(α - 1) D_{α} (\hat{P}_{\bullet | \neg s} \| P_{\bullet | \neg s})} \leq 1 . \tag{A5} $$

Applying this to (A4), we obtain

$$ \frac{\hat{P}_{s}^{α}}{P_{s}^{α - 1}} \sum_{u} \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}} \geq C - \frac{(1 - \hat{P}_{s})^{α}}{(1 - P_{s})^{α - 1}} \tag{A6} $$

or

$$ \sum_{u} \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}} \geq \frac{P_{s}^{α - 1}}{\hat{P}_{s}^{α}} \left(C - \frac{(1 - \hat{P}_{s})^{α}}{(1 - P_{s})^{α - 1}}\right) . \tag{A7} $$

To find the bound on $\sum_{u} \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}}$, we have to minimize the RHS of this inequality. The only unknown on the right is $P_{s}$. We find the minimum value of the right-hand side by differentiating with respect to $P_{s}$, for which we obtain

$$ (α - 1) \frac{P_{s}^{α - 2} (1 - \hat{P}_{s})^{α}}{\hat{P}_{s}^{α}} \left(\frac{C}{(1 - \hat{P}_{s})^{α}} - \frac{1}{(1 - P_{s})^{α}}\right) . \tag{A8} $$

Setting this equal to 0, we find $P_{s} = 1 - C^{- 1 / α} (1 - \hat{P}_{s})$. Substituting this into (A7), we obtain

$$ \sum_{u} \frac{\hat{P}_{u | s}^{α}}{P_{u | s}^{α - 1}} \geq \frac{(C^{1 / α} - (1 - \hat{P}_{s}))^{α}}{\hat{P}_{s}^{α}}, \tag{A9} $$

which can be written as

$$ D_{α} (\hat{P}_{U | s} \| P_{U | s}) \leq \frac{α}{α - 1} \log \left(\frac{e^{(α - 1) B / α} - (1 - \hat{P}_{s})}{\hat{P}_{s}}\right), \tag{A10} $$

showing that $P_{U | s} \in \bar{F}_{U | s}$. Since $P$ was chosen arbitrarily, we can conclude $F_{U | s} \subset \bar{F}_{U | s}$. □
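Appendix A.1 yields an explicit radius for the projected set $F_{U | s}$. For $α \neq 1$ it is the right-hand side of (A10), and for $α = 1$ it is $B / \hat{P}_{s}$ by Lemmas A3 and A4. The sketch below evaluates this radius; the function name is hypothetical. For $α < 1$, it refuses parameter values where the argument of the logarithm in (A10) is non-positive, a case that the proof above does not address.

```python
import math


def conditional_radius(B: float, p_s: float, alpha: float) -> float:
    """Radius B_s of the projected set F_{U|s} (hypothetical helper).

    alpha != 1: right-hand side of (A10); alpha == 1: B / p_s (Lemmas A3 and A4).
    p_s is the estimated probability P_hat_s, assumed to lie in (0, 1].
    """
    if not 0.0 < p_s <= 1.0:
        raise ValueError("p_s must lie in (0, 1]")
    if alpha == 1.0:
        return B / p_s
    arg = (math.exp((alpha - 1.0) * B / alpha) - (1.0 - p_s)) / p_s
    if arg <= 0.0:
        # Can only happen for alpha < 1; (A10) is not defined there, so we refuse.
        raise ValueError("(A10) does not apply for these parameters")
    return alpha / (alpha - 1.0) * math.log(arg)


print(conditional_radius(0.1, 0.5, 2.0))         # about 0.1952
print(conditional_radius(0.1, 0.5, 1.0))         # 0.2
print(conditional_radius(0.1, 0.5, 1.0 + 1e-6))  # close to 0.2
```

As a check, the value for $α$ close to 1 approaches $B / \hat{P}_{s}$, and for $\hat{P}_{s} = 1$ the radius reduces to $B$.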

Lemma A2.

If $α \neq 1$, then $\bar{F}_{U | s} \subset F_{U | s}$.

Proof.

Again we assume $α < 1$. Suppose that $R \in P_{U}$ satisfies $D_{α} (\hat{P}_{U | s} \| R) \leq B_{s}$. Let $C$ be as in (A1) and define $γ = 1 - C^{- 1 / α} (1 - \hat{P}_{s})$; then,

$$
\begin{aligned}
\frac{1}{α - 1} \log \left(\sum_{u} \frac{\hat{P}_{u | s}^{α}}{R_{u}^{α - 1}}\right) &= D_{α} (\hat{P}_{U | s} \| R) && (A11)\\
&\leq B_{s} && (A12)\\
&= \frac{α}{α - 1} \log \left(\frac{e^{(α - 1) B / α} - (1 - \hat{P}_{s})}{\hat{P}_{s}}\right) && (A13)\\
&= \frac{α}{α - 1} \log \left(\frac{C^{1 / α} γ}{\hat{P}_{s}}\right), && (A14)
\end{aligned}
$$

which we can express as

$$ \sum_{u} \frac{\hat{P}_{u | s}^{α}}{R_{u}^{α - 1}} \geq \frac{C γ^{α}}{\hat{P}_{s}^{α}} . \tag{A15} $$

Define $P \in P_{X}$ by

$$ P_{u, s^{'}} = \begin{cases} γ R_{u}, & \text{if } s^{'} = s, \\ C^{- 1 / α} \hat{P}_{u, s^{'}}, & \text{otherwise}, \end{cases} \tag{A16} $$

which is a probability distribution, since $γ + C^{- 1 / α} (1 - \hat{P}_{s}) = 1$. Then, $P_{U | s} = R$, and

$$
\begin{aligned}
\sum_{u, s^{'}} \frac{\hat{P}_{u, s^{'}}^{α}}{P_{u, s^{'}}^{α - 1}} &= \sum_{u} \frac{\hat{P}_{u, s}^{α}}{γ^{α - 1} R_{u}^{α - 1}} + \sum_{u} \sum_{s^{'} \neq s} C^{(α - 1) / α} \hat{P}_{u, s^{'}} && (A17)\\
&= \frac{\hat{P}_{s}^{α}}{γ^{α - 1}} \sum_{u} \frac{\hat{P}_{u | s}^{α}}{R_{u}^{α - 1}} + C^{(α - 1) / α} (1 - \hat{P}_{s}) && (A18)\\
&\geq γ C + C^{(α - 1) / α} (1 - \hat{P}_{s}) && (A19)\\
&= C . && (A20)
\end{aligned}
$$

As in the proof of Lemma A1, the condition $\sum_{u, s^{'}} \frac{\hat{P}_{u, s^{'}}^{α}}{P_{u, s^{'}}^{α - 1}} \geq C$ is equivalent to $D_{α} (\hat{P} \| P) \leq B$. Thus, we can conclude that $P \in F$ and so $R = P_{U | s} \in F_{U | s}$. Since $R$ was chosen arbitrarily, this shows $\bar{F}_{U | s} \subset F_{U | s}$. □

Lemma A3.

If $α = 1$, then $F_{U | s} \subset \bar{F}_{U | s}$.

Proof.

Let $P \in F$, and define $P_{\bullet | \neg s}, \hat{P}_{\bullet | \neg s}$ as in the proof of Lemma A1. Then,

$$
\begin{aligned}
D_{1} (\hat{P} \| P) &= \hat{P}_{s} \sum_{u} \hat{P}_{u | s} \log \frac{\hat{P}_{s} \hat{P}_{u | s}}{P_{s} P_{u | s}} + (1 - \hat{P}_{s}) \sum_{s^{'}, u} \hat{P}_{u, s^{'} | \neg s} \log \frac{(1 - \hat{P}_{s}) \hat{P}_{u, s^{'} | \neg s}}{(1 - P_{s}) P_{u, s^{'} | \neg s}} && (A21)\\
&= \hat{P}_{s} D_{1} (\hat{P}_{U | s} \| P_{U | s}) + (1 - \hat{P}_{s}) D_{1} (\hat{P}_{\bullet | \neg s} \| P_{\bullet | \neg s}) + \hat{P}_{s} \log \frac{\hat{P}_{s}}{P_{s}} + (1 - \hat{P}_{s}) \log \frac{1 - \hat{P}_{s}}{1 - P_{s}} && (A22)\\
&= \hat{P}_{s} D_{1} (\hat{P}_{U | s} \| P_{U | s}) + (1 - \hat{P}_{s}) D_{1} (\hat{P}_{\bullet | \neg s} \| P_{\bullet | \neg s}) + D_{1} (V_{\hat{P}_{s}} \| V_{P_{s}}), && (A23)
\end{aligned}
$$

where for $p \in [0, 1]$, the random variable $V_{p}$ is defined to follow a Bernoulli distribution with $P (V_{p} = 1) = p$. Since $D_{1}$ is non-negative and $D_{1} (\hat{P} \| P) \leq B$, we find

$$
\begin{aligned}
D_{1} (\hat{P}_{U | s} \| P_{U | s}) &= \frac{1}{\hat{P}_{s}} \left(D_{1} (\hat{P} \| P) - (1 - \hat{P}_{s}) D_{1} (\hat{P}_{\bullet | \neg s} \| P_{\bullet | \neg s}) - D_{1} (V_{\hat{P}_{s}} \| V_{P_{s}})\right) && (A24)\\
&\leq \frac{D_{1} (\hat{P} \| P)}{\hat{P}_{s}} && (A25)\\
&\leq \frac{B}{\hat{P}_{s}} . && (A26)
\end{aligned}
$$

Thus, $P_{U | s} \in \bar{F}_{U | s}$. Since $P \in F$ was chosen arbitrarily, we can conclude $F_{U | s} \subset \bar{F}_{U | s}$. □

Lemma A4.

If $α = 1$, then $\bar{F}_{U | s} \subset F_{U | s}$.

Proof.

Let $R \in P_{U}$ be such that $D_{1} (\hat{P}_{U | s} \| R) \leq \frac{B}{\hat{P}_{s}}$. Define $P \in P_{X}$ by

$$ P_{u, s^{'}} = \begin{cases} \hat{P}_{s} R_{u}, & \text{if } s^{'} = s, \\ \hat{P}_{u, s^{'}}, & \text{if } s^{'} \neq s . \end{cases} \tag{A27} $$

Then $P_{U | s} = R$. Furthermore, in (A23) one has $D_{1} (\hat{P}_{\bullet | \neg s} \| P_{\bullet | \neg s}) = D_{1} (V_{\hat{P}_{s}} \| V_{P_{s}}) = 0$. Hence $D_{1} (\hat{P} \| P) = \hat{P}_{s} D_{1} (\hat{P}_{U | s} \| R) \leq B$. This shows that $P \in F$, and so $R = P_{U | s} \in F_{U | s}$. Since $R$ was chosen arbitrarily, we can conclude that $\bar{F}_{U | s} \subset F_{U | s}$. □

Appendix A.2. Proof of Proposition 1

We first prove the following two auxiliary lemmas. We only prove these for $α > 1$; the other cases are handled analogously.

Lemma A5.

Let $x \in X$, and define

$$
\begin{aligned}
ξ_{-} (ρ) &= \inf \{ξ \in (0, 1] : E_{B} (ρ, ξ) \leq 0\}, && (A28)\\
ξ_{+} (ρ) &= \sup \{ξ \in [1, (1 - ρ)^{- 1}) : E_{B} (ρ, ξ) \leq 0\}, && (A29)
\end{aligned}
$$

where $E_{B}$ is as in Proposition 1. Then, $\min_{P \in F} P_{x} = \hat{P}_{x} ξ_{-} (\hat{P}_{x})$ and $\max_{P \in F} P_{x} = \hat{P}_{x} ξ_{+} (\hat{P}_{x})$.

Proof.

As in the proof of Lemma A1, define $C = e^{(α - 1) B}$; thus,

$$ F = \left\{P \in P_{X} : \sum_{x^{'} \in X} \frac{\hat{P}_{x^{'}}^{α}}{P_{x^{'}}^{α - 1}} \leq C\right\} . \tag{A30} $$

Furthermore, define a function $F$ by

$$ F (ρ, ξ) = ρ ξ^{1 - α} + (1 - ρ) \left(\frac{1 - ρ ξ}{1 - ρ}\right)^{1 - α} - C, \tag{A31} $$

with $F (1, ξ) = ξ^{1 - α} - C$ the limit as $ρ \to 1$. Then, $E_{B} (ρ, ξ) = \frac{1}{α - 1} \log (F (ρ, ξ) + e^{(α - 1) B}) - B$, so $F (ρ, ξ) \leq 0 \Leftrightarrow E_{B} (ρ, ξ) \leq 0$. Thus, $ξ_{-} (ρ) = \inf \{ξ \in [0, 1] : F (ρ, ξ) \leq 0\}$, and the analogous statement holds for $ξ_{+} (ρ)$. The $P$ that yield the extremal $P_{x}$ lie on the boundary of $F$. Hence, they either satisfy $P_{x} \in \{0, 1\}$, or the equality

$$ \sum_{x^{'} \in X} \frac{\hat{P}_{x^{'}}^{α}}{P_{x^{'}}^{α - 1}} = C . \tag{A32} $$

In the latter case, the extremal values of $P_{x}$ have to be stationary points of the Lagrangian expression

$$ P_{x} + λ \left(\sum_{x^{'}} \frac{\hat{P}_{x^{'}}^{α}}{P_{x^{'}}^{α - 1}} - C\right) + μ \left(\sum_{x^{'}} P_{x^{'}} - 1\right) . \tag{A33} $$

Taking derivatives with respect to all $P_{x}$, we find

$$
\begin{aligned}
1 + (1 - α) λ \frac{\hat{P}_{x}^{α}}{P_{x}^{α}} + μ &= 0, && (A34)\\
\forall x^{'} \neq x : (1 - α) λ \frac{\hat{P}_{x^{'}}^{α}}{P_{x^{'}}^{α}} + μ &= 0 . && (A35)
\end{aligned}
$$

It follows that $P_{x} = \left(\frac{(α - 1) λ}{μ + 1}\right)^{1 / α} \hat{P}_{x} =: ξ \hat{P}_{x}$ and $P_{x^{'}} = \left(\frac{(α - 1) λ}{μ}\right)^{1 / α} \hat{P}_{x^{'}} =: ψ \hat{P}_{x^{'}}$ for all $x^{'} \neq x$, where $ξ$ and $ψ$ do not depend on $x$ or $x^{'}$. We can find $ξ, ψ \in \mathbb{R}_{\geq 0}$ by solving the joint set of equations

$$
\begin{aligned}
C &= \sum_{x^{'}} \frac{\hat{P}_{x^{'}}^{α}}{P_{x^{'}}^{α - 1}} && (A36)\\
&= \frac{\hat{P}_{x}^{α}}{P_{x}^{α - 1}} + \sum_{x^{'} \neq x} \frac{\hat{P}_{x^{'}}^{α}}{P_{x^{'}}^{α - 1}} && (A37)\\
&= \hat{P}_{x} ξ^{1 - α} + (1 - \hat{P}_{x}) ψ^{1 - α}, && (A38)\\
1 &= \sum_{x^{'}} P_{x^{'}} && (A39)\\
&= \hat{P}_{x} ξ + (1 - \hat{P}_{x}) ψ . && (A40)
\end{aligned}
$$

Define $ρ = \hat{P}_{x}$. Then, (A40) implies $ψ = \frac{1 - ρ ξ}{1 - ρ}$, and the condition $ψ \geq 0$ is equivalent to $ξ \leq ρ^{- 1}$. Substituting this into (A38) shows that we find $ξ$ by solving $F (ρ, ξ) = 0$ for $ξ \in (0, (1 - ρ)^{- 1})$. Since $F (ρ, 1) = 1 - C < 0$ and $F$ is strictly convex in $ξ$, there exists at most one solution in $(0, 1]$ and at most one in $[1, (1 - ρ)^{- 1})$. It follows that (A33) has at most two stationary points, which must correspond to the minimal and maximal value of $P_{x}$. If the solution in $(0, 1]$ exists, it is equal to $ξ_{-} (ρ)$. This stationary point of (A33) then corresponds to the minimal value of $P_{x}$, which equals $\hat{P}_{x} ξ_{-} (\hat{P}_{x})$. If the solution in $(0, 1]$ does not exist, then the minimal value of $P_{x}$ is not attained on the boundary and is equal to 0, which then is also equal to $\hat{P}_{x} ξ_{-} (\hat{P}_{x})$. Either way, we find

$$ \min_{P \in F} P_{x} = \hat{P}_{x} ξ_{-} (\hat{P}_{x}) . \tag{A41} $$

The proof for the maximal value of $P_{x}$ is analogous. □
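Lemma A5 translates into a small root-finding routine. The sketch below covers only $α > 1$, $B > 0$, and $0 < \hat{P}_{x} < 1$, and uses plain bisection as an implementation choice. It brackets $ξ$ on $(0, 1]$ and on $[1, 1/ρ)$, the range where $ψ \geq 0$ as derived in the proof. The helper names are hypothetical.

```python
import math


def F(rho: float, xi: float, alpha: float, C: float) -> float:
    """F(rho, xi) from (A31), for 0 < rho < 1 and 0 < xi < 1 / rho."""
    psi = (1.0 - rho * xi) / (1.0 - rho)
    return rho * xi ** (1.0 - alpha) + (1.0 - rho) * psi ** (1.0 - alpha) - C


def bisect_root(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def extreme_probabilities(p_hat_x: float, alpha: float, B: float, tiny: float = 1e-12):
    """(min, max) of P_x over F = {P : D_alpha(P_hat || P) <= B}, following Lemma A5."""
    if alpha <= 1.0 or B <= 0.0 or not 0.0 < p_hat_x < 1.0:
        raise ValueError("sketch assumes alpha > 1, B > 0 and 0 < p_hat_x < 1")
    rho, C = p_hat_x, math.exp((alpha - 1.0) * B)

    def f(xi: float) -> float:
        return F(rho, xi, alpha, C)

    # F(rho, 1) = 1 - C < 0, while F -> +inf as xi -> 0 and as psi -> 0 (xi -> 1 / rho).
    lo_end, hi_end = tiny, (1.0 - tiny) / rho
    xi_minus = bisect_root(f, lo_end, 1.0) if f(lo_end) > 0 else lo_end
    xi_plus = bisect_root(f, 1.0, hi_end) if f(hi_end) > 0 else hi_end
    return rho * xi_minus, rho * xi_plus


lo, hi = extreme_probabilities(0.2, alpha=2.0, B=0.05)
print(lo, hi)   # lo < 0.2 < hi
```

Each extremal point lies on the boundary (A32). Scaling the remaining coordinates by $ψ = (1 - P_{x}) / (1 - ρ)$ therefore gives a distribution with $D_{2}(\hat{P} \| P) = B$ exactly.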

Lemma A6.

For $X_{1} \subset X$, define $\hat{P}_{X_{1}} := \sum_{x \in X_{1}} \hat{P}_{x}$. Then,

$$ \sup_{P \in F} \| P - \hat{P} \|_{1} = 2 \max_{X_{1} \subset X : X_{1} \neq \emptyset} \hat{P}_{X_{1}} (ξ_{+} (\hat{P}_{X_{1}}) - 1) . \tag{A42} $$

Proof.

For a given $P$, define $X_{1} = \{x \in X : P_{x} \geq \hat{P}_{x}\}$ and $X_{2} = \{x \in X : P_{x} < \hat{P}_{x}\}$. To find the maximal value of $\| P - \hat{P} \|_{1}$, we first maximize it for a given partition $X_{1}, X_{2}$ of $X$, and then we maximize over all partitions. Note that $X_{1} = \emptyset$ is impossible, since $P$ and $\hat{P}$ both sum to 1. For $X_{1} = X$, we have $P = \hat{P}$, which is certainly not optimal. Given $X_{1}, X_{2}$, one has

$$ \| P - \hat{P} \|_{1} = \sum_{x \in X_{1}} (P_{x} - \hat{P}_{x}) + \sum_{x \in X_{2}} (\hat{P}_{x} - P_{x}) . \tag{A43} $$

As before, the $P$ maximizing this either lies on the boundary of the probability simplex or satisfies (A32). For the latter case, we have the Lagrangian expression

$$ \sum_{x \in X_{1}} (P_{x} - \hat{P}_{x}) + \sum_{x \in X_{2}} (\hat{P}_{x} - P_{x}) + λ \left(\sum_{x} \frac{\hat{P}_{x}^{α}}{P_{x}^{α - 1}} - C\right) + μ \left(\sum_{x} P_{x} - 1\right) . \tag{A44} $$

Taking derivatives, we find, analogously to (A34)–(A35), that there exist $ξ, ψ$ such that $P_{x} = ξ \hat{P}_{x}$ for all $x \in X_{1}$ and $P_{x} = ψ \hat{P}_{x}$ for all $x \in X_{2}$. By definition of $X_{1}$ and $X_{2}$, we have $ξ \geq 1$ and $0 \leq ψ < 1$. Analogously to (A36)–(A40), these have to satisfy

$$
\begin{aligned}
\hat{P}_{X_{1}} ξ^{1 - α} + (1 - \hat{P}_{X_{1}}) ψ^{1 - α} &= C, && (A45)\\
\hat{P}_{X_{1}} ξ + (1 - \hat{P}_{X_{1}}) ψ &= 1 . && (A46)
\end{aligned}
$$

From this point onward, this proof is analogous to that of Lemma A5. Let

ρ = {\hat{P}}_{X_{1}}

. Expressing

ψ

in terms of

ξ

and substituting this means that to find

ξ

we have to solve

F (ρ, ξ) = 0

for

ξ \in [1, {(1 - ρ)}^{- 1})

, where F is as in the proof of Lemma A5. As before, at most, one such solution exists, and when it does, it corresponds to the maximal value of

| | P - \hat{P} {| |}_{1}

(given

X_{1}

). If it does not exist, then the maximal value of

| | P - \hat{P} {| |}_{1}

is obtained at the boundary where

{\hat{P}}_{X_{1}} = 1

. Either way the maximum is obtained when

ξ = ξ_{+} (ρ)

, which means that

\begin{matrix} (A47) & | | P - \hat{P} {| |}_{1} & = \sum_{x \in X_{1}} (P_{x} - {\hat{P}}_{x}) + \sum_{x \in X_{2}} ({\hat{P}}_{x} - P_{x}) \\ (A48) & = \sum_{x \in X_{1}} {\hat{P}}_{x} (ξ_{+} (ρ) - 1) + \sum_{x \in X_{2}} {\hat{P}}_{x} (1 - \frac{1 - ρ ξ_{+} (ρ)}{1 - ρ}) \\ (A49) & = ρ (ξ_{+} (ρ) - 1) + (1 - ρ) (1 - \frac{1 - ρ ξ_{+} (ρ)}{1 - ρ}) \\ (A50) & = 2 ρ (ξ_{+} (ρ) - 1) . \end{matrix}

This is the maximal value of

| | P - \hat{P} {| |}_{1}

given

X_{1}

; we now find the overall maximum by maximizing over all non-empty

X_{1}

. □
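For small alphabets, the right-hand side of (A42) can be evaluated by brute force over the subsets $\mathcal{X}_1$. The sketch below does this and computes $\xi_+$ by bisection. It assumes the form $F(\rho, \xi) = \rho\,\xi^{1-\alpha} + (1 - \rho)\bigl((1 - \rho\xi)/(1 - \rho)\bigr)^{1-\alpha} - C$ obtained from (A45)–(A46). It then builds the maximizing $P$ of the proof and checks that this $P$ lies on the boundary of $\mathcal{F}$ and attains (A50). The sketch assumes $\alpha > 1$ and that $\hat{P}$ has full support, and it skips $\mathcal{X}_1 = \mathcal{X}$ as in the proof. The enumeration is exponential in $|\mathcal{X}|$, and all names are hypothetical.

```python
import itertools
import math


def renyi_sum(P_hat, P, alpha):
    """sum_x P_hat_x^alpha / P_x^(alpha - 1); P lies in F iff this is at most C."""
    return sum(ph ** alpha / p ** (alpha - 1) for ph, p in zip(P_hat, P))


def xi_plus(rho, alpha, C, iters=200):
    """Root of the assumed F(rho, .) on [1, 1/rho), found by bisection."""
    def F(xi):
        psi = (1.0 - rho * xi) / (1.0 - rho)
        if psi <= 0.0:
            return math.inf
        return rho * xi ** (1 - alpha) + (1 - rho) * psi ** (1 - alpha) - C

    lo, hi = 1.0, 1.0 / rho  # F(1) = 1 - C < 0 and F grows without bound near 1/rho
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sup_l1(P_hat, alpha, C):
    """Evaluate (A42) by brute force over proper non-empty subsets X1."""
    n = len(P_hat)
    best, best_X1 = -math.inf, None
    for k in range(1, n):
        for X1 in itertools.combinations(range(n), k):
            rho = sum(P_hat[x] for x in X1)
            value = 2.0 * rho * (xi_plus(rho, alpha, C) - 1.0)
            if value > best:
                best, best_X1 = value, X1
    return best, best_X1


def maximizer(P_hat, X1, alpha, C):
    """P_x = xi_+ P_hat_x on X1 and psi P_hat_x on the complement, cf. (A45)-(A46)."""
    rho = sum(P_hat[x] for x in X1)
    xi = xi_plus(rho, alpha, C)
    psi = (1.0 - rho * xi) / (1.0 - rho)
    return [P_hat[x] * (xi if x in X1 else psi) for x in range(len(P_hat))]


if __name__ == "__main__":
    best, X1 = sup_l1([0.1, 0.2, 0.3, 0.4], 2.0, math.exp(0.3))
    print(best, X1)
```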

Proof of Proposition 1.

In Lemmas A5 and A6, take $\mathcal{U}$ instead of $\mathcal{X}$, $\hat{P}_{U|s}$ instead of $\hat{P}$, and $B_s$ instead of $B$. Then, by Theorem 1, the role of $\mathcal{F}$ is taken by $\mathcal{F}_{U|s}$. Thus, applying Lemmas A5 and A6 gives Proposition 1 directly. □

Appendix A.3. Proof of Lemma 1 and Proposition 2

As in the proof of Proposition 1, Theorem 1 shows that the projected set $\mathcal{F}_{U|s}$ is defined by a Rényi divergence, just as $\mathcal{F}$ is. It therefore suffices to prove the analogous statements about $\mathcal{F}$ rather than $\mathcal{F}_{U|s}$. Concretely, we prove the following:

Lemma A7.

Suppose $\alpha = 2$ and define $\tilde{B} = e^{B} - 1$; let $\xi_\pm$ be as in Lemma A5. Then,

$$\xi_\pm(\rho) = \frac{\tilde{B} + 2\rho \pm \sqrt{\tilde{B}^2 + 4\rho\tilde{B} - 4\tilde{B}\rho^2}}{2\rho(\tilde{B} + 1)}. \tag{A51}$$

Furthermore, the following hold:

1. Let $x_{\min} = \arg\min_{x \in \mathcal{X}} \hat{P}_x$. If $\tilde{B} \geq (1 - \hat{P}_{x_{\min}})^2$, then the maximum in (A42) is attained at $\mathcal{X}_1 = \{x_{\min}\}$.
2. If $\tilde{B} < (1 - \hat{P}_{x_{\min}})^2$, one has $\sup_{P \in \mathcal{F}} \|P - \hat{P}\|_1 \leq \sqrt{\tilde{B}}$.

The formulas here look slightly different from those in Lemma 1 and Proposition 2. We use this form because it makes the proof more convenient. Replacing $\tilde{B}$ with $e^{B} - 1$ throughout yields exactly the results of Lemma 1 and Proposition 2, for $\mathcal{F}$ instead of $\mathcal{F}_{U|s}$.

Proof.

Consider the function $F(\rho, \xi)$ from (A31) for $\alpha = 2$ and $C = \tilde{B} + 1$, i.e.,

$$F(\rho, \xi) = \frac{\rho}{\xi} + \frac{(1 - \rho)^2}{1 - \rho\xi} - \tilde{B} - 1. \tag{A52}$$

Then, $F(\rho, \xi) = 0$ can be rewritten as the quadratic equation $(\tilde{B} + 1)\rho\,\xi^2 - (\tilde{B} + 2\rho)\,\xi + \rho = 0$ in $\xi$. Its two roots are $\xi_\pm(\rho)$, and solving it yields (A51). For points 1 and 2, we note that

$$2\rho\bigl(\xi_+(\rho) - 1\bigr) = \frac{\tilde{B} - 2\tilde{B}\rho + \sqrt{\tilde{B}^2 + 4\tilde{B}\rho - 4\tilde{B}\rho^2}}{\tilde{B} + 1}. \tag{A53}$$

We can find its extremal values with respect to $\rho$ by setting its derivative to 0, i.e., by solving

$$-\frac{2\tilde{B}}{\tilde{B} + 1} + \frac{2\tilde{B} - 4\tilde{B}\rho}{(\tilde{B} + 1)\sqrt{\tilde{B}^2 + 4\tilde{B}\rho - 4\tilde{B}\rho^2}} = 0, \tag{A54}$$

which has the single solution $\rho_{\mathrm{opt}} = \frac{1 - \sqrt{\tilde{B}}}{2}$. Since (A53) is concave in $\rho$ (the square root of a nonnegative concave function is concave), this unique extremal value is a maximum. Suppose $\tilde{B} \geq (1 - 2\hat{P}_{x_{\min}})^2$; this holds in particular under the assumption $\tilde{B} \geq (1 - \hat{P}_{x_{\min}})^2$ of point 1. Then $\rho_{\mathrm{opt}} \leq \hat{P}_{x_{\min}}$, and $\rho(\xi_+(\rho) - 1)$ is decreasing in $\rho$ on $[\hat{P}_{x_{\min}}, 1]$. Since all possible values of $\hat{P}_{\mathcal{X}_1}$ lie in this interval, it is optimal to take $\mathcal{X}_1$ such that $\hat{P}_{\mathcal{X}_1}$ is minimized, i.e., $\mathcal{X}_1 = \{x_{\min}\}$; this proves point 1. For point 2, we have the following chain, which in fact holds for general $\tilde{B}$:

$$\begin{aligned} \sup_{P \in \mathcal{F}} \|P - \hat{P}\|_1 &= 2 \max_{\substack{\mathcal{X}_1 \subset \mathcal{X}:\\ \mathcal{X}_1 \neq \varnothing}} \hat{P}_{\mathcal{X}_1}\bigl(\xi_+(\hat{P}_{\mathcal{X}_1}) - 1\bigr) \qquad \text{(A55)}\\ &\leq 2\rho_{\mathrm{opt}}\bigl(\xi_+(\rho_{\mathrm{opt}}) - 1\bigr) \qquad \text{(A56)}\\ &= \frac{\tilde{B} - 2\tilde{B}\,\frac{1 - \sqrt{\tilde{B}}}{2} + \sqrt{\tilde{B}^2 + 4\tilde{B}\,\frac{1 - \sqrt{\tilde{B}}}{2} - 4\tilde{B}\,\frac{(1 - \sqrt{\tilde{B}})^2}{4}}}{\tilde{B} + 1} \qquad \text{(A57)}\\ &= \sqrt{\tilde{B}}. \qquad \text{(A58)} \end{aligned}$$

□
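The closed forms in Lemma A7 are easy to sanity-check numerically. The following sketch takes $\alpha = 2$ and uses hypothetical names. It evaluates $\xi_\pm$ from (A51) and compares $2\rho(\xi_+(\rho) - 1)$ with (A53). It then confirms on a grid that the maximum over $\rho$ equals $\sqrt{\tilde{B}}$ and is attained at $\rho_{\mathrm{opt}}$.

```python
import math


def xi_pm(rho, Bt):
    """Roots xi_-(rho) <= xi_+(rho) of F(rho, xi) = 0 for alpha = 2, as in (A51)."""
    disc = math.sqrt(Bt ** 2 + 4 * rho * Bt - 4 * Bt * rho ** 2)
    denom = 2 * rho * (Bt + 1)
    return (Bt + 2 * rho - disc) / denom, (Bt + 2 * rho + disc) / denom


def g(rho, Bt):
    """2 rho (xi_+(rho) - 1), the quantity maximized over X1 in (A42)."""
    return 2 * rho * (xi_pm(rho, Bt)[1] - 1)


def g_closed(rho, Bt):
    """Right-hand side of (A53)."""
    root = math.sqrt(Bt ** 2 + 4 * Bt * rho - 4 * Bt * rho ** 2)
    return (Bt - 2 * Bt * rho + root) / (Bt + 1)


def rho_opt(Bt):
    """Unique solution of (A54)."""
    return (1 - math.sqrt(Bt)) / 2


def F(rho, xi, Bt):
    """The function (A52)."""
    return rho / xi + (1 - rho) ** 2 / (1 - rho * xi) - Bt - 1


if __name__ == "__main__":
    Bt = 0.36
    print(rho_opt(Bt), g(rho_opt(Bt), Bt), math.sqrt(Bt))
```

For example, with $\tilde{B} = 0.36$ one gets $\rho_{\mathrm{opt}} = 0.2$, $\xi_+(0.2) = 2.5$ and hence a maximal value of $0.6 = \sqrt{0.36}$.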

Appendix A.4. Proof of Theorem 2

Let $\mathcal{D} = \prod_{s \in \mathcal{S}} \mathcal{D}_s \subset \mathbb{R}^{\mathcal{X}}$. Thus, an element $t \in \mathcal{D}$ is of the form $t = (t_{s,u})_{(s,u) \in \mathcal{X}}$, and for any $s$, we have $(t_{s,u})_{u \in \mathcal{U}} \in \mathcal{D}_s$. For distinct $s_1, s_2 \in \mathcal{S}$ (the case $s_1 = s_2$ being trivial), let $B^{s_1,s_2} \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ be the matrix given by

$$B^{s_1,s_2}_{(s,u);(s',u')} = \begin{cases} 1, & \text{if } u = u' \text{ and } s = s' = s_1,\\ -e^{\varepsilon}, & \text{if } u = u' \text{ and } s = s' = s_2,\\ 0, & \text{otherwise}. \end{cases} \tag{A59}$$

Then, we can rewrite (41) as

$$\forall y, s_1, s_2:\ \max_{t \in \mathcal{D}} \bigl((B^{s_1,s_2})^{\mathsf{T}} Q_y\bigr)^{\mathsf{T}} t \leq 0. \tag{A60}$$

Recall that for each $s$, we have $\mathcal{D}_s = \{R \in \mathcal{P}_{\mathcal{U}} : R_u \geq L_{u|s}\}$. Since $\mathcal{D} = \prod_s \mathcal{D}_s$, we can write

$$\begin{aligned} \mathcal{D} &= \Bigl\{t \in \mathbb{R}^{\mathcal{X}} : \forall s,u:\ t_{s,u} \geq L_{u|s},\ \forall s:\ \sum_u t_{s,u} = 1\Bigr\} \qquad \text{(A61)}\\ &= \{t \in \mathbb{R}^{\mathcal{X}} : \Phi t + \phi \geq 0,\ \Psi t + \psi = 0\}, \qquad \text{(A62)} \end{aligned}$$

where $\Phi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$, $\phi \in \mathbb{R}^{\mathcal{X}}$, $\Psi \in \mathbb{R}^{\mathcal{S} \times \mathcal{X}}$ and $\psi \in \mathbb{R}^{\mathcal{S}}$ are given, for $s, s' \in \mathcal{S}$ and $u \in \mathcal{U}$, by

$$\begin{aligned} \Phi &= \mathrm{id}_{\mathcal{X}}, \qquad \text{(A63)}\\ \phi_{s,u} &= -L_{u|s}, \qquad \text{(A64)}\\ \Psi_{s';(s,u)} &= \begin{cases} 1, & \text{if } s = s',\\ 0, & \text{otherwise}, \end{cases} \qquad \text{(A65)}\\ \psi_s &= -1. \qquad \text{(A66)} \end{aligned}$$

Combining this with (A60), we find that $Q$ satisfies $(\varepsilon, \mathcal{F})$-RLDP whenever

$$\forall y, s_1, s_2:\ \max_{\substack{t \in \mathbb{R}^{\mathcal{X}}:\\ \Phi t + \phi \geq 0,\\ \Psi t + \psi = 0}} \bigl((B^{s_1,s_2})^{\mathsf{T}} Q_y\bigr)^{\mathsf{T}} t \leq 0. \tag{A67}$$

Now fix $y, s_1, s_2$, and consider the linear programming problem that forms the left-hand side of (A67). By linear programming duality, we know that

$$\max_{\substack{t \in \mathbb{R}^{\mathcal{X}}:\\ \Phi t + \phi \geq 0,\\ \Psi t + \psi = 0}} \bigl((B^{s_1,s_2})^{\mathsf{T}} Q_y\bigr)^{\mathsf{T}} t = \min_{\substack{z \in \mathbb{R}^{\mathcal{X}},\, w \in \mathbb{R}^{\mathcal{S}}:\\ \Phi^{\mathsf{T}} z + \Psi^{\mathsf{T}} w = -(B^{s_1,s_2})^{\mathsf{T}} Q_y,\\ z \geq 0}} \phi^{\mathsf{T}} z + \psi^{\mathsf{T}} w. \tag{A68}$$

We focus on the linear programming problem on the right-hand side. Its terms are given by

$$\begin{aligned} \Phi^{\mathsf{T}} z &= z, \qquad \text{(A69)}\\ (\Psi^{\mathsf{T}} w)_{s,u} &= w_s, \qquad \text{(A70)}\\ \bigl((B^{s_1,s_2})^{\mathsf{T}} Q_y\bigr)_{s,u} &= \begin{cases} Q_{y|s_1,u}, & \text{if } s = s_1,\\ -e^{\varepsilon} Q_{y|s_2,u}, & \text{if } s = s_2,\\ 0, & \text{otherwise}, \end{cases} \qquad \text{(A71)}\\ \phi^{\mathsf{T}} z &= -\sum_{s,u} L_{u|s} z_{s,u}, \qquad \text{(A72)}\\ \psi^{\mathsf{T}} w &= -\sum_s w_s. \qquad \text{(A73)} \end{aligned}$$

The equation $\Phi^{\mathsf{T}} z + \Psi^{\mathsf{T}} w = -(B^{s_1,s_2})^{\mathsf{T}} Q_y$ can now be rewritten as

$$z_{s,u} = \begin{cases} -Q_{y|s_1,u} - w_{s_1}, & \text{if } s = s_1,\\ e^{\varepsilon} Q_{y|s_2,u} - w_{s_2}, & \text{if } s = s_2,\\ -w_s, & \text{otherwise}. \end{cases} \tag{A74}$$

Thus, the restriction $z \geq 0$ translates to

$$w_{s_1} \leq -\max_{u \in \mathcal{U}} Q_{y|s_1,u}, \qquad w_{s_2} \leq e^{\varepsilon} \min_{u \in \mathcal{U}} Q_{y|s_2,u}, \qquad \forall s \neq s_1, s_2:\ w_s \leq 0.$$

Furthermore, the objective function $\phi^{\mathsf{T}} z + \psi^{\mathsf{T}} w$ becomes

$$-\sum_s \Bigl(1 - \sum_u L_{u|s}\Bigr) w_s + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2}. \tag{A75}$$

Combining this with (A67) and (A68), we obtain a sufficient condition for $Q$ to be $(\varepsilon, \mathcal{F})$-RLDP: for all $y, s_1, s_2$, there exists a $w \in \mathbb{R}^{\mathcal{S}}$ such that

$$\begin{aligned} -\sum_s \Bigl(1 - \sum_u L_{u|s}\Bigr) w_s + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2} &\leq 0, \qquad \text{(A76)}\\ w_{s_1} &\leq -\max_{u \in \mathcal{U}} Q_{y|s_1,u}, \qquad \text{(A77)}\\ w_{s_2} &\leq e^{\varepsilon} \min_{u \in \mathcal{U}} Q_{y|s_2,u}, \qquad \text{(A78)}\\ \forall s \neq s_1, s_2:\ w_s &\leq 0. \qquad \text{(A79)} \end{aligned}$$

Since $\sum_u L_{u|s} \leq 1$ for all $s$, the left-hand side of (A76) is minimal if each $w_s$ attains its maximal value subject to the constraints (A77)–(A79). Substituting this, we find that the minimum of the left-hand side is equal to

$$\begin{aligned} &\Bigl(1 - \sum_u L_{u|s_1}\Bigr) \max_{u_1} Q_{y|s_1,u_1} - e^{\varepsilon} \Bigl(1 - \sum_u L_{u|s_2}\Bigr) \min_{u_2} Q_{y|s_2,u_2} + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2} \qquad \text{(A80)}\\ &= \max_{u_1, u_2 \in \mathcal{U}} \Bigl[\Bigl(1 - \sum_u L_{u|s_1}\Bigr) Q_{y|s_1,u_1} - e^{\varepsilon} \Bigl(1 - \sum_u L_{u|s_2}\Bigr) Q_{y|s_2,u_2}\Bigr] + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2} \qquad \text{(A81)}\\ &= \max_{u_1, u_2 \in \mathcal{U}} \Bigl[Q_{y|s_1,u_1} - e^{\varepsilon} Q_{y|s_2,u_2} + \sum_u L_{u|s_1} (Q_{y|s_1,u} - Q_{y|s_1,u_1}) - e^{\varepsilon} \sum_u L_{u|s_2} (Q_{y|s_2,u} - Q_{y|s_2,u_2})\Bigr]. \qquad \text{(A82)} \end{aligned}$$

The bracketed expression in (A82) has to be nonpositive for all choices of $u_1, u_2, s_1, s_2, y$. This is true precisely if $Q_y \in \Gamma_{L,\varepsilon}$ for all $y$. □
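Because $\mathcal{D}$ is a product of the sets $\mathcal{D}_s$, the primal problem on the left-hand side of (A68) can also be solved directly. Within $\mathcal{D}_{s_1}$, the mass $1 - \sum_u L_{u|s_1}$ left over after the lower bounds goes to the largest $Q_{y|s_1,u}$. Within $\mathcal{D}_{s_2}$, it goes to the smallest $Q_{y|s_2,u}$. The sketch below compares this greedy primal value with the closed form (A82), and uses (A82) to test the condition $Q_y \in \Gamma_{L,\varepsilon}$. The data layout `Q[y][s][u]` $= Q_{y|s,u}$ and `L[s][u]` $= L_{u|s}$, the function names and the toy numbers are our own assumptions.

```python
import math
from itertools import permutations


def primal_value(Qy, L, s1, s2, eps):
    """max over t in D of sum_u t[s1][u] Qy[s1][u] - e^eps sum_u t[s2][u] Qy[s2][u].

    D is a product of the D_s, so each block is optimized separately: the mass
    left after the lower bounds L[s] is put on the best entry of that block.
    """
    free1, free2 = 1.0 - sum(L[s1]), 1.0 - sum(L[s2])
    first = sum(l * q for l, q in zip(L[s1], Qy[s1])) + free1 * max(Qy[s1])
    second = sum(l * q for l, q in zip(L[s2], Qy[s2])) + free2 * min(Qy[s2])
    return first - math.exp(eps) * second


def closed_form(Qy, L, s1, s2, eps):
    """Expression (A82): a maximum over u1, u2 of the bracketed term."""
    U = range(len(Qy[s1]))
    e = math.exp(eps)
    return max(
        Qy[s1][u1] - e * Qy[s2][u2]
        + sum(L[s1][u] * (Qy[s1][u] - Qy[s1][u1]) for u in U)
        - e * sum(L[s2][u] * (Qy[s2][u] - Qy[s2][u2]) for u in U)
        for u1 in U for u2 in U
    )


def satisfies_theorem2(Q, L, eps, tol=1e-12):
    """Sufficient condition of Theorem 2: (A82) <= 0 for every y and every s1 != s2."""
    return all(
        closed_form(Qy, L, s1, s2, eps) <= tol
        for Qy in Q for s1, s2 in permutations(range(len(L)), 2)
    )


if __name__ == "__main__":
    Q = [[[0.6, 0.5], [0.4, 0.45]], [[0.4, 0.5], [0.6, 0.55]]]  # toy mechanism
    L = [[0.3, 0.2], [0.25, 0.25]]  # toy lower bounds L_{u|s}
    print(satisfies_theorem2(Q, L, 0.5), satisfies_theorem2(Q, L, 0.01))
```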

Appendix A.5. Proof of Theorem 3

This is essentially analogous to the proof of Theorem 4 in [5]. The main difference is that in [5] the counterpart of $\hat{\Gamma}$ is a hypercube, for which no vertex enumeration step is needed. Let $Q$ be a mechanism such that $Q_y \in \Gamma$ for all $y$; then there exist $\alpha_y \in \mathbb{R}_{\geq 0}$ and $\gamma_y \in \hat{\Gamma}$ such that $Q_y = \alpha_y \gamma_y$. One has

$$I_{\hat{P}}(X; Y) = \sum_y \mu(Q_y) = \sum_y \alpha_y \mu(\gamma_y). \tag{A83}$$

Since $\hat{\Gamma}$ is the convex hull of $\mathcal{V}$, we can write $\gamma_y = \sum_v \lambda_{y,v} v$ for suitable constants $\lambda_{y,v} \geq 0$ with $\sum_v \lambda_{y,v} = 1$. Define $\theta \in \mathbb{R}_{\geq 0}^{\mathcal{V}}$ by $\theta_v = \sum_y \lambda_{y,v} \alpha_y$. Then,

$$\sum_v \theta_v v = \sum_y Q_y = 1_{\mathcal{X}}. \tag{A84}$$

As such, the matrix $Q' \in \mathbb{R}^{\mathcal{V} \times \mathcal{X}}$ defined by $Q'_v = \theta_v v$ defines a privacy mechanism $Q'$. One has

$$\begin{aligned} I_{\hat{P}}(X; Q'(X)) &= \sum_v \mu(Q'_v) \qquad \text{(A85)}\\ &= \sum_v \theta_v \mu(v) \qquad \text{(A86)}\\ &= \sum_y \alpha_y \sum_v \lambda_{y,v} \mu(v) \qquad \text{(A87)}\\ &\geq \sum_y \alpha_y \mu\Bigl(\sum_v \lambda_{y,v} v\Bigr) \qquad \text{(A88)}\\ &= I_{\hat{P}}(X; Q(X)), \qquad \text{(A89)} \end{aligned}$$

where we use the fact that $\mu$ is positively homogeneous (for (A86)) and convex (for (A88)). This shows that, among the mechanisms satisfying the condition of Theorem 2, there is an optimal one whose columns $Q_y$ are all of the form $\theta_v \cdot v$. Hence, (46) yields an optimal mechanism. To see that $|\mathcal{Y}| \leq a$, observe that the polyhedron described in (46) is defined by $a$ equality constraints and $|\mathcal{V}|$ inequality constraints of the form $\theta_v \geq 0$. Hence, any vertex of this polyhedron has at most $a$ nonzero coefficients. The optimum of the linear program (46) is attained at such a vertex, and the output space $\mathcal{Y}$ of the corresponding mechanism corresponds to its nonzero coefficients. We conclude that $|\mathcal{Y}| \leq a$. □
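The inequality chain (A85)–(A89) says that splitting each output $y$ into one output per element of $\mathcal{V}$ cannot decrease $I_{\hat{P}}(X; Y)$. The toy sketch below illustrates this. It assumes that $\mu$ has the form written out later in (A113), with $\hat{P}$ in place of $P$. The vectors and weights are our own choice; they are not vertices of $\hat{\Gamma}$ for any particular $L$ or $\varepsilon$. The sketch builds $Q'$ with columns $\theta_v v$, merges its columns into a coarser $Q$, and compares the two utilities.

```python
import math


def mu(P, C):
    """mu(C) = sum_x P_x C_x log(C_x / sum_x' P_x' C_x'), natural log, 0 log 0 = 0."""
    total = sum(p * c for p, c in zip(P, C))
    return sum(p * c * math.log(c / total) for p, c in zip(P, C) if c > 0)


def utility(P, columns):
    """I_P(X; Y) = sum_y mu(Q_y) for a mechanism given by its columns Q_y."""
    return sum(mu(P, col) for col in columns)


# Toy data (our own choice, not from the paper): vectors v and weights theta
# with sum_v theta_v v = 1_X, so the columns theta_v v form a mechanism Q'.
P_hat = [0.5, 0.3, 0.2]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
theta = [0.5, 0.5, 0.5, 0.5]
Q_fine = [[t * vx for vx in v] for t, v in zip(theta, V)]

# A coarser mechanism Q: output y1 merges the first two columns, y2 the last two,
# so each Q_y = alpha_y * gamma_y with gamma_y a convex combination of elements of V.
Q_coarse = [
    [a + b for a, b in zip(Q_fine[0], Q_fine[1])],
    [a + b for a, b in zip(Q_fine[2], Q_fine[3])],
]

if __name__ == "__main__":
    print(utility(P_hat, Q_fine), ">=", utility(P_hat, Q_coarse))
```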

Appendix A.6. Proof of Theorem 4

We follow the proof of Theorem 14 in [5]; however, we first need the following auxiliary lemma.

Lemma A8.

Let $\varepsilon > 0$, and let $\mathcal{C} \subset \mathbb{R}_{\geq 0}^{\mathcal{X}}$ be the positive cone defined by

$$\mathcal{C} = \{C \in \mathbb{R}_{\geq 0}^{\mathcal{X}} : C_{s,u} \leq e^{\varepsilon} C_{s',u'} \text{ for all } s \neq s' \in \mathcal{S} \text{ and } u, u' \in \mathcal{U}\}. \tag{A90}$$

Define the sets $\mathcal{V}_1, \mathcal{V}_2, \mathcal{V} \subset \mathbb{R}_{\geq 0}^{\mathcal{X}}$ by

$$\begin{aligned} \mathcal{V}_1 &= \bigl\{v \in \mathbb{R}_{\geq 0}^{\mathcal{X}} : \exists s \text{ s.t. } \forall u:\ v_{s,u} \in \{e^{-\varepsilon}, e^{\varepsilon}\} \text{ and } \forall s' \neq s,\ \forall u:\ v_{s',u} = 1\bigr\}, \qquad \text{(A91)}\\ \mathcal{V}_2 &= \bigl\{v \in \mathbb{R}_{\geq 0}^{\mathcal{X}} : \forall x:\ v_x \in \{1, e^{\varepsilon}\} \text{ and } \bigl|\{s : \exists u \text{ s.t. } v_{s,u} = e^{\varepsilon}\}\bigr| \geq 2\bigr\}, \qquad \text{(A92)}\\ \mathcal{V} &= \mathcal{V}_1 \cup \mathcal{V}_2. \qquad \text{(A93)} \end{aligned}$$

Then $\mathcal{V}$ spans $\mathcal{C}$ as a positive cone, i.e.,

$$\mathcal{C} = \Bigl\{\sum_{v \in \mathcal{V}} \theta_v v : \theta \in \mathbb{R}_{\geq 0}^{\mathcal{V}}\Bigr\}. \tag{A94}$$

Proof.

For every $s \in \mathcal{S}$ and $u, u' \in \mathcal{U}$, we have

$$C_{s,u} \leq e^{\varepsilon} C_{s',u} \leq e^{2\varepsilon} C_{s,u'}, \tag{A95}$$

where $s' \in \mathcal{S} \setminus \{s\}$ is arbitrary. Thus, in every $C \in \mathcal{C}$, two coefficients can differ by at most a factor $e^{\varepsilon}$ if they have different $s$, and by at most a factor $e^{2\varepsilon}$ if they have the same $s$. On the extremal rays of $\mathcal{C}$, the inequalities become equalities. By rescaling by a positive scalar, if necessary, we see that $\mathcal{C}$ is spanned by vectors of which each coefficient is in the set $\{e^{-\varepsilon}, 1, e^{\varepsilon}\}$. In other words, if $\mathcal{V}' = \{e^{-\varepsilon}, 1, e^{\varepsilon}\}^{\mathcal{X}} \cap \mathcal{C}$, then

$$\mathcal{C} = \mathrm{Span}(\mathcal{V}'), \tag{A96}$$

where $\mathrm{Span}$ refers to the positive span as in (A94). To determine $\mathcal{V}'$, we consider two situations: either $v$ contains both $e^{-\varepsilon}$ and $e^{\varepsilon}$ as coefficients, or it does not.

Suppose $v$ contains $e^{-\varepsilon}$ and $e^{\varepsilon}$, say $v_{s,u} = e^{-\varepsilon}$ and $v_{s',u'} = e^{\varepsilon}$. By (A90), we must have $s = s'$, and by (A95), this means that $v_{s'',u''} = 1$ for $s'' \neq s$ and any $u''$. Thus, we define, for any $s \in \mathcal{S}$, the set

$$\mathcal{V}'_s = \{v \in \{e^{-\varepsilon}, 1, e^{\varepsilon}\}^{\mathcal{X}} : \forall s' \neq s\ \forall u:\ v_{s',u} = 1\}. \tag{A97}$$

It is straightforward to show that $\mathcal{V}'_s \subset \mathcal{C}$. By the discussion above, any $v \in \mathcal{V}'$ containing both $e^{-\varepsilon}$ and $e^{\varepsilon}$ is in $\bigcup_s \mathcal{V}'_s$.

Suppose $v$ does not contain both $e^{-\varepsilon}$ and $e^{\varepsilon}$. Then $v \in \mathcal{V}'_2 \cup \mathcal{V}'_3$, where

$$\mathcal{V}'_2 = \{1, e^{\varepsilon}\}^{\mathcal{X}}, \tag{A98}$$

$$\mathcal{V}'_3 = \{e^{-\varepsilon}, 1\}^{\mathcal{X}}. \tag{A99}$$

Furthermore, it is easy to see that $\mathcal{V}'_2 \cup \mathcal{V}'_3 \subset \mathcal{V}'$. Thus, we conclude that

$$\mathcal{V}' = \Bigl(\bigcup_{s \in \mathcal{S}} \mathcal{V}'_s\Bigr) \cup \mathcal{V}'_2 \cup \mathcal{V}'_3. \tag{A100}$$

To get from $\mathcal{V}'$ to $\mathcal{V}$, we discard some vectors that are not needed to span $\mathcal{C}$. We start with $\mathcal{V}'_s$. Given $s$, define the set

$$\mathcal{V}_s = \{v \in \mathbb{R}_{\geq 0}^{\mathcal{X}} : \forall u:\ v_{s,u} \in \{e^{-\varepsilon}, e^{\varepsilon}\} \text{ and } \forall s' \neq s,\ \forall u:\ v_{s',u} = 1\}. \tag{A101}$$

It is clear that $\mathcal{V}_s \subset \mathcal{V}'_s$; we claim that

$$\mathrm{Span}(\mathcal{V}_s) = \mathrm{Span}(\mathcal{V}'_s). \tag{A102}$$

To see this, let $v \in \mathcal{V}'_s \setminus \mathcal{V}_s$, and define $v^-, v^+ \in \mathbb{R}_{\geq 0}^{\mathcal{X}}$ by

$$v^+_{s',u} = \begin{cases} e^{-\varepsilon}, & \text{if } s' = s \text{ and } v_{s',u} = e^{-\varepsilon},\\ 1, & \text{if } s' \neq s,\\ e^{\varepsilon}, & \text{if } s' = s \text{ and } v_{s',u} \in \{1, e^{\varepsilon}\}, \end{cases} \tag{A103}$$

$$v^-_{s',u} = \begin{cases} e^{-\varepsilon}, & \text{if } s' = s \text{ and } v_{s',u} \in \{e^{-\varepsilon}, 1\},\\ 1, & \text{if } s' \neq s,\\ e^{\varepsilon}, & \text{if } s' = s \text{ and } v_{s',u} = e^{\varepsilon}. \end{cases} \tag{A104}$$

In other words, $v^{\pm}$ takes all $s$-coefficients of $v$ that are equal to 1 and changes them to $e^{\pm\varepsilon}$. Then, $v^+, v^- \in \mathcal{V}_s$ and

$$v = \frac{1}{e^{\varepsilon} + 1} v^+ + \frac{e^{\varepsilon}}{e^{\varepsilon} + 1} v^-. \tag{A105}$$

Thus, $v \in \mathrm{Span}(\mathcal{V}_s)$, proving (A102). We now consider $\mathcal{V}'_2$ and $\mathcal{V}'_3$. First note that $\mathcal{V}'_3 = e^{-\varepsilon} \mathcal{V}'_2$, so

$$\mathrm{Span}(\mathcal{V}'_2) = \mathrm{Span}(\mathcal{V}'_3). \tag{A106}$$

We furthermore claim that

$$\mathrm{Span}\Bigl(\mathcal{V}_2 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}_s\Bigr) = \mathrm{Span}\Bigl(\mathcal{V}'_2 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}_s\Bigr), \tag{A107}$$

where $\mathcal{V}_2$ is as in (A92). Note that clearly $\mathcal{V}_2 \subset \mathcal{V}'_2$. To see (A107), let $v \in \mathcal{V}'_2 \setminus \mathcal{V}_2$. By (A92), this means that all coefficients of $v$ equal to $e^{\varepsilon}$ (if any) share the same $s$. If $v$ has no coefficient equal to $e^{\varepsilon}$, then $v = 1_{\mathcal{X}}$, the constant vector with all ones. This implies that $e^{\varepsilon} v \in \mathcal{V}_2$, showing that $v \in \mathrm{Span}(\mathcal{V}_2)$. Now suppose that there is an $s$ such that every coefficient of $v$ equal to $e^{\varepsilon}$ is of the form $v_{s,u}$. Then,

$$v_{s',u'} \in \begin{cases} \{1, e^{\varepsilon}\}, & \text{if } s' = s,\\ \{1\}, & \text{otherwise}. \end{cases} \tag{A108}$$

In particular $v \in \mathcal{V}'_s$, so we can construct $v^+$ as in (A103) and $v^-$ as in (A104), and again we find

$$v = \frac{1}{e^{\varepsilon} + 1} v^+ + \frac{e^{\varepsilon}}{e^{\varepsilon} + 1} v^- \in \mathrm{Span}(\mathcal{V}_s). \tag{A109}$$

This proves (A107). Combining (A96), (A100), (A102), (A106) and (A107), we obtain

$$\begin{aligned} \mathcal{C} &= \mathrm{Span}\Bigl(\mathcal{V}'_2 \cup \mathcal{V}'_3 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}'_s\Bigr) \qquad \text{(A110)}\\ &= \mathrm{Span}\Bigl(\mathcal{V}_2 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}_s\Bigr) \qquad \text{(A111)}\\ &= \mathrm{Span}(\mathcal{V}_2 \cup \mathcal{V}_1). \qquad \text{(A112)} \end{aligned}$$

□
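The decomposition (A103)–(A105) is easy to verify mechanically. The sketch below stores a vector $v \in \mathbb{R}^{\mathcal{X}}$ as a list of rows `v[s][u]`, a layout of our own choosing. For every $v \in \mathcal{V}'_s$ with $|\mathcal{S}| = 2$ and $|\mathcal{U}| = 3$, it builds $v^\pm$ and checks three things: that $v^\pm$ lie in the cone (A90), that they contain no coefficient 1 in row $s$, and that they recombine to $v$. The names and sizes are illustrative assumptions.

```python
import math
from itertools import product


def in_cone(v, eps, tol=1e-12):
    """Membership in the cone of (A90): v[s][u] <= e^eps * v[t][w] whenever s != t."""
    e = math.exp(eps)
    rows = range(len(v))
    return all(x >= 0 for row in v for x in row) and all(
        v[s][u] <= e * v[t][w] + tol
        for s in rows for t in rows if s != t
        for u in range(len(v[s])) for w in range(len(v[t]))
    )


def split(v, s, eps):
    """v^+ and v^- of (A103)-(A104): entries equal to 1 in row s become e^eps / e^-eps."""
    plus = [row[:] for row in v]
    minus = [row[:] for row in v]
    for u, x in enumerate(v[s]):
        if math.isclose(x, 1.0):
            plus[s][u] = math.exp(eps)
            minus[s][u] = math.exp(-eps)
    return plus, minus


def recombine(plus, minus, eps):
    """Right-hand side of (A105)."""
    a = 1.0 / (math.exp(eps) + 1.0)
    b = math.exp(eps) / (math.exp(eps) + 1.0)
    return [[a * p + b * m for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]


if __name__ == "__main__":
    eps = 0.7
    v = [[math.exp(-eps), 1.0, math.exp(eps)], [1.0, 1.0, 1.0]]  # in V'_s with s = 0
    plus, minus = split(v, 0, eps)
    print(plus[0], minus[0], recombine(plus, minus, eps)[0])
```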

Proof of Theorem 4.

We follow the proof of Theorem 14 in [5]. For $C \in \mathbb{R}_{\geq 0}^{\mathcal{X}}$, define

$$\mu(C) = \sum_x P_x C_x \log \frac{C_x}{\sum_{x'} P_{x'} C_{x'}}. \tag{A113}$$

For $y \in \mathcal{Y}$, let $Q_y = (Q_{y|x})_x \in \mathbb{R}^{\mathcal{X}}$; then the utility of a mechanism $Q: \mathcal{X} \to \mathcal{Y}$ is given by $I_P(X; Y) = \sum_y \mu(Q_y)$. Furthermore, $\mu$ is a sublinear function in the sense of Definition 1 of [5].
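To make (A113) concrete, the sketch below evaluates $\mu$ column by column for a toy input distribution and mechanism of our own choosing. It checks that $\sum_y \mu(Q_y)$ agrees with $I_P(X; Y) = H(Y) - H(Y \mid X)$, computed independently. It also checks positive homogeneity and subadditivity, which is how we read the sublinearity property cited from [5]. Natural logarithms are assumed, and the function names are hypothetical.

```python
import math


def mu(P, C):
    """(A113) with natural logarithm; terms with C_x = 0 contribute 0."""
    total = sum(p * c for p, c in zip(P, C))
    if total == 0.0:
        return 0.0
    return sum(p * c * math.log(c / total) for p, c in zip(P, C) if c > 0)


def mutual_information(P, Q):
    """I(X; Y) = H(Y) - H(Y | X) for input law P and columns Q[y][x] = Q_{y|x}."""
    h_y = 0.0
    for col in Q:
        p_y = sum(p * q for p, q in zip(P, col))
        if p_y > 0:
            h_y -= p_y * math.log(p_y)
    h_y_given_x = -sum(
        P[x] * col[x] * math.log(col[x]) for col in Q for x in range(len(P)) if col[x] > 0
    )
    return h_y - h_y_given_x


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    Q = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]]  # toy mechanism: column y lists Q_{y|x}
    print(sum(mu(P, col) for col in Q), mutual_information(P, Q))
```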

We fix an $\varepsilon > 0$. Furthermore, let $\mathcal{C} \subset \mathbb{R}_{\geq 0}^{\mathcal{X}}$ be as in Lemma A8. Then, a mechanism $Q$ satisfies $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP if and only if each $Q_y$ is an element of $\mathcal{C}$. Let $\mathcal{V}$ be the spanning set of $\mathcal{C}$ from Lemma A8, and let $\mathcal{D}$ be the polytope spanned by $\mathcal{V}$, i.e., its convex hull. If $Q$ satisfies $\varepsilon$-SLDP, then every column $Q_y$ is of the form $\theta_y \cdot d_y$, where $d_y \in \mathcal{D}$ and $\theta_y \in \mathbb{R}_{\geq 0}$ are such that $\sum_y \theta_y d_y = 1_{\mathcal{X}}$. Analogously to the proofs of Theorems 2 and 4 in Section 7 of [5] (or, for that matter, our proof of Theorem 3), one proves that the optimal $Q$ is found by taking $b = a$ and taking $d_y \in \mathcal{V}$ for all $y$. Since

$$I(X; Y) = \sum_y \mu(Q_y) = \sum_y \theta_y \mu(d_y), \tag{A114}$$

we can find the optimal $Q$ by solving the following optimization problem. Here $m \in \mathbb{R}^{\mathcal{V}}$ is the vector $(\mu(v))_{v \in \mathcal{V}}$, and $A \in \mathbb{R}^{\mathcal{X} \times \mathcal{V}}$ is the matrix whose $v$-th column is $v$:

$$\begin{aligned} \underset{\theta \in \mathbb{R}^{\mathcal{V}}}{\text{maximize}} \quad & m \cdot \theta\\ \text{such that} \quad & A \cdot \theta = 1_{\mathcal{X}},\\ & \theta \geq 0. \end{aligned}$$

From here, we follow Section 9.5 of [5]. The dual to the above problem is

$$\begin{aligned} \underset{\alpha \in \mathbb{R}^{\mathcal{X}}}{\text{minimize}} \quad & 1_{\mathcal{X}} \cdot \alpha\\ \text{such that} \quad & A^{\mathsf{T}} \cdot \alpha \geq m,\\ & \alpha \geq 0. \end{aligned}$$

By weak duality, $m \cdot \theta \leq 1_{\mathcal{X}} \cdot \alpha$ for every feasible $\theta$ and $\alpha$. Hence a feasible pair with equal objective values is optimal for both problems. We describe $\alpha^* \geq 0$ and $\theta^* \geq 0$, depending on $\varepsilon$, with the following properties for sufficiently large $\varepsilon$:

- $A^{\mathsf{T}} \cdot \alpha^* \geq m$;
- $m \cdot \theta^* = 1_{\mathcal{X}} \cdot \alpha^*$ and $A\theta^* = 1_{\mathcal{X}}$;
- $\theta^*$ corresponds to SRR, i.e., for each $y \in \mathcal{Y} = \mathcal{X}$ there is a $\hat{v}_y \in \mathcal{V}$ such that $\mathrm{SRR}^{\varepsilon}_y = \theta^*_{\hat{v}_y} \hat{v}_y$.

Together, this proves that SRR is optimal for $\varepsilon \gg 0$.

More concretely, for $y = (s, u) \in \mathcal{X}$, define $\hat{v}_y$ by

$$(\hat{v}_y)_{s',u'} = \begin{cases} e^{\varepsilon}, & \text{if } (s', u') = (s, u),\\ e^{-\varepsilon}, & \text{if } s' = s \text{ and } u' \neq u,\\ 1, & \text{if } s' \neq s. \end{cases} \tag{A115}$$

Note that $\hat{v}_y \in \mathcal{V}$. Furthermore, let $\theta^* \in \mathbb{R}^{\mathcal{V}}$ be given by

$$\theta^*_v = \begin{cases} \dfrac{1}{e^{\varepsilon} + e^{-\varepsilon}(a_2 - 1) + a - a_2}, & \text{if there is a } y \in \mathcal{X} \text{ such that } v = \hat{v}_y,\\ 0, & \text{otherwise}. \end{cases} \tag{A116}$$

Then, SRR satisfies $\mathrm{SRR}^{\varepsilon}_y = \theta^*_{\hat{v}_y} \hat{v}_y$ for all $y \in \mathcal{X}$, and for each $x \in \mathcal{X}$ one has

$$\begin{aligned} (A\theta^*)_x &= \sum_v A_{x,v} \theta^*_v \qquad \text{(A117)}\\ &= \sum_v v_x \theta^*_v \qquad \text{(A118)}\\ &= \frac{\sum_y (\hat{v}_y)_x}{e^{\varepsilon} + e^{-\varepsilon}(a_2 - 1) + a - a_2} \qquad \text{(A119)}\\ &= 1. \qquad \text{(A120)} \end{aligned}$$

The last step holds because, for $x = (s, u)$, the numerator consists of three kinds of terms: one term $e^{\varepsilon}$ (from $y = x$), $a_2 - 1$ terms $e^{-\varepsilon}$ (from $y = (s, u')$ with $u' \neq u$), and $a - a_2$ terms 1 (from $y$ with a different $s$). This shows that $A\theta^* = 1_{\mathcal{X}}$. Furthermore, define $\alpha^* \in \mathbb{R}^{\mathcal{X}}$ by

$$\alpha^*_{s,u} = c_1 \mu(\hat{v}_{s,u}) + c_2 \sum_{u' \neq u} \mu(\hat{v}_{s,u'}) + c_3 \sum_{\substack{s' \neq s,\\ u'}} \mu(\hat{v}_{s',u'}), \tag{A121}$$

where

$$\begin{aligned} c_1 &= \frac{-(a_2 - 2)(a_2 - 1) + (a_2 - a + 1)(a_2 - 2)e^{\varepsilon} + (a - 2a_2 + 1)e^{2\varepsilon} + e^{3\varepsilon}}{(e^{\varepsilon} - 1)(e^{\varepsilon} + 1)(e^{\varepsilon} - a_2 + 1)(e^{\varepsilon} + (a_2 - 1)e^{-\varepsilon} + a - a_2)}, \qquad \text{(A122)}\\ c_2 &= \frac{a_2 - 1 + (a - a_2 - 1)e^{\varepsilon}}{(e^{\varepsilon} - 1)(e^{\varepsilon} + 1)(e^{\varepsilon} - a_2 + 1)(e^{\varepsilon} + (a_2 - 1)e^{-\varepsilon} + a - a_2)}, \qquad \text{(A123)}\\ c_3 &= \frac{-e^{\varepsilon}}{(e^{\varepsilon} - 1)(e^{\varepsilon} - a_2 + 1)(e^{\varepsilon} + (a_2 - 1)e^{-\varepsilon} + a - a_2)}. \qquad \text{(A124)} \end{aligned}$$

A cumbersome but straightforward calculation shows that

$$\begin{aligned} m \cdot \theta^* = 1_{\mathcal{X}} \cdot \alpha^* &= \frac{1}{e^{\varepsilon} + e^{-\varepsilon}(a_2 - 1) + a - a_2} \sum_x \mu(\hat{v}_x), \qquad \text{(A125)}\\ \hat{v}_x \cdot \alpha^* = m_{\hat{v}_x} &= \mu(\hat{v}_x) \quad \text{for all } x. \qquad \text{(A126)} \end{aligned}$$

Although $c_3 < 0$, it follows from (A137) below that $\alpha^* \geq 0$ for sufficiently large $\varepsilon$, provided $P_x > 0$ for all $x$. It remains to be shown that $\alpha^*$ is feasible for the dual problem for $\varepsilon \gg 0$, i.e., that $A^{\mathsf{T}} \alpha^* \geq m$ for sufficiently large $\varepsilon$. To this end, for $v \in \mathcal{V}$, set

$$F_v = \{x \in \mathcal{X} : v_x = e^{\varepsilon}\}, \tag{A127}$$

$$G_v = \{x \in \mathcal{X} : v_x = 1\}, \tag{A128}$$

$$H_v = \{x \in \mathcal{X} : v_x = e^{-\varepsilon}\}. \tag{A129}$$

From the description of $\mathcal{V}$ in Lemma A8, we find that $|F_v| \geq 1$ for all $v$, and $|F_v| = 1$ if and only if there exist $s, u$ such that $v = \hat{v}_{s,u}$. Now, write $P_{F_v} = \sum_{x \in F_v} P_x$, and likewise for $G_v$ and $H_v$. For large $\varepsilon$, we have

$$\begin{aligned} m_v = \mu(v) &= e^{\varepsilon} \sum_{x \in F_v} P_x \log \frac{1}{P_{F_v} + e^{-\varepsilon} P_{G_v} + e^{-2\varepsilon} P_{H_v}} \qquad \text{(A130)}\\ &\quad + \sum_{x \in G_v} P_x \log \frac{1}{e^{\varepsilon} P_{F_v} + P_{G_v} + e^{-\varepsilon} P_{H_v}} \qquad \text{(A131)}\\ &\quad + e^{-\varepsilon} \sum_{x \in H_v} P_x \log \frac{1}{e^{2\varepsilon} P_{F_v} + e^{\varepsilon} P_{G_v} + P_{H_v}} \qquad \text{(A132)}\\ &= (-P_{F_v} \log P_{F_v})\, e^{\varepsilon} + O(\varepsilon) \qquad \text{(A133)} \end{aligned}$$

and furthermore

$$\begin{aligned} c_1 &= e^{-\varepsilon} + O(e^{-2\varepsilon}), \qquad \text{(A134)}\\ c_2, c_3 &= O(e^{-2\varepsilon}). \qquad \text{(A135)} \end{aligned}$$

From this, it follows that

$$\begin{aligned} \alpha^*_x &= c_1 \mu(\hat{v}_x) + (c_2 + c_3)\, O(e^{\varepsilon}) \qquad \text{(A136)}\\ &= -P_x \log P_x + O(\varepsilon e^{-\varepsilon}). \qquad \text{(A137)} \end{aligned}$$

Hence,

$$v^{\mathsf{T}} \alpha^* = \Bigl(-\sum_{x \in F_v} P_x \log P_x\Bigr) e^{\varepsilon} + O(\varepsilon). \tag{A138}$$

For $|F_v| \geq 2$, one has $P_{F_v} \log P_{F_v} > \sum_{x \in F_v} P_x \log P_x$. This means that if $v$ is not of the form $\hat{v}_x$, one has $v^{\mathsf{T}} \alpha^* \geq m_v$ for sufficiently large $\varepsilon$. Together with (A126), this shows that $A^{\mathsf{T}} \alpha^* \geq m$ for sufficiently large $\varepsilon$; this concludes the proof. □
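The identities $A\theta^* = 1_{\mathcal{X}}$ and (A126) can be checked numerically for a small alphabet. The sketch below is an illustration under our own conventions. It indexes $\mathcal{X}$ as pairs $(s, u)$, with $a_2 = |\mathcal{U}|$ and $a = |\mathcal{X}|$. It builds $\hat{v}_y$ from (A115) and the common value of $\theta^*$ from (A116). It then obtains $\alpha^*$ by solving the linear system $\hat{v}_x \cdot \alpha = \mu(\hat{v}_x)$ directly with Gaussian elimination, rather than by plugging in $c_1, c_2, c_3$. It does not enumerate all of $\mathcal{V}$, so it does not check $A^{\mathsf{T}}\alpha^* \geq m$. All names are hypothetical.

```python
import math
from itertools import product


def mu(P, C):
    """(A113) with natural logarithm."""
    total = sum(p * c for p, c in zip(P, C))
    return sum(p * c * math.log(c / total) for p, c in zip(P, C) if c > 0)


def v_hat(X, y, eps):
    """(A115): e^eps at y, e^-eps elsewhere in the same s-row, 1 in other rows."""
    s, _ = y
    return [math.exp(eps) if x == y else math.exp(-eps) if x[0] == s else 1.0 for x in X]


def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M z = b."""
    n = len(b)
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    z = [0.0] * n
    for k in reversed(range(n)):
        z[k] = (A[k][n] - sum(A[k][j] * z[j] for j in range(k + 1, n))) / A[k][k]
    return z


def srr_certificate(P, n_s, n_u, eps):
    """Return X, the vectors v_hat_y, the value of theta* from (A116), and alpha*."""
    X = list(product(range(n_s), range(n_u)))
    a, a2 = len(X), n_u
    V_hat = [v_hat(X, y, eps) for y in X]
    theta = 1.0 / (math.exp(eps) + math.exp(-eps) * (a2 - 1) + a - a2)
    alpha = solve(V_hat, [mu(P, v) for v in V_hat])  # rows: v_hat_x . alpha = mu(v_hat_x)
    return X, V_hat, theta, alpha


if __name__ == "__main__":
    P = [0.1, 0.2, 0.3, 0.4]  # toy distribution on X = S x U with |S| = |U| = 2
    print(srr_certificate(P, 2, 2, 3.0)[3])
```

For larger $\varepsilon$, the computed $\alpha^*_x$ should approach $-P_x \log P_x$, in line with (A137).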

References

1. Kasiviswanathan, S.P.; Lee, H.K.; Nissim, K.; Raskhodnikova, S.; Smith, A. What can we learn privately? SIAM J. Comput. 2011, 40, 793–826.
2. Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local privacy and statistical minimax rates. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, 26–29 October 2013; pp. 429–438.
3. Lopuhaä-Zwakenberg, M.; Tong, H.; Škorić, B. Data Sanitisation for the Privacy Funnel with Differential Privacy Guarantees. Int. J. Adv. Secur. 2020, 13, 162–174.
4. Rebollo-Monedero, D.; Forne, J.; Domingo-Ferrer, J. From t-closeness-like privacy to postrandomization via information theory. IEEE Trans. Knowl. Data Eng. 2010, 22, 1623–1636.
5. Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. J. Mach. Learn. Res. 2016, 17, 492–542.
6. Makhdoumi, A.; Salamatian, S.; Fawaz, N.; Médard, M. From the information bottleneck to the privacy funnel. In Proceedings of the 2014 IEEE Information Theory Workshop (ITW 2014), Hobart, TAS, Australia, 2–5 November 2014; pp. 501–505.
7. Salamatian, S.; Zhang, A.; du Pin Calmon, F.; Bhamidipati, S.; Fawaz, N.; Kveton, B.; Oliveira, P.; Taft, N. Managing your private and public data: Bringing down inference attacks against your privacy. IEEE J. Sel. Top. Signal Process. 2015, 9, 1240–1255.
8. Asoodeh, S.; Diaz, M.; Alajaji, F.; Linder, T. Information extraction under privacy constraints. Information 2016, 7, 15.
9. Kung, S. A compressive privacy approach to generalized information bottleneck and privacy funnel problems. J. Frankl. Inst. 2018, 355, 1846–1872.
10. Ding, N.; Sadeghi, P. A submodularity-based clustering algorithm for the information bottleneck and privacy funnel. In Proceedings of the 2019 IEEE Information Theory Workshop (ITW), Visby, Sweden, 25–28 August 2019; pp. 1–5.
11. Salamatian, S.; Calmon, F.P.; Fawaz, N.; Makhdoumi, A.; Médard, M. Privacy-Utility Tradeoff and Privacy Funnel. 2020. Available online: https://api.semanticscholar.org/CorpusID:210927663 (accessed on 10 January 2024).
12. Acharya, J.; Bonawitz, K.; Kairouz, P.; Ramage, D.; Sun, Z. Context aware local differential privacy. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 52–62.
13. Goseling, J.; Lopuhaä-Zwakenberg, M. Robust optimization for local differential privacy. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 1629–1634.
14. Lopuhaä-Zwakenberg, M.; Goseling, J. Robust Local Differential Privacy. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, VIC, Australia, 12–20 July 2021; pp. 557–562.
15. Kifer, D.; Machanavajjhala, A. Pufferfish: A framework for mathematical privacy definitions. ACM Trans. Database Syst. 2014, 39, 1–36.
16. Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A. Robust Optimization; Princeton University Press: Princeton, NJ, USA, 2009; Volume 28.
17. Ben-Tal, A.; Den Hertog, D.; Vial, J.P. Deriving robust counterparts of nonlinear uncertain inequalities. Math. Program. 2015, 149, 265–299.
18. Bertsimas, D.; Gupta, V.; Kallus, N. Data-driven robust optimization. Math. Program. 2018, 167, 235–292.
19. Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
20. Song, S.; Wang, Y.; Chaudhuri, K. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data, Chicago, IL, USA, 14–19 May 2017; pp. 1291–1306.
21. Nuradha, T.; Goldfeld, Z. Pufferfish Privacy: An Information-Theoretic Study. IEEE Trans. Inf. Theory 2023, 69, 7336–7356.
22. Yang, B.; Sato, I.; Nakagawa, H. Bayesian differential privacy on correlated data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, VIC, Australia, 31 May–4 June 2015; pp. 747–762.
23. He, X.; Machanavajjhala, A.; Ding, B. Blowfish privacy: Tuning privacy-utility trade-offs using policies. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 1447–1458.
24. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407.
25. Wang, T.; Blocki, J.; Li, N.; Jha, S. Locally differentially private protocols for frequency estimation. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 729–745.
26. Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. arXiv 2000, arXiv:physics/0004057.
27. Wagner, I.; Eckhoff, D. Technical privacy metrics: A systematic survey. ACM Comput. Surv. 2018, 51, 1–38.
28. Rassouli, B.; Gunduz, D. On perfect privacy. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 2551–2555.
29. Asoodeh, S.; Diaz, M.; Alajaji, F.; Linder, T. Estimation efficiency under privacy constraints. IEEE Trans. Inf. Theory 2018, 65, 1512–1534.
30. Wang, H.; Calmon, F.P. An estimation-theoretic view of privacy. In Proceedings of the 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 3–6 October 2017; pp. 886–893.
31. Mironov, I. Rényi differential privacy. In Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, USA, 21–25 August 2017; pp. 263–275.
32. Issa, I.; Wagner, A.B.; Kamath, S. An operational approach to information leakage. IEEE Trans. Inf. Theory 2019, 66, 1625–1657.
33. Liao, J.; Kosut, O.; Sankar, L.; du Pin Calmon, F. Tunable Measures for Information Leakage and Applications to Privacy-Utility Tradeoffs. IEEE Trans. Inf. Theory 2019, 65, 8043–8066.
34. Saeidian, S.; Cervia, G.; Oechtering, T.J.; Skoglund, M. Pointwise maximal leakage. IEEE Trans. Inf. Theory 2023, 69, 8054–8080.
35. Diaz, M.; Wang, H.; Calmon, F.P.; Sankar, L. On the robustness of information-theoretic privacy measures and mechanisms. IEEE Trans. Inf. Theory 2019, 66, 1949–1978.
36. Makhdoumi, A.; Fawaz, N. Privacy-utility tradeoff under statistical uncertainty. In Proceedings of the 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–4 October 2013; pp. 1627–1634.
37. Kalantari, K.; Sankar, L.; Sarwate, A.D. Robust privacy-utility tradeoffs under differential privacy and hamming distortion. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2816–2830.
38. Asoodeh, S.; Alajaji, F.; Linder, T. On maximal correlation, mutual information and data privacy. In Proceedings of the 2015 IEEE 14th Canadian Workshop on Information Theory (CWIT), St. John’s, NL, Canada, 6–9 July 2015; pp. 27–31.
39. Pardo, L. Statistical Inference Based on Divergence Measures; CRC Press: Boca Raton, FL, USA, 2018.
40. Ben-Tal, A.; Den Hertog, D.; De Waegenaere, A.; Melenberg, B.; Rennen, G. Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 2013, 59, 341–357.
41. Duchi, J.C.; Glynn, P.W.; Namkoong, H. Statistics of robust optimization: A generalized empirical likelihood approach. Math. Oper. Res. 2021, 46, 946–969.
42. Wang, Z.; Glynn, P.W.; Ye, Y. Likelihood robust optimization for data-driven problems. Comput. Manag. Sci. 2016, 13, 241–261.
43. Mohajerin Esfahani, P.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
44. Selvi, A.; Liu, H.; Wiesemann, W. Differential Privacy via Distributionally Robust Optimization. arXiv 2023, arXiv:2304.12681.
45. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
46. Huang, C.; Kairouz, P.; Chen, X.; Sankar, L.; Rajagopal, R. Context-aware generative adversarial privacy. Entropy 2017, 19, 656.
47. Tripathy, A.; Wang, Y.; Ishwar, P. Privacy-preserving adversarial networks. In Proceedings of the 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 24–27 September 2019; pp. 495–505.
48. Mirjalili, V.; Raschka, S.; Namboodiri, A.; Ross, A. Semi-adversarial networks: Convolutional autoencoders for imparting privacy to face images. In Proceedings of the 2018 International Conference on Biometrics (ICB), Gold Coast, QLD, Australia, 20–23 February 2018; pp. 82–89.
49. Bortolato, B.; Ivanovska, M.; Rot, P.; Križaj, J.; Terhörst, P.; Damer, N.; Peer, P.; Štruc, V. Learning privacy-enhancing face representations through feature disentanglement. In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, 16–20 November 2020; pp. 45–52.
50. Stoker, J.I.; Garretsen, H.; Spreeuwers, L.J. The facial appearance of CEOs: Faces signal selection but not performance. PLoS ONE 2016, 11, e0159950.
51. Willenborg, L.; De Waal, T. Elements of Statistical Disclosure Control; Springer Science & Business Media: Berlin, Germany, 2012; Volume 155.
52. Hundepool, A.; Domingo-Ferrer, J.; Franconi, L.; Giessing, S.; Nordholt, E.S.; Spicer, K.; De Wolf, P.P. Statistical Disclosure Control; John Wiley & Sons: Hoboken, NJ, USA, 2012.
53. Liese, F.; Vajda, I. f-divergences: Sufficiency, deficiency and testing of hypotheses. In Advances in Inequalities from Probability Theory and Statistics; Nova Publishers: New York, NY, USA, 2008; p. 113.
54. van Erven, T.; Harremoës, P. Rényi divergence and majorization. In Proceedings of the 2010 IEEE International Symposium on Information Theory, Austin, TX, USA, 13–18 June 2010; pp. 1335–1339.
Csiszár, I. Information-type measures of difference of probability distributions and indirect observation. Stud. Sci. Math. Hung. 1967, 2, 229–318. [Google Scholar]
Kullback, S. A lower bound for discrimination information in terms of variation (corresp.). IEEE Trans. Inf. Theory 1967, 13, 126–127. [Google Scholar] [CrossRef]
Gilardoni, G.L. On Pinsker’s and Vajda’s type inequalities for Csiszár’s f-divergences. IEEE Trans. Inf. Theory 2010, 56, 5377–5386. [Google Scholar] [CrossRef]
Toth, C.D.; O’Rourke, J.; Goodman, J.E. Handbook of Discrete and Computational Geometry; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, 4–7 March 2006; pp. 265–284. [Google Scholar]
Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: https://archive.ics.uci.edu/ (accessed on 10 January 2024).

Figure 1. An overview of the setting of this paper when $F$ is a confidence set based on a data set $\vec{x}$. Note that it is typically, but not necessarily, true that $P^{*} \in F$.

Figure 2. Experiments on the categories sex, race, education, occupation, relationship, and native-country of the Adult data set. Numbers in parentheses indicate $a_{1}$ and $a_{2}$ (SRR, PolyOpt, IR, GRR, NR). (a) $S$ = sex (2), $U$ = race (5); (b) $S$ = race (5), $U$ = sex (2); (c) $S$ = occupation (15), $U$ = education (16); (d) $S$ = native country (42), $U$ = relationship (6).

Figure 3. Synthetic experiments with $n = 32561$ and $β = 0.05$ (SRR, PolyOpt, IR, GRR, NR). (a) $a_{1} = 2$, $a_{2} = 5$; (b) $a_{1} = 5$, $a_{2} = 2$; (c) $a_{1} = 15$, $a_{2} = 16$; (d) $a_{1} = 42$, $a_{2} = 6$.

Figure 4. Realized privacy parameter $ε^{*}$ on synthetic data. The shaded area is bounded by the 25% and 75% quantiles (PolyOpt, NR). The green line depicts $ε = ε^{*}$. (a) $a_{1} = 2$, $a_{2} = 5$; (b) $a_{1} = 5$, $a_{2} = 2$.

Figure 5. Normalized difference in NMI for $\hat{P}$ and $P^{*}$ on synthetic data ($ε = 1.5$), measured over 100 runs. Boxes denote the 25–75% quantiles; whiskers denote minima and maxima. S = SRR, P = PolyOpt, I = IR.

Table 1. Notation used in this paper. ‘Page’ denotes the page on which the notation is first defined.

Notation	Meaning	Page
$\mathcal{S}$	sensitive data space	6
$\mathcal{U}$	non-sensitive data space	6
$\mathcal{X}$	$\mathcal{S} \times \mathcal{U}$	6
$a_{1}, a_{2}, a, b$	$|\mathcal{S}|, |\mathcal{U}|, |\mathcal{X}|, |\mathcal{Y}|$	6
$X = (S, U)$	user data	6
$Q$	privacy mechanism	6
$\mathbf{Q}$	matrix of $Q$	6
$Y$	$Q(X)$	6
$\mathcal{Y}$	output space	6
$\mathcal{P}_{\mathcal{X}}$	space of probability distributions on $\mathcal{X}$	6
$P^{*}$	true distribution	6
$\hat{P}$	estimated distribution	7
$I$	mutual information	8
$\mathcal{F}$	uncertainty set for $P$	6
$P_{U | s}$	conditional probability vector	9
$\mathcal{F}_{U | s}$	conditional projection of $\mathcal{F}$	9
$L_{u | s}(\mathcal{F}), \mathrm{rad}_{s}(\mathcal{F})$	statistics of $\mathcal{F}$	9
$D_{α}$	Rényi divergence	10
$\mathrm{PolyOpt}$	PolyOpt	13
$\mathrm{SRR}^{ε}$	Secret Randomized Response	17
$\mathrm{IR}_{R^{1}, R^{2}}$	Independent Reporting	18
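Table 1 lists $D_{α}$, the Rényi divergence on which the uncertainty set is built. As a concrete handle on this quantity, the following is a minimal Python sketch of the standard definition for discrete distributions, $D_{α}(P \| Q) = \frac{1}{α - 1} \log \sum_{x} P(x)^{α} Q(x)^{1-α}$ for $α > 0$, $α \neq 1$. It assumes natural logarithms (nats) and the usual conventions for zero probabilities; the function name `renyi_divergence` is hypothetical, and this is an illustrative sketch rather than the authors' implementation.

```python
import math


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) in nats for two discrete distributions.

    p, q  : probability vectors over the same finite alphabet (same length).
    alpha : order of the divergence; must satisfy alpha > 0 and alpha != 1.
    Log base (natural log) is an assumption of this sketch.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            # 0^alpha * q^(1 - alpha) contributes nothing for alpha > 0 (0 * inf := 0).
            continue
        if q_i == 0:
            if alpha > 1:
                # p_i > 0 where q_i = 0: the term p^alpha * q^(1 - alpha) diverges.
                return math.inf
            # For 0 < alpha < 1 the term p^alpha * 0^(1 - alpha) vanishes.
            continue
        total += p_i ** alpha * q_i ** (1 - alpha)
    if total == 0:
        # Only reachable for 0 < alpha < 1 when p and q have disjoint supports.
        return math.inf
    return math.log(total) / (alpha - 1)


# For alpha = 2 the sum is sum_x p(x)^2 / q(x) = 1 + 1/3 here.
print(renyi_divergence([0.5, 0.5], [0.25, 0.75], alpha=2))  # ≈ 0.2877, i.e. log(4/3)
```

For $α = 2$ the sum equals $\sum_{x} P(x)^{2}/Q(x) = 1 + χ^{2}(P \| Q)$, so $D_{2}(P \| Q) = \log(1 + χ^{2}(P \| Q))$; the example above evaluates exactly this case.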

Table 2. NMI for synthetic data for various values of $β$ ($ε = 1.5$).

	$a_{1} = 2, a_{2} = 5$			$a_{1} = 5, a_{2} = 2$
$β$	0.1	0.01	0.001	0.1	0.01	0.001
SRR	0.231	0.231	0.231	0.126	0.126	0.126
PolyOpt	0.727	0.723	0.719	0.374	0.372	0.370
IR	0.512	0.501	0.492	0.169	0.165	0.162
	$a_{1} = 15, a_{2} = 16$			$a_{1} = 42, a_{2} = 6$
$β$	0.1	0.01	0.001	0.1	0.01	0.001
SRR	0.009	0.009	0.009	0.005	0.005	0.005
IR	0.055	0.053	0.051	0.052	0.052	0.052

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Lopuhaä-Zwakenberg, M.; Goseling, J. Mechanisms for Robust Local Differential Privacy. Entropy 2024, 26, 233. https://doi.org/10.3390/e26030233

AMA Style

Lopuhaä-Zwakenberg M, Goseling J. Mechanisms for Robust Local Differential Privacy. Entropy. 2024; 26(3):233. https://doi.org/10.3390/e26030233

Chicago/Turabian Style

Lopuhaä-Zwakenberg, Milan, and Jasper Goseling. 2024. "Mechanisms for Robust Local Differential Privacy" Entropy 26, no. 3: 233. https://doi.org/10.3390/e26030233
