Article

Mechanisms for Robust Local Differential Privacy

by Milan Lopuhaä-Zwakenberg * and Jasper Goseling *
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7522 NB Enschede, The Netherlands
* Authors to whom correspondence should be addressed.
Entropy 2024, 26(3), 233; https://doi.org/10.3390/e26030233
Submission received: 11 January 2024 / Revised: 29 February 2024 / Accepted: 3 March 2024 / Published: 6 March 2024
(This article belongs to the Special Issue Information Theory for Distributed Systems)

Abstract

We consider privacy mechanisms for releasing data $X = (S, U)$, where $S$ is sensitive and $U$ is non-sensitive. We introduce the robust local differential privacy (RLDP) framework, which provides strong privacy guarantees while preserving utility. This is achieved by providing robust privacy: our mechanisms provide privacy not only with respect to a publicly available estimate of the unknown true distribution, but also with respect to similar distributions. Such robustness mitigates the potential privacy leaks that might arise from the difference between the true distribution and the estimated one. At the same time, we mitigate the utility penalties that come with ordinary differential privacy, which involves making worst-case assumptions and dealing with extreme cases. We achieve robustness in privacy by constructing an uncertainty set based on a Rényi divergence. By analyzing the structure of this set and approximating it with a polytope, we can use robust optimization to find mechanisms with high utility. However, this relies on vertex enumeration and becomes computationally infeasible for large input spaces. Therefore, we also introduce two low-complexity algorithms that build on existing LDP mechanisms. We evaluate the utility and robustness of the mechanisms using numerical experiments and demonstrate that our mechanisms provide robust privacy, while achieving a utility that is close to optimal.

1. Introduction

We consider the setting in which an aggregator collects data from many users with the purpose of, for instance, computing statistics or training a machine learning model. In particular, the data contain sensitive information and users do not trust the aggregator. Therefore, they employ a privacy mechanism that transforms the data before sending it to the aggregator. Users have data $X = (S, U)$ from a finite alphabet $\mathcal{X} = \mathcal{S} \times \mathcal{U}$, where $s \in \mathcal{S}$ is sensitive information and $u \in \mathcal{U}$ is non-sensitive. Data are distributed i.i.d. across users according to the distribution $P^*$. In order to preserve their privacy, users disclose a sanitized version $Y$ of $X$ by using a privacy mechanism $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$. The aim is that $Y$ contains as much information about $X$ as possible without leaking too much information about $S$. The challenge that is addressed in this paper is to develop good privacy mechanisms. This scenario and closely related ones were studied in, for instance, [1,2,3,4,5,6,7,8,9,10,11]. In this paper, we use the following version of local differential privacy (LDP), as introduced in [3]:
$$\mathbb{P}(Y = y \mid S = s) \le e^{\varepsilon}\, \mathbb{P}(Y = y \mid S = s'), \tag{1}$$
for all $s, s' \in \mathcal{S}$ and privacy parameter $\varepsilon > 0$. In addition, we measure the utility of $Y$ through the mutual information $I(X; Y)$. We discuss differences with related work in Section 2.
Note that if all information is sensitive, i.e., if $X = S$, (1) reduces to
$$\mathbb{P}(Y = y \mid X = x) \le e^{\varepsilon}\, \mathbb{P}(Y = y \mid X = x'), \tag{2}$$
which is the traditional LDP constraint [1,2,5]. An important property of (2) is that it does not depend on $P^*$, but only on $\mathcal{Q}$. The independence of $P^*$ is a key factor in the success of differential privacy, since it removes the need to make assumptions about the distribution of the data or about the background/side-knowledge available to the aggregator. As is clear from (1), however, independence from $P^*$ no longer holds if not all data are sensitive.
Assuming that $P^*$ is known, one can develop good privacy mechanisms for various settings with partially sensitive information [3,6,12]. In practice, however, $P^*$ has to be modeled using domain knowledge or estimated from data, leading to errors. The prevalent approach in the literature has been to develop privacy mechanisms based on a (point) estimate $\hat{P}$ and analyze sensitivity with respect to errors in this estimate. In this work, we follow the approach that was proposed in [13,14], which is to construct a set $\mathcal{F}$ of probability distributions that we are confident contains $P^*$. Subsequently, we construct privacy mechanisms that aim to maximize utility, while satisfying (1) for all probability distributions in $\mathcal{F}$. We call the resulting privacy framework robust local differential privacy (RLDP).
In a sense, RLDP is a relaxed form of privacy. Enforcing (1) for all possible distributions may seem appealing, but, as we illustrate next, it is often infeasible. To this end, we consider two extreme cases. First, consider a joint distribution of $S$ and $U$ under which $S = U$. Intuitively, we cannot disclose much information about $U$, since this directly leaks information about $S$. As such, the utility of $Y$ is low. Next, consider a joint distribution under which $S$ and $U$ are independent. Intuitively, we can disclose $U$ without additional precautions, providing a high utility of $Y$. The point is that we need to design a single privacy mechanism $\mathcal{Q}$ that satisfies (1) for all distributions, including the ‘worst case’ in which $S = U$, leading to a low utility of $Y$. In this work, we take the middle ground between, on the one hand, only using a point estimate $\hat{P}$ and, on the other hand, using all possible distributions. We do so by defining a set of ‘reasonable’ distributions $\mathcal{F}$. In particular, we construct $\mathcal{F}$ based on public side-information. This public side-information consists of $n$ pairs of data $(s_1, u_1), \dots, (s_n, u_n)$, which, like the data of users, are i.i.d. according to the unknown distribution $P^*$. Our set $\mathcal{F}$ is constructed as a closed ball under a Rényi divergence around the maximum likelihood point estimate $\hat{P}$ of $P^*$. By doing so, we are (statistically) confident that $\mathcal{F}$ contains $P^*$, with the radius of the ball controlling the confidence level.
The RLDP framework is an instance of the more general Pufferfish framework [15]. In Section 2, we make this connection explicit and use it to describe the semantic privacy guarantees that are offered by RLDP.
The main contributions of this paper are as follows:
  • We use a Rényi divergence to construct $\mathcal{F}$ and analyze the resulting structure and statistics of $\mathcal{F}$. In particular, we demonstrate that projections of $\mathcal{F}$ are again balls under the same divergence. Moreover, we bound the projected sets in terms of an $\ell_1$ norm.
  • Using these results, we approximate $\mathcal{F}$ by an enveloping polytope. We then use techniques from robust optimization [16,17,18] to characterize PolyOpt, the mechanism that is optimal over this polytope.
  • A drawback of this method is that it relies on vertex enumeration and is, therefore, computationally infeasible for large alphabets. Therefore, we introduce two low-complexity privacy mechanisms. The first is independent reporting (IR), in which $S$ and $U$ are reported through separate LDP mechanisms.
  • We characterize the conditions that the underlying LDP mechanisms have to satisfy in order for IR to ensure RLDP. Furthermore, while IR can incorporate any LDP mechanism, we show that it is optimal to use randomized response [19]. This drastically reduces the search space and allows us to find the optimal IR mechanism using low-dimensional optimization.
  • The second low-complexity mechanism that we develop is called secret-randomized response (SRR) and is based on randomized response.
  • We show that SRR maximizes mutual information in the low-privacy regime for the case that $\mathcal{F}$ is the entire probability simplex.
  • We demonstrate the improved utility of RLDP over LDP with numerical experiments. In particular, we compare the performance of our mechanisms with generalized random response [5]. We provide results for both synthetic data sets and real-world census data.
The structure of this paper is as follows: After discussing related work in Section 2, we describe the model in detail in Section 3. In Section 4, we present results on the structure and statistics of projections of $\mathcal{F}$. These results are used in Section 5 to develop the PolyOpt privacy mechanism. Low-complexity privacy mechanisms are presented in Section 6 and Section 7. In Section 8, we evaluate the discussed methods experimentally. Finally, in Section 9, we discuss our results and provide an outlook on future work. Most proofs are deferred to Appendix A.
Part of this paper was presented at the IEEE International Symposium on Information Theory 2021 [14]. In this paper, we generalize from a $\chi^2$-divergence to an arbitrary Rényi divergence. Moreover, Section 4 and Section 6, most of Section 8, and all proofs are new in the current paper.

2. Related Work

2.1. The Pufferfish Framework

Our RLDP framework is an instance of the more general Pufferfish framework [15]. In this subsection, we make this connection explicit and elaborate on the semantic guarantees offered by RLDP.
A privacy definition following the Pufferfish framework specifies (i) a set of potential secrets, (ii) a set of discriminative pairs of secrets, and (iii) a set of assumptions about how data are generated. In RLDP, the potential secrets are the possible values of $S$, i.e., $\mathcal{S}$. We want to prevent the aggregator from learning anything about $S$. This means that it should not be able to distinguish the case $S = s$ from $S = s'$ for all $s \neq s'$, so all non-identical pairs are discriminative. Note that this relies on $\mathcal{S}$ being finite, with extensions to continuous $\mathcal{S}$ discussed in detail in [15].
The set of assumptions on how data are generated consists, in our setting, of probability distributions over $\mathcal{X}$. A key idea in Pufferfish is that this set explicitly models the information that is available to an attacker, i.e., an entity that is trying to infer information about $S$ by observing $Y$. In our setting, the aggregator is the only attacker and a probability distribution $P$ over $\mathcal{X}$ captures the beliefs that the attacker has about $S$ prior to seeing $Y$. We can rewrite (1) as
$$\frac{\mathbb{P}_{X \sim P}(S = s \mid Y = y)}{\mathbb{P}_{X \sim P}(S = s' \mid Y = y)} \le e^{\varepsilon}\, \frac{\mathbb{P}_{X \sim P}(S = s)}{\mathbb{P}_{X \sim P}(S = s')} \tag{3}$$
and see that our local differential privacy constraint (1) can be interpreted as the condition that the posterior distribution of $S$ after seeing $Y$ must be very close to the prior distribution. The relevance of $P$ is that it captures a specific set of beliefs of the attacker. As such, we want (3) to hold for various values of $P$, where each $P$ captures specific background/side-knowledge available to the attacker/aggregator. Note that by doing so we are not making any claims about the actual knowledge available to the aggregator, but instead describing the possible scenarios for which we want to protect the privacy of users. In Pufferfish, these possible scenarios are called the set of assumptions on how data are generated, and in RLDP this is $\mathcal{F}$.
Often, side-information in the form of domain knowledge or existing data is publicly available; i.e., to both the users and the aggregator. This public side-information may suggest, for instance, that there is, at most, limited dependence between S and U. In that case, protecting against attackers who have the belief that S = U incurs an enormous penalty in achieved utility. It is true that those attackers gain a lot of information on S by observing Y. However, they could have also obtained this information from the public side-information directly. Therefore, the approach taken in the Pufferfish framework and in this paper is that we only protect against attackers that have beliefs, i.e., distributions P, that are in line with publicly available side information.
A difficulty in working with the Pufferfish framework is that it is often hard to find good mechanisms. A general mechanism is proposed in [20], but it relies on enumerating all distributions in $\mathcal{F}$, which is an uncountable set in our setting, so it cannot be used here. A constrained version of Pufferfish that facilitates analysis and a methodology for finding good mechanisms is proposed in [21]. Another interesting line of work is to model correlations between users in the non-local differential privacy setting [22]. Finally, ref. [23] proposed a modeling framework for capturing domain knowledge about the data. In contrast, in the current work, we impose constraints that are learned from data. Our setting does not fit any of the frameworks for which good mechanisms are known in the literature. One of the main contributions of this paper is to develop such mechanisms.

2.2. Other Privacy Frameworks

Disclosing X through a privacy mechanism that protects sensitive information S has been studied extensively. One line of work starts from differential privacy [24] and imposes the additional challenge that the aggregator cannot be trusted, leading to the concept of local differential privacy [1,2,5]. For this setting, several privacy mechanisms exist, including randomized response [19] and unary encoding [25]. Optimal LDP mechanisms under a variety of utility metrics, including mutual information, are found in [5]. In [1,2,5], all data are sensitive, i.e., X = S . The variation of LDP for the case of disclosing X = ( S , U ) , where only S is sensitive, was proposed in [3] and is the setting that we study in this paper. Another line of work connects this setting to the information bottleneck [26], leading to a privacy constraint in terms of mutual information [6,8,9,10]. In these works, it is shown that approaches to optimizing the information bottleneck also work for finding good privacy mechanisms.
Next to differential privacy and mutual information as privacy measures, a multitude of other privacy frameworks and leakage measures exist [27]. Some of these have been studied in the context of privacy mechanisms. In [7,11], privacy leakage is measured through the improved potential of statistical inference by an attacker after seeing the disclosed information. This measure is formulated through a general cost function, with mutual information resulting as a special case. Perfect privacy, which demands the output to be independent of the sensitive data, was studied in [28], and methods were given to find optimal mechanisms in this setting. An estimation-theoretic framework was studied in [29,30]. Our use of a Rényi divergence in the construction of F may suggest considering a generalization of our privacy definition. This could be achieved by considering, for instance, a Rényi divergence in the privacy constraint, as done in [31]. Along a different line, in [32], the maximal leakage measure with a clear operational interpretation is defined. In [33], this measure is generalized to a parametrized measure, enabling interpolating between maximal leakage and mutual information. A stronger, pointwise, version of the maximal leakage measure is proposed in [34]. These are interesting research directions but not pursued in this paper.
Our setting $X = (S, U)$ is a special case of a Markov chain $S \to X \to Y$, where only $X$ is observed. This Markov chain is typically studied in the information bottleneck and privacy funnel settings [6,26]. We do not generalize to this setting, because we need observations of $S$ for the estimate of $P_{U|S}$. Without direct observations of $S$, we can only make worst-case assumptions on $P_{U|S}$, leading to very poor utility. A different type of model, in which only part of the information in $X$ is sensitive, is proposed in [12]. This is a block-structured model in which $\mathcal{X}$ is partitioned and information about the partition of an element is sensitive but its index in the partition is not. Our setting of $\mathcal{X} = \mathcal{S} \times \mathcal{U}$ does not fit this model. One can partition $\mathcal{X}$ according to $\mathcal{U}$, but our privacy constraints are different from [12]. We will elaborate on this in Section 6.

2.3. Robustness

The distribution $P^*_{S,U}$ is not available in practice. The approach taken in most works is to estimate $P^*_{S,U}$ from data and analyze sensitivity with respect to this estimate $\hat{P}_{S,U}$. One of the contributions in [7] is to quantify the impact of mismatched priors, i.e., the impact of not knowing $P^*_{S,U}$ exactly. A bound on the resulting level of privacy is derived in terms of the total variation distance between the actual $P^*_{S,U}$ and the estimated $\hat{P}_{S,U}$. The setting in [35] is similar to ours: a ball of probability distributions, centered around a point estimate, is defined that contains $P^*_{S,U}$ with high probability. It is then shown that a privacy mechanism designed on the basis of the empirical distribution is valid for the entire set under a looser privacy constraint. The privacy slack is quantified and shown to approach zero as the size of the data set increases. An important difference is that, in the current work, we explicitly optimize the privacy mechanism over the uncertainty set. Another difference is that we base our ball on a Rényi divergence, whereas [35] uses an $\ell_1$ norm. The main technical tool used in [35] is large deviations theory, whereas we rely on convex analysis and robust optimization. We also mention [36,37]. In [36], it is assumed that nothing is known about $P^*_S$ and $P^*_{U|S}$. It is shown that good privacy mechanisms can be found through a connection to maximal correlation; see also [38]. In [37], sets of probability distributions are not derived from data but carefully modeled such that optimal mechanisms can be derived analytically.
Using robust optimization [16] to find a good mechanism that satisfies privacy constraints for all $P_{S,U}$ in an uncertainty set $\mathcal{F}$ was proposed in [13,14]. In this work, we generalize and extend results from [14]. The idea of robust optimization is that constraints in an optimization problem contain uncertain parameters that are known to come from an (a priori defined) uncertainty set. The constraints must hold for all possible values of the uncertain parameters. A key result is that, using Fenchel duality, the problem can be expressed in terms of the support function of the uncertainty set and the convex conjugate of the constraint [16,17]. The case where the uncertain parameters are probabilities is known as distributionally robust optimization. Using results from [39], it was shown in [40] how an uncertainty set can be constructed from data using an $f$-divergence, providing an approximate confidence set. Confidence sets for parameters that are not necessarily probabilities were constructed in [18] under a $\chi^2$-divergence. Convergence of robust optimization based on $f$-divergences was studied in [41] and for the case of a KL-divergence in [42]. In [43], it is shown how distributionally robust optimization problems over Wasserstein balls can be reformulated as convex problems. For the regular differential privacy setting, distributionally robust optimization was used in [44] to find optimal additive privacy mechanisms for a general perturbation cost function. In this paper, we show how robust optimization can be applied to the setting of partially sensitive information with local differential privacy.

2.4. Miscellaneous

Another line of work on privacy mechanisms builds on recent advances in generative adversarial networks [45]. In [46,47], a generative adversarial framework is used to provide privacy mechanisms that do not use explicit expressions for $P_X$. Even though this is not explicitly addressed in [46,47], it is expected that the generalization properties of networks will provide a form of robustness. Closely related approaches are used in the field of face recognition [48,49], with the aim of preventing biometric profiling [50]. The leakage measures that are used in [48,49], however, do not seem to have an operational interpretation.
Disclosing information in a privacy-preserving way is one of the main challenges in official statistics [51,52]. The setting considered in the current paper is closely connected to disclosing a table with microdata, where each record in the table is released independently of the other records. This approach to disclosing microdata was studied in [4] by considering expected error as the utility measure and mutual information as the privacy measure. The resulting optimization problem corresponds to the traditional rate-distortion problem.

3. Model and Preliminaries

In this section, we give an overview of the setting and objectives of this paper. The notation used in this section, as well as the rest of the paper, is summarized in Table 1.
The data space is $\mathcal{X} = \mathcal{S} \times \mathcal{U}$, where $\mathcal{S}$ and $\mathcal{U}$ are finite sets. We write $|\mathcal{S}| =: a_1$, $|\mathcal{U}| =: a_2$, and $|\mathcal{X}| = a_1 a_2 =: a$. Data items $X = (S, U)$ are drawn from a probability distribution $P^*$ in $\mathcal{P}_{\mathcal{X}}$, the space of probability distributions on $\mathcal{X}$; here, $S$ represents sensitive data, while $U$ represents non-sensitive data. The aggregator’s aim is to create a privacy mechanism $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ such that $Y = \mathcal{Q}(X)$ contains as much information about $X$ as possible, while not leaking too much information about $S$.
The mechanism $\mathcal{Q}$ is a probabilistic map, which we represent by a left stochastic matrix $(Q_{y|x})_{y \in \mathcal{Y}, x \in \mathcal{X}}$, and we write $|\mathcal{Y}| = b$. Often, we identify $\mathcal{Y} = \{1, \dots, b\}$, and likewise for other sets.
The distribution $P^*$ is not known exactly. Instead, there is a set of possible distributions $\mathcal{F} \subseteq \mathcal{P}_{\mathcal{X}}$, where $\mathcal{P}_{\mathcal{X}}$ denotes the probability simplex over $\mathcal{X}$. We choose $\mathcal{F}$ in such a way that it is likely that $P^* \in \mathcal{F}$. The set $\mathcal{F}$ captures our uncertainty about $P^*$, and we guarantee privacy for all $P \in \mathcal{F}$. We call this robust local differential privacy (RLDP).
Definition 1 (Robust Local Differential Privacy). Let $\varepsilon \ge 0$ and $\mathcal{F} \subseteq \mathcal{P}_{\mathcal{X}}$. We say that $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{F})$-RLDP if for all $s, s' \in \mathcal{S}$, all $y \in \mathcal{Y}$, and all $P \in \mathcal{F}$ we have
$$\mathbb{P}_{X \sim P}(Y = y \mid S = s) \le e^{\varepsilon}\, \mathbb{P}_{X \sim P}(Y = y \mid S = s'). \tag{4}$$
Note that we use the notation $\mathbb{P}_{X \sim P}(\cdot)$ to emphasize that $X$ is distributed according to $P$. If no confusion can arise, we often leave out the subscript $X \sim P$ to improve readability. Note that we can also write
$$\mathbb{P}_{X \sim P}(Y = y \mid S = s) = \sum_{u \in \mathcal{U}} Q_{y|s,u}\, \mathbb{P}_{X \sim P}(U = u \mid S = s), \tag{5}$$
so Definition 1 depends on the conditional probabilities of $U$ given $S = s$ and $S = s'$. It does not, however, depend on the realization of $U$.
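The check in Definition 1 can be sketched numerically for a single distribution $P$. The following is a minimal illustration, not the paper's implementation: the array layouts `Q[y][s][u]` and `P[s][u]`, the function names, and the two toy distributions are assumptions made here. It only tests one $P$ at a time, whereas $(\varepsilon, \mathcal{F})$-RLDP requires the inequality for every $P \in \mathcal{F}$.

```python
import math

def output_given_s(Q, P, s):
    """Return the list [P(Y=y | S=s)] over all outputs y.

    Q[y][s][u] is the probability of output y on input (s, u) and
    P[s][u] is the joint probability of (s, u); both layouts are
    assumptions of this sketch.
    """
    p_s = sum(P[s])
    if p_s == 0:
        raise ValueError("conditioning on S=s requires P(S=s) > 0")
    cond_u = [p / p_s for p in P[s]]  # P(U=u | S=s)
    return [sum(Q_y[s][u] * cond_u[u] for u in range(len(cond_u))) for Q_y in Q]

def satisfies_constraint_at(Q, P, eps, tol=1e-12):
    """Check the RLDP inequality for ONE distribution P, not for a whole set F."""
    rows = [output_given_s(Q, P, s) for s in range(len(P))]
    bound = math.exp(eps)
    return all(
        r1[y] <= bound * r2[y] + tol
        for r1 in rows for r2 in rows for y in range(len(Q))
    )

# Toy mechanism that reports U unchanged and ignores S.
report_u = [[[1.0 if y == u else 0.0 for u in range(2)] for s in range(2)]
            for y in range(2)]

independent = [[0.12, 0.28], [0.18, 0.42]]  # S and U independent
identical = [[0.5, 0.0], [0.0, 0.5]]        # S = U with probability 1

print(satisfies_constraint_at(report_u, independent, eps=0.1))
print(satisfies_constraint_at(report_u, identical, eps=5.0))
```

Under the independent distribution, releasing $U$ is harmless, while under $S = U$ the same mechanism violates the constraint for any finite $\varepsilon$. This mirrors the two extreme cases discussed in the Introduction.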
For clarity and use in future sections, we give the definition of regular LDP [1], which is used when the goal is to obfuscate all of X, rather than just S.
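Definition 2 (Local Differential Privacy). Let $\varepsilon \ge 0$. We say that $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ satisfies $\varepsilon$-LDP if for all $x, x' \in \mathcal{X}$ and all $y \in \mathcal{Y}$ we have
$$\mathbb{P}(Y = y \mid X = x) \le e^{\varepsilon}\, \mathbb{P}(Y = y \mid X = x'). \tag{6}$$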
To model the uncertainty about $P^*$, as captured by $\mathcal{F}$, we suppose there is a database $\vec{x} = (x_1, \dots, x_n)$ accessible to the user, where each $x_i = (s_i, u_i)$ is drawn independently from $P^*$. Based on this, the user produces an estimate $\hat{P}$ of $P^*$. In the experiments, we consider the maximum likelihood estimator, i.e., $\hat{P}_x = \frac{1}{n} |\{i \le n : x_i = x\}|$. We construct the uncertainty set $\mathcal{F}$ as a closed ball around $\hat{P}$. In particular, let $D_\alpha$ be the Rényi divergence of order $\alpha$ on $\mathcal{P}_{\mathcal{X}}$, i.e., for $\alpha \in (0, \infty)$,
$$D_\alpha(\hat{P} \,\|\, P) = \begin{cases} \dfrac{1}{\alpha - 1} \log \displaystyle\sum_{x \in \mathcal{X}} \frac{\hat{P}_x^{\alpha}}{P_x^{\alpha - 1}}, & \text{if } \alpha \neq 1, \\[2ex] \displaystyle\sum_{x \in \mathcal{X}} \hat{P}_x \log \frac{\hat{P}_x}{P_x}, & \text{if } \alpha = 1. \end{cases} \tag{7}$$
The case $\alpha = 1$ follows, in fact, as a limit from the $\alpha \neq 1$ case. Similarly, the definition can be extended to $\alpha \in \{0, \infty\}$ by taking the corresponding limits, but in this paper we restrict our attention to $\alpha \in (0, \infty)$ to keep the presentation clear. Note that $D_1 = D_{\mathrm{KL}}$, the Kullback–Leibler divergence, and $D_2 = \log(1 + \chi^2)$, where the $\chi^2$-divergence is $\chi^2(P_1 \,\|\, P_2) = \sum_x \frac{(P_{1,x} - P_{2,x})^2}{P_{2,x}} = \sum_x \frac{P_{1,x}^2}{P_{2,x}} - 1$. In general, a Rényi divergence is a continuous increasing function of a power divergence (a.k.a. Hellinger divergence) [39,53,54], an example of an $f$-divergence. We omit $\alpha$ from the notation when it is clear from the context.
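As an illustration of the definition above, the Rényi divergence of two finite distributions can be computed directly. This is a sketch under assumptions made here: distributions are plain Python lists indexed by $x$, the function name is hypothetical, and the handling of zero probabilities (terms with $\hat{P}_x = 0$ are dropped, and $P_x = 0 < \hat{P}_x$ gives an infinite divergence when $\alpha \ge 1$) follows the usual conventions rather than anything stated in the text.

```python
import math

def renyi_divergence(p_hat, p, alpha):
    """D_alpha(p_hat || p) in nats for finite distributions, alpha in (0, inf)."""
    if alpha <= 0 or math.isinf(alpha):
        raise ValueError("this sketch only covers 0 < alpha < infinity")
    if alpha == 1:
        # Kullback-Leibler divergence, the alpha -> 1 limit.
        total = 0.0
        for a, b in zip(p_hat, p):
            if a > 0:
                if b == 0:
                    return math.inf
                total += a * math.log(a / b)
        return total
    total = 0.0
    for a, b in zip(p_hat, p):
        if a == 0:
            continue  # the term vanishes
        if b == 0:
            if alpha > 1:
                return math.inf
            continue  # for alpha < 1 the term a^alpha * b^(1 - alpha) is 0
        total += a ** alpha * b ** (1 - alpha)
    if total == 0:
        return math.inf
    return math.log(total) / (alpha - 1)

# Illustrative distributions chosen for this sketch.
p_hat = [0.07, 0.10, 0.26, 0.57]
p = [0.1, 0.1, 0.2, 0.6]
chi2 = sum((a - b) ** 2 / b for a, b in zip(p_hat, p))
print(renyi_divergence(p_hat, p, 2), math.log(1 + chi2))
```

The two printed numbers agree up to rounding, which illustrates the relation between $D_2$ and the $\chi^2$-divergence.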
We define $\mathcal{F}$ by fixing a bound $B \in [0, \infty]$ and letting
$$\mathcal{F} = \left\{ P \in \mathcal{P}_{\mathcal{X}} : D_\alpha(\hat{P} \,\|\, P) \le B \right\}. \tag{8}$$
Since a Rényi divergence is a continuous increasing function of an $f$-divergence, it follows from [39,40] that $\mathcal{F}$ is a confidence set for $P^*$. In particular, for the case of $\alpha = 2$, which will be used in our numerical experiments in Section 8, for suitable $B$, we have
$$\mathcal{F} = \left\{ P \in \mathcal{P}_{\mathcal{X}} : \sum_x \frac{(\hat{P}_x - P_x)^2}{P_x} \le \frac{F^{-1}_{\chi^2, a-1}(1 - \beta)}{n} \right\}, \tag{9}$$
with $\beta \in (0, 1)$, where $F_{\chi^2, a-1}$ is the cumulative distribution function of the $\chi^2$-distribution with $a - 1$ degrees of freedom, resulting in a set $\mathcal{F}$ with significance level $\beta$. This means that the probability of $P^* \in \mathcal{F}$ is at least $1 - \beta$.
Hence, by designing $\mathcal{Q}$ based on $\mathcal{F}$, we are confident in satisfying (1) for all attackers that have beliefs that are based on the public side-information, as well as for attackers that have beliefs that are closer to $P^*$.
As a special case of the above, we will study the case that nothing is known about $P^*$. In this case, $B = \infty$ and $\mathcal{F} = \mathcal{P}_{\mathcal{X}}$. Regarding privacy, this is the ‘safest’ choice, as we do not make assumptions about $P^*$. Another special case is where $\mathcal{F}$ is a singleton, which reflects a situation where $B = 0$ and $P^*$ is assumed to be known. This setting was studied in [3].
Given $\mathcal{F}$ and $\varepsilon$, the goal is now to create a $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ to be used on new/future data; our setting is depicted in Figure 1. The aim of this paper is to find a satisfactory answer to the following problem:
Problem 1. 
Given $\mathcal{F}$ and $\varepsilon$, find a $\mathcal{Q}$ satisfying $(\varepsilon, \mathcal{F})$-RLDP, while maximizing a given utility function.
Throughout this paper, we follow the original privacy funnel [6] and its LDP counterpart [3] in taking mutual information $I(X; Y)$ as a utility measure. As is argued in [6], mutual information arises naturally when minimizing log loss distortion in the privacy funnel scenario. As a utility measure of $\mathcal{Q}$, we take $I_{X \sim P}(X; Y)$ (abbreviated to $I_P(X; Y)$), since the aim is to create $Y$ that reflects $X$ as faithfully as possible. This utility measure depends on the distribution $P$ of $X$ that we choose to evaluate. Ideally, one would like to use $P = P^*$, but in practice this is not possible, as $P^*$ is unknown. In the theoretical part of this paper, we circumvent this issue by proving our results for general $P$. In the experiments of Section 8, we take $P = \hat{P}$ as the best available alternative to $P = P^*$. We investigate the effect of this choice by comparing $I_{P^*}(X; Y)$ to $I_{\hat{P}}(X; Y)$.
Another option is to use the robust utility measure $\min_{P \in \mathcal{F}} I_P(X; Y)$ to ensure good utility for every ‘reasonable’ $P$; see [13]. We do not explicitly study this measure in this paper, but since our results hold for general $P$, they can also be applied to robust utility.
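To make the utility measure concrete, the mutual information $I_P(X; Y)$ of a mechanism can be evaluated from the matrix $(Q_{y|x})$ and an input distribution $P$. The sketch below is illustrative only: the layout `Q[y][x]` with $x$ flattened to a single index, the function name, and the two toy mechanisms are assumptions made here, and the result is in nats.

```python
import math

def mutual_information(Q, P):
    """I_P(X; Y) in nats for a mechanism Q[y][x] and an input distribution P[x]."""
    b, a = len(Q), len(P)
    # Output distribution P(Y = y) = sum_x Q[y][x] * P[x].
    p_y = [sum(Q[y][x] * P[x] for x in range(a)) for y in range(b)]
    total = 0.0
    for y in range(b):
        for x in range(a):
            joint = Q[y][x] * P[x]
            if joint > 0:
                total += joint * math.log(Q[y][x] / p_y[y])
    return total

P = [0.07, 0.10, 0.26, 0.57]  # an illustrative distribution on four inputs
identity = [[1.0 if y == x else 0.0 for x in range(4)] for y in range(4)]
constant = [[1.0, 1.0, 1.0, 1.0]]  # a single output: reveals nothing

entropy = -sum(p * math.log(p) for p in P)
print(mutual_information(identity, P), entropy)
print(mutual_information(constant, P))
```

Releasing $X$ unchanged attains the entropy of $X$, the largest possible value, while a constant output has zero utility; a private mechanism sits between these extremes.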
Example 1. 
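We set up an example to illustrate the concepts of this paper. Take $\mathcal{S} = \{s_1, s_2\}$ and $\mathcal{U} = \{u_1, u_2\}$, and suppose
$$P^* = \begin{pmatrix} P^*_{s_1,u_1} & P^*_{s_1,u_2} \\ P^*_{s_2,u_1} & P^*_{s_2,u_2} \end{pmatrix} = \begin{pmatrix} 0.1 & 0.1 \\ 0.2 & 0.6 \end{pmatrix}. \tag{10}$$
Moreover, suppose we have a publicly known database of $n = 100$ entries, from which we estimate
$$\hat{P} = \begin{pmatrix} \hat{P}_{s_1,u_1} & \hat{P}_{s_1,u_2} \\ \hat{P}_{s_2,u_1} & \hat{P}_{s_2,u_2} \end{pmatrix} = \begin{pmatrix} 0.07 & 0.10 \\ 0.26 & 0.57 \end{pmatrix}. \tag{11}$$
To obtain a $95\%$-confidence set $\mathcal{F}$ according to a $\chi^2$-distribution, we take $\alpha = 2$ and $B = \log\left(1 + \frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right) = 0.0752$. In this way, we obtain
$$\begin{aligned} \mathcal{F} &= \left\{ P \in \mathcal{P}_{\mathcal{X}} : D_\alpha(\hat{P} \,\|\, P) \le B \right\} && (12) \\ &= \left\{ P \in \mathcal{P}_{\mathcal{X}} : \log \sum_x \frac{\hat{P}_x^2}{P_x} \le \log\left(1 + \frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right) \right\} && (13) \\ &= \left\{ P \in \mathcal{P}_{\mathcal{X}} : \sum_x \frac{(\hat{P}_x - P_x)^2}{P_x} \le \frac{F^{-1}_{\chi^2,3}(0.95)}{100} \right\}, && (14) \end{aligned}$$
which is the desired confidence set (note that the $\chi^2$-distribution has $|\mathcal{X}| - 1 = 3$ degrees of freedom). In this case, we have $D_2(\hat{P} \,\|\, P^*) = 0.0281 < B$, so $P^* \in \mathcal{F}$.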
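The numbers in Example 1 can be reproduced with a few lines of standard-library Python. This sketch is an illustration and makes its own choices: it uses the closed-form CDF of the $\chi^2$-distribution with 3 degrees of freedom and inverts it by bisection, with hypothetical function names and an arbitrary bracket $[0, 100]$, and it uses the 0.95 quantile, i.e., confidence level $1 - \beta$ with $\beta = 0.05$.

```python
import math

def chi2_cdf_3dof(x):
    """CDF of the chi-squared distribution with 3 degrees of freedom (closed form)."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def chi2_quantile_3dof(q, lo=0.0, hi=100.0, iters=200):
    """Invert the CDF by bisection; the bracket [lo, hi] is a choice of this sketch."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_cdf_3dof(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_star = [0.1, 0.1, 0.2, 0.6]     # P*, flattened as (s1,u1), (s1,u2), (s2,u1), (s2,u2)
p_hat = [0.07, 0.10, 0.26, 0.57]  # the estimate from n = 100 samples
n = 100

B = math.log(1 + chi2_quantile_3dof(0.95) / n)
d2 = math.log(sum(a * a / b for a, b in zip(p_hat, p_star)))

print(round(B, 4), round(d2, 4), d2 <= B)  # prints 0.0752 0.0281 True
```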

4. Conditional Projection of F

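In Section 5 and Section 7 below, we will introduce privacy mechanisms that provide $(\varepsilon, \mathcal{F})$-RLDP. These mechanisms depend on the conditional projections of $\mathcal{F}$ on $\mathcal{P}_{\mathcal{U}}$ given $S = s$, denoted as $\mathcal{F}_{U|s}$. In this section, we analyze the structure and statistics of these sets. To do so, we introduce, for $s \in \mathcal{S}$, $u \in \mathcal{U}$, and $P \in \mathcal{P}_{\mathcal{X}}$,
$$\begin{aligned} P_s &= \sum_{u \in \mathcal{U}} P_{s,u}, && (15) \\ P_{u|s} &= \frac{P_{s,u}}{P_s}, && (16) \\ P_{U|s} &= (P_{u|s})_{u \in \mathcal{U}} \in \mathcal{P}_{\mathcal{U}}, && (17) \\ \mathcal{F}_{U|s} &= \{ P_{U|s} : P \in \mathcal{F} \} \subseteq \mathcal{P}_{\mathcal{U}}. && (18) \end{aligned}$$
We are interested in the following statistics:
$$\begin{aligned} L_{u|s}(\mathcal{F}) &= \min_{R \in \mathcal{F}_{U|s}} R_u \quad \text{for a given } u \in \mathcal{U}, && (19) \\ \mathrm{rad}_s(\mathcal{F}) &= \max_{R \in \mathcal{F}_{U|s}} \| R - \hat{P}_{U|s} \|_1. && (20) \end{aligned}$$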
In (19), $R_u$ is the $u$-coefficient of $R \in \mathcal{P}_{\mathcal{U}}$. It turns out that these statistics give us the information required to construct $(\varepsilon, \mathcal{F})$-RLDP protocols efficiently: in Section 5, we use $L_{u|s}(\mathcal{F})$ to approximate $\mathcal{F}_{U|s}$ by a polytope, which makes computation easier, while in Section 7, we use $\mathrm{rad}_s(\mathcal{F})$ as a measure of the size of $\mathcal{F}_{U|s}$. While these statistics (or bounds for them) are relatively easy to find for $\mathcal{F}$ itself, the hard part is that we need bounds for the projection $\mathcal{F}_{U|s}$. The extent to which these bounds can be found explicitly depends heavily on the divergence measure that is used to construct $\mathcal{F}$. In this section, we show how these bounds can be obtained for our case, where we construct $\mathcal{F}$ using a Rényi divergence. This is possible because, as we will see below, we can give an explicit description of $\mathcal{F}_{U|s}$.

4.1. Structure of F U | s

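Recall that, for a given $\alpha \in (0, \infty)$, the Rényi divergence $D_\alpha : \mathcal{P}_{\mathcal{X}} \to [0, \infty)$ is defined by
$$D_\alpha(\hat{P} \,\|\, P) = \begin{cases} \dfrac{1}{\alpha - 1} \log \displaystyle\sum_{x \in \mathcal{X}} \frac{\hat{P}_x^{\alpha}}{P_x^{\alpha - 1}}, & \text{if } \alpha \neq 1, \\[2ex] \displaystyle\sum_{x \in \mathcal{X}} \hat{P}_x \log \frac{\hat{P}_x}{P_x}, & \text{if } \alpha = 1. \end{cases} \tag{21}$$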
The following theorem states that the conditional projections of balls defined by Rényi divergence are themselves Rényi divergence balls:
Theorem 1. 
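Let $s \in \mathcal{S}$ be such that $\hat{P}_s > 0$. Let $\mathcal{F}$ be defined by a Rényi divergence, i.e.,
$$\mathcal{F} = \left\{ P \in \mathcal{P}_{\mathcal{X}} : D_\alpha(\hat{P} \,\|\, P) \le B \right\} \tag{22}$$
for a given $\alpha \in (0, \infty)$ and $B \in \mathbb{R}_{\ge 0}$. Define the constant $B_s$ by
$$B_s = \begin{cases} \dfrac{\alpha}{\alpha - 1} \log \dfrac{e^{(\alpha - 1) B / \alpha} - (1 - \hat{P}_s)}{\hat{P}_s}, & \text{if } \alpha \neq 1, \\[2ex] \dfrac{B}{\hat{P}_s}, & \text{if } \alpha = 1. \end{cases} \tag{23}$$
Then,
$$\mathcal{F}_{U|s} = \left\{ R \in \mathcal{P}_{\mathcal{U}} : D_\alpha(\hat{P}_{U|s} \,\|\, R) \le B_s \right\}. \tag{24}$$
This theorem gives us a direct description of $\mathcal{F}_{U|s}$, which is useful because $L_{u|s}(\mathcal{F})$ of (19) and $\mathrm{rad}_s(\mathcal{F})$ of (20) are defined in terms of these projection sets. A similar bound could also be found for the limit cases $\alpha \in \{0, \infty\}$, but this is not pursued in this paper, because it does not provide additional insights.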
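The constant $B_s$ of Theorem 1 is straightforward to evaluate. The sketch below is illustrative: the function name `projected_radius` is hypothetical, and raising an error when the argument of the logarithm is non-positive (which can only happen for $\alpha < 1$) is a choice of this sketch, since the theorem statement above does not discuss that case.

```python
import math

def projected_radius(B, alpha, p_hat_s):
    """B_s from Theorem 1, given the radius B, the order alpha, and P_hat(S = s)."""
    if not 0 < p_hat_s <= 1:
        raise ValueError("Theorem 1 requires 0 < P_hat(S=s) <= 1")
    if alpha == 1:
        return B / p_hat_s
    inner = (math.exp((alpha - 1) * B / alpha) - (1 - p_hat_s)) / p_hat_s
    if inner <= 0:
        # Not covered here; handling this case is left out of the sketch.
        raise ValueError("logarithm argument is non-positive")
    return alpha / (alpha - 1) * math.log(inner)

# Values in the spirit of Example 1: alpha = 2, B = 0.0752,
# P_hat(s1) = 0.07 + 0.10 and P_hat(s2) = 0.26 + 0.57.
B = 0.0752
for p_hat_s in (0.17, 0.83):
    print(p_hat_s, projected_radius(B, 2.0, p_hat_s))
```

The rarer value of $s$ receives the larger projected radius: with fewer observations of $S = s$, the conditional distribution of $U$ given $S = s$ is less tightly constrained.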
A key property of the Rényi divergence that allows us to prove Theorem 1 is that, for $x = (s, u)$, we can write
$$\frac{\hat{P}_x^{\alpha}}{P_x^{\alpha - 1}} = \frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha - 1}} \cdot \frac{\hat{P}_s^{\alpha}}{P_s^{\alpha - 1}}. \tag{25}$$
This allows us to express the divergence $D_\alpha(\hat{P}_{U|s} \,\|\, P_{U|s})$ in terms of $D_\alpha(\hat{P} \,\|\, P)$. For other divergences, which may depend on $\hat{P}$ and $P$ in a more complicated way, this is typically not possible. Therefore, we cannot generalize our results to uncertainty sets constructed from, for instance, arbitrary $f$-divergences.
In light of this theorem and the fact that in the following sections we care more about the statistics of $\mathcal{F}_{U|s}$ than about those of $\mathcal{F}$ itself, one might be inclined to think that it is more straightforward to estimate the $\hat{P}_{U|s}$ from the data and define uncertainty sets $\mathcal{F}_{U|s}$ around them directly, without going through the intermediate stage $\mathcal{F}$. However, projecting these sets back to $\mathcal{P}_{\mathcal{X}}$ results in a larger set. In other words, there are distributions $P$ such that each $P_{U|s}$ is an element of $\mathcal{F}_{U|s}$, while $P \notin \mathcal{F}$. That is, we have $\mathcal{F} \subsetneq \mathcal{F}' := \{ P \in \mathcal{P}_{\mathcal{X}} : \forall s,\ P_{U|s} \in \mathcal{F}_{U|s} \}$. The reason is that, as the proof of Theorem 1 makes clear, the $P \in \mathcal{F}$ that project to the boundary points of $\mathcal{F}_{U|s}$ satisfy $P_{U|s'} = \hat{P}_{U|s'}$ for $s' \neq s$. In other words, elements of $\mathcal{F}$ can be extremal in, at most, one $\mathcal{F}_{U|s}$. By contrast, $\mathcal{F}'$ also includes $P$ that are extremal in multiple $\mathcal{F}_{U|s}$. We conclude that constructing the $\mathcal{F}_{U|s}$ directly results in the larger set $\mathcal{F}'$, which results in a lower utility. We will give an example of this phenomenon in Example 2.

4.2. Statistics of F U | s

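In this section, we analyze statistics of $\mathcal{F}_{U|s}$. More concretely, to find $L_{u|s}(\mathcal{F})$ and $\operatorname{rad}_s(\mathcal{F})$, fix $s$, $\alpha$ and $B$, and define for $\rho \in [0,1]$ and $\xi \in \mathbb{R}_{\ge 0}$ such that $\rho\,\xi \le 1$,
$$\varphi_{B_s}(\rho,\xi) = \begin{cases} \dfrac{1}{\alpha-1} \log\!\left( \rho\,\xi^{1-\alpha} + (1-\rho) \left( \dfrac{1-\rho\xi}{1-\rho} \right)^{1-\alpha} \right) - B_s, & \text{if } \alpha \neq 1 \text{ and } \rho \neq 1, \\[1ex] \rho \log \dfrac{1}{\xi} + (1-\rho) \log \dfrac{1-\rho}{1-\rho\xi} - B_s, & \text{if } \alpha = 1 \text{ and } \rho \neq 1, \\[1ex] \log \dfrac{1}{\xi} - B_s, & \text{if } \rho = 1, \end{cases} \tag{26}$$
$$\xi_-(\rho) = \inf \left\{ \xi \in (0,1] : \varphi_{B_s}(\rho,\xi) \le 0 \right\}, \tag{27}$$
$$\xi_+(\rho) = \sup \left\{ \xi \in [1, \rho^{-1}) : \varphi_{B_s}(\rho,\xi) \le 0 \right\}. \tag{28}$$
Here $\rho\,\xi$ plays the role of a probability $P_{u|s}$ with estimate $\hat{P}_{u|s} = \rho$, which is why $\xi$ ranges up to $\rho^{-1}$. Note that the case $\rho = 1$ can be obtained by taking the limit. The expressions for $\xi_-$ and $\xi_+$ are a bit complicated, but note that, given $\rho < 1$, the function $\varphi_{B_s}(\rho,\xi)$ is convex in $\xi$. Thus, $\varphi_{B_s}(\rho,\xi) = 0$ has at most two solutions. Furthermore, $\varphi_{B_s}(\rho,1) = -B_s$ and $\varphi_{B_s}(\rho,\xi) \to \infty$ as $\xi$ approaches $0$ or $\frac{1}{\rho}$, so for $\rho < 1$ the values $\xi_-(\rho)$ and $\xi_+(\rho)$ are the two solutions to $\varphi_{B_s}(\rho,\xi) = 0$.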
The following proposition expresses our desired statistics in terms of ξ and ξ + .
Proposition 1. 
Let u U . Then,
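$$L_{u|s}(\mathcal{F}) = \hat{P}_{u|s}\, \xi_-(\hat{P}_{u|s}), \tag{29}$$
$$\operatorname{rad}_s(\mathcal{F}) = 2 \max_{\mathcal{U}_1 \subseteq \mathcal{U} \,:\, \mathcal{U}_1 \neq \varnothing} \hat{P}_{\mathcal{U}_1|s} \left( \xi_+(\hat{P}_{\mathcal{U}_1|s}) - 1 \right). \tag{30}$$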
As discussed above, ξ ± ( ρ ) can be found quickly numerically; however, the calculation of rad s ( F ) still involves taking the maximum over an exponentially large set.
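As an illustration of how $\xi_\pm(\rho)$ can be found numerically, the sketch below evaluates $\varphi_{B_s}$ from (26) and locates both roots by bisection, starting from $\xi = 1$ (where $\varphi_{B_s} = -B_s \le 0$). The function names, the tolerance `tiny`, and the choice of $1/\rho$ as the upper search endpoint (the point where $1 - \rho\xi$ vanishes) are assumptions of this sketch; it is intended for moderate values of $\alpha$, since very large orders can overflow floating-point powers.

```python
import math


def phi(rho, xi, alpha, b_s):
    """phi_{B_s}(rho, xi) as in (26): Renyi divergence of order alpha between
    (rho, 1 - rho) and (rho * xi, 1 - rho * xi), minus the budget B_s."""
    if rho == 1.0:
        return math.log(1.0 / xi) - b_s
    rest = (1.0 - rho * xi) / (1.0 - rho)
    if alpha == 1.0:
        return rho * math.log(1.0 / xi) + (1.0 - rho) * math.log(1.0 / rest) - b_s
    total = rho * xi ** (1.0 - alpha) + (1.0 - rho) * rest ** (1.0 - alpha)
    return math.log(total) / (alpha - 1.0) - b_s


def _bisect(f, inside, outside, iters=200):
    """Return the boundary of {f <= 0} between a point with f <= 0 (inside)
    and a point with f > 0 (outside)."""
    for _ in range(iters):
        mid = 0.5 * (inside + outside)
        if f(mid) <= 0.0:
            inside = mid
        else:
            outside = mid
    return inside


def xi_bounds(rho, alpha, b_s, tiny=1e-12):
    """Numerical approximations of (xi_minus(rho), xi_plus(rho))."""
    f = lambda xi: phi(rho, xi, alpha, b_s)
    xi_minus = tiny if f(tiny) <= 0.0 else _bisect(f, 1.0, tiny)
    if rho == 1.0:
        return xi_minus, 1.0
    upper = (1.0 - tiny) / rho  # just below the point where 1 - rho * xi = 0
    xi_plus = upper if f(upper) <= 0.0 else _bisect(f, 1.0, upper)
    return xi_minus, xi_plus
```

For example, `xi_bounds(0.4118, 2.0, 0.3782)` returns a pair whose first entry, multiplied by $\rho = 0.4118$, gives approximately $0.1620$.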

4.3. Special Case α = 2

In this section, we show that when $\alpha = 2$, we can find explicit expressions for $\xi_\pm$ and consequently $L_{u|s}$ and $\operatorname{rad}_s$. As discussed in (9), for this $\alpha$, the set $\mathcal{F}$ is a confidence set for a $\chi^2$-test. To find $\xi_-(\rho)$ and $\xi_+(\rho)$, we need to solve $\varphi_{B_s}(\rho,\xi) = 0$. For $\alpha = 2$, we can write this as a quadratic equation in $\xi$, and solving it leads to the following expression:
Lemma 1. 
Suppose α = 2 . Then,
$$\xi_-(\rho) = \frac{e^{B_s} + 2\rho - 1 - \sqrt{(e^{B_s} - 1)\left(e^{B_s} - (2\rho - 1)^2\right)}}{2 e^{B_s} \rho},$$
$$\xi_+(\rho) = \frac{e^{B_s} + 2\rho - 1 + \sqrt{(e^{B_s} - 1)\left(e^{B_s} - (2\rho - 1)^2\right)}}{2 e^{B_s} \rho}.$$
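For readers who want the intermediate step, the quadratic equation behind Lemma 1 can be written out as follows. This is a sketch that uses only definition (26) with $\alpha = 2$ and $\rho < 1$:

$$\begin{aligned} \varphi_{B_s}(\rho,\xi) = 0 \quad &\Longleftrightarrow \quad \frac{\rho}{\xi} + \frac{(1-\rho)^2}{1-\rho\xi} = e^{B_s} \\ &\Longleftrightarrow \quad e^{B_s}\rho\,\xi^2 - \left(e^{B_s} + 2\rho - 1\right)\xi + \rho = 0, \end{aligned}$$

with discriminant

$$\left(e^{B_s} + 2\rho - 1\right)^2 - 4 e^{B_s}\rho^2 = \left(e^{B_s} - 1\right)\left(e^{B_s} - (2\rho - 1)^2\right).$$

The two roots of this quadratic are the expressions for $\xi_-(\rho)$ and $\xi_+(\rho)$ stated in Lemma 1.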
Now, we can determine $L_{u|s}(\mathcal{F})$ and $\operatorname{rad}_s(\mathcal{F})$ using Lemma 1 and Proposition 1. For $L_{u|s}(\mathcal{F})$, we immediately obtain an expression; for $\operatorname{rad}_s(\mathcal{F})$, a careful analysis of $\xi_+$ shows that the optimal $\mathcal{U}_1$ of (30) can be found. For large enough $B_s$, the optimum is at $\mathcal{U}_1 = \{u_{\min}\}$, where $u_{\min}$ is the $u$ that minimizes $\hat{P}_{u|s}$. Thus, we obtain a concrete expression for $\operatorname{rad}_s(\mathcal{F})$ without the need for optimization. For smaller $B_s$, we do not find an exact expression, but we can still derive an upper bound. The results are summarized in the following proposition.
Proposition 2. 
Let α = 2 . Then, the following hold:
1. 
One has
$$L_{u|s}(\mathcal{F}) = \frac{e^{B_s} + 2\hat{P}_{u|s} - 1 - \sqrt{(e^{B_s} - 1)\left(e^{B_s} - (2\hat{P}_{u|s} - 1)^2\right)}}{2 e^{B_s}}.$$
2. 
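Let $u_{\min} = \arg\min_{u \in \mathcal{U}} \hat{P}_{u|s}$. If $B_s \ge \log\left(1 + (1 - \hat{P}_{u_{\min}|s})^2\right)$, then
$$\operatorname{rad}_s(\mathcal{F}) = 2 \hat{P}_{u_{\min}|s} \left( \xi_+(\hat{P}_{u_{\min}|s}) - 1 \right) = \frac{e^{B_s} + 2\hat{P}_{u_{\min}|s} - 1 + \sqrt{(e^{B_s} - 1)\left(e^{B_s} - (2\hat{P}_{u_{\min}|s} - 1)^2\right)}}{e^{B_s}} - 2\hat{P}_{u_{\min}|s}.$$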
3. 
If $B_s < \log\left(1 + (1 - \hat{P}_{u_{\min}|s})^2\right)$, one has $\operatorname{rad}_s(\mathcal{F}) \le \sqrt{e^{B_s} - 1}$.
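The closed-form expressions for $\alpha = 2$ are easy to evaluate. The sketch below implements Lemma 1 and combines it with (29) and with the objective of (30) at $\mathcal{U}_1 = \{u_{\min}\}$; the function names are hypothetical, and `radius_at_umin` equals $\operatorname{rad}_s(\mathcal{F})$ only in the regime of large $B_s$ covered by the second item of Proposition 2.

```python
import math


def xi_pm_alpha2(rho, b_s):
    """Closed-form (xi_minus, xi_plus) of Lemma 1 (alpha = 2, 0 < rho < 1)."""
    e = math.exp(b_s)
    root = math.sqrt((e - 1.0) * (e - (2.0 * rho - 1.0) ** 2))
    base = e + 2.0 * rho - 1.0
    return (base - root) / (2.0 * e * rho), (base + root) / (2.0 * e * rho)


def lower_bound(p_hat_u, b_s):
    """L_{u|s}(F) via (29): the estimate times xi_minus evaluated at it."""
    return p_hat_u * xi_pm_alpha2(p_hat_u, b_s)[0]


def radius_at_umin(p_hat_min, b_s):
    """Objective of (30) at U_1 = {u_min}: 2 * p * (xi_plus(p) - 1)."""
    return 2.0 * p_hat_min * (xi_pm_alpha2(p_hat_min, b_s)[1] - 1.0)
```

With the inputs used in Example 2, `lower_bound(0.4118, 0.3782)` evaluates to approximately $0.1620$ and `radius_at_umin(0.4118, 0.3782)` to approximately $0.6107$.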
We note that α = 2 is not the only value of α for which one can bound L u | s and rad s . For instance, for α 1 , one can use Pinsker’s inequality [55,56] and its generalizations [57] to bound rad s ( F ) in terms of | | P ^ U | s P U | s | | 1 , which in turn can be used to bound L u | s ( F ) . However, unlike α = 2 , these do not result in exact bounds.
Example 2. 
We continue Example 1. We have
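$$\hat{P}_{s_1} = 0.17, \quad \hat{P}_{u_1|s_1} = 0.4118, \quad \hat{P}_{u_2|s_1} = 0.5882, \qquad \hat{P}_{s_2} = 0.83, \quad \hat{P}_{u_1|s_2} = 0.3133, \quad \hat{P}_{u_2|s_2} = 0.6867.$$
Inserting our values of $B$ and $\hat{P}_s$ into Theorem 1, we find $B_{s_1} = 0.3782$, $B_{s_2} = 0.0900$. In other words,
$$\mathcal{F}_{U|s_1} = \left\{ R = \begin{pmatrix} R_{u_1} \\ R_{u_2} \end{pmatrix} \in \mathcal{P}_{\mathcal{U}} : D_2\!\left( \begin{pmatrix} 0.4118 \\ 0.5882 \end{pmatrix} \middle\| \begin{pmatrix} R_{u_1} \\ R_{u_2} \end{pmatrix} \right) \le 0.3782 \right\},$$
$$\mathcal{F}_{U|s_2} = \left\{ R = \begin{pmatrix} R_{u_1} \\ R_{u_2} \end{pmatrix} \in \mathcal{P}_{\mathcal{U}} : D_2\!\left( \begin{pmatrix} 0.3133 \\ 0.6867 \end{pmatrix} \middle\| \begin{pmatrix} R_{u_1} \\ R_{u_2} \end{pmatrix} \right) \le 0.0900 \right\}.$$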
To determine the lower bounds on each $R_{u_i}$, we use Proposition 2 to obtain
$$L_{u_1|s_1}(\mathcal{F}) = 0.1620, \quad L_{u_2|s_1}(\mathcal{F}) = 0.2829, \quad L_{u_1|s_2}(\mathcal{F}) = 0.1923, \quad L_{u_2|s_2}(\mathcal{F}) = 0.5337.$$
In principle, we can also use Proposition 2 to determine the $\operatorname{rad}_s(\mathcal{F})$. However, in this case, there is a more straightforward approach. Since $|\mathcal{U}| = 2$, every element of $\mathcal{F}_{U|s}$ is a vector of length two whose coefficients sum to 1; thus $P_{U|s}$ is determined by $P_{u_1|s}$. Since $L_{u_1|s}(\mathcal{F}) \le P_{u_1|s} \le 1 - L_{u_2|s}(\mathcal{F})$, it follows that
$$\mathcal{F}_{U|s_1} \cong [L_{u_1|s_1}(\mathcal{F}),\, 1 - L_{u_2|s_1}(\mathcal{F})] = [0.1620, 0.7171], \qquad \mathcal{F}_{U|s_2} \cong [L_{u_1|s_2}(\mathcal{F}),\, 1 - L_{u_2|s_2}(\mathcal{F})] = [0.1923, 0.4663].$$
Under this identification, $\operatorname{rad}_s(\mathcal{F})$ is simply twice the maximal distance from $\hat{P}_{u_1|s}$ to an endpoint of this interval (the factor two comes from the fact that $\|P_{U|s} - \hat{P}_{U|s}\|_1 = |P_{u_1|s} - \hat{P}_{u_1|s}| + |P_{u_2|s} - \hat{P}_{u_2|s}| = 2|P_{u_1|s} - \hat{P}_{u_1|s}|$). Hence,
$$\operatorname{rad}_{s_1}(\mathcal{F}) = 2\max\{0.4118 - 0.1620,\ 0.7171 - 0.4118\} = 0.6107, \qquad \operatorname{rad}_{s_2}(\mathcal{F}) = 2\max\{0.3133 - 0.1923,\ 0.4663 - 0.3133\} = 0.3061.$$
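We can also construct the set $\mathcal{F}' = \{P \in \mathcal{P}_{\mathcal{X}} : \forall s\ P_{U|s} \in \mathcal{F}_{U|s}\}$ of Section 4.1. We can write this as
$$\mathcal{F}' = \left\{ \begin{pmatrix} P_{s_1,u_1} \\ P_{s_1,u_2} \\ P_{s_2,u_1} \\ P_{s_2,u_2} \end{pmatrix} \in \mathcal{P}_{\mathcal{X}} : 0.1620 \le P_{u_1|s_1} \le 0.7171,\ 0.1923 \le P_{u_1|s_2} \le 0.4663 \right\}.$$
The inequality $0.1620 \le P_{u_1|s_1}$ can be written as $0.1620 \le \frac{P_{s_1,u_1}}{P_{s_1,u_1} + P_{s_1,u_2}}$, or $0.1620\, P_{s_1,u_2} \le 0.8380\, P_{s_1,u_1}$; in other words, this becomes a linear constraint. We can do the same for the other constraints and these, together with inequality constraints of the form $P_{s,u} \ge 0$ and the equality constraint $\sum_{s,u} P_{s,u} = 1$, define the polytope $\mathcal{F}' \subseteq \mathbb{R}^4$. One can calculate that this polytope is a simplex, spanned by the vertices
$$\begin{pmatrix} 0.7171 \\ 0.2829 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0.1620 \\ 0.8380 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 0.4663 \\ 0.5337 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 0.1923 \\ 0.8077 \end{pmatrix}.$$
The resulting $\mathcal{F}'$ is considerably larger than $\mathcal{F}$: one way to see this is that, for any of these vertices $P$, one has $D_2(\hat{P} \,\|\, P) = \infty$, because $P$ assigns probability zero to outcomes that have positive probability under $\hat{P}$. This example shows the importance of working with the set $\mathcal{F}$, rather than with just its projections $\mathcal{F}_{U|s}$.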

5. Polyhedral Approximation: PolyOpt

In this section, we introduce PolyOpt, a family of mechanisms Q with good utility obtained by enclosing F by a polyhedron, and then using robust optimization for polyhedra [16] to describe the space of possible Q as a polyhedron; we then maximize the mutual information over this polyhedron. This approach is related to the polyhedral approach of [3], which finds the optimum for this problem in a non-robust setting.
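For a mechanism $\mathcal{Q}$ and $y \in \mathcal{Y}$, we define $Q_y = (Q_{y|x})_{x \in \mathcal{X}} \in \mathbb{R}^{\mathcal{X}}$ to be the $y$-th row of the stochastic matrix $Q$ corresponding to $\mathcal{Q}$, but transposed (i.e., viewed as a column vector). Likewise, we define the column vector $Q_{y|s} = (Q_{y|s,u})_{u \in \mathcal{U}} \in \mathbb{R}^{\mathcal{U}}$. In this notation, the condition for $(\varepsilon, \mathcal{F})$-RLDP can be formulated as
$$\forall y \in \mathcal{Y}\ \forall s_1, s_2 \in \mathcal{S} : \quad \max_{P \in \mathcal{F}} \left( P_{U|s_1}^{\mathsf{T}} Q_{y|s_1} - e^{\varepsilon} P_{U|s_2}^{\mathsf{T}} Q_{y|s_2} \right) \le 0. \tag{39}$$
Equation (39) boils down to a set of linear constraints in $Q_y$. What makes these difficult to satisfy is that every value $P \in \mathcal{F}$ provides a linear constraint, and each $Q_y$ has to satisfy all infinitely many of these. In this section, we address this difficulty by making the set $\mathcal{F}$ slightly larger, so that robust optimization [16] becomes a convenient tool for optimizing over the allowed $\mathcal{Q}$. More precisely, for every $s \in \mathcal{S}$, let $\mathcal{D}_s \subseteq \mathcal{P}_{\mathcal{U}}$ be such that $\mathcal{F}_{U|s} \subseteq \mathcal{D}_s$. Then, certainly
$$\max_{P \in \mathcal{F}} \left( P_{U|s_1}^{\mathsf{T}} Q_{y|s_1} - e^{\varepsilon} P_{U|s_2}^{\mathsf{T}} Q_{y|s_2} \right) \le \max_{R_1 \in \mathcal{D}_{s_1},\, R_2 \in \mathcal{D}_{s_2}} \left( R_1^{\mathsf{T}} Q_{y|s_1} - e^{\varepsilon} R_2^{\mathsf{T}} Q_{y|s_2} \right). \tag{40}$$
Thus, we can conclude that $\mathcal{Q}$ is $(\varepsilon, \mathcal{F})$-RLDP whenever
$$\forall y \in \mathcal{Y}\ \forall s_1, s_2 \in \mathcal{S} : \quad \max_{R_1 \in \mathcal{D}_{s_1},\, R_2 \in \mathcal{D}_{s_2}} \left( R_1^{\mathsf{T}} Q_{y|s_1} - e^{\varepsilon} R_2^{\mathsf{T}} Q_{y|s_2} \right) \le 0. \tag{41}$$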
The trick is now to choose the D s in such a way that the set of Q satisfying (41) has a closed-form description. To this end, we let each D s be a polyhedron; that way, we can use robust optimization for polyhedra [16] to give such a description.
There are multiple ways to create a polyhedron $\mathcal{D}_s$ that envelops $\mathcal{F}_{U|s}$. Writing $L_{u|s} = L_{u|s}(\mathcal{F})$ for convenience, we take
$$\mathcal{D}_s = \{ R \in \mathcal{P}_{\mathcal{U}} : \forall u\ R_u \ge L_{u|s} \}. \tag{42}$$
Since $\mathcal{D}_s$ is described by linear inequalities, it is a polyhedron, and certainly $\mathcal{F}_{U|s} \subseteq \mathcal{D}_s$ for all $s$. Robust optimization for polytopes [16] then allows us to describe the set of mechanisms satisfying (41). To formulate this, we first need the following definition:
Definition 3. 
Let $\varepsilon > 0$. Then, define $\Gamma_\varepsilon$ to be the convex cone consisting of all $v \in \mathbb{R}^{\mathcal{X}}_{\ge 0}$ that satisfy, for all $s_1, s_2 \in \mathcal{S}$ and all $u_1, u_2 \in \mathcal{U}$:
$$v_{s_1,u_1} - e^{\varepsilon} v_{s_2,u_2} + \sum_{u} L_{u|s_1} \left( v_{s_1,u} - v_{s_1,u_1} \right) - e^{\varepsilon} \sum_{u} L_{u|s_2} \left( v_{s_2,u} - v_{s_2,u_2} \right) \le 0. \tag{43}$$
Note that, for every choice of $s_1, s_2, u_1, u_2$, (43) is a linear inequality in $v$ and thus defines a half-space in $\mathbb{R}^{\mathcal{X}}$. The intersection of these half-spaces, intersected with $\mathbb{R}^{\mathcal{X}}_{\ge 0}$, defines the convex cone $\Gamma_\varepsilon$. This definition allows us to formulate the following result:
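To make Definition 3 concrete, the following sketch checks whether a given vector lies in $\Gamma_\varepsilon$ by testing non-negativity and every inequality of the form stated in the definition. The nested-list layout (`v[s][u]`, `lower[s][u]`), the function name and the numerical tolerance are assumptions of this illustration.

```python
import math


def in_gamma(v, lower, eps, tol=1e-9):
    """Check membership of v in the cone of Definition 3.

    v     -- v[s][u], candidate vector indexed by (s, u)
    lower -- lower[s][u], the lower bounds L_{u|s}
    eps   -- privacy parameter epsilon
    """
    e = math.exp(eps)
    n_s, n_u = len(v), len(v[0])
    if any(v[s][u] < -tol for s in range(n_s) for u in range(n_u)):
        return False
    for s1 in range(n_s):
        for s2 in range(n_s):
            for u1 in range(n_u):
                for u2 in range(n_u):
                    lhs = v[s1][u1] - e * v[s2][u2]
                    lhs += sum(lower[s1][u] * (v[s1][u] - v[s1][u1]) for u in range(n_u))
                    lhs -= e * sum(lower[s2][u] * (v[s2][u] - v[s2][u2]) for u in range(n_u))
                    if lhs > tol:
                        return False
    return True
```

With the lower bounds of Example 2 and $\varepsilon = \log 2$, a uniform vector passes the check, whereas a vector that puts all its mass on a single $(s, u)$ does not.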
Theorem 2. 
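Let $\mathcal{Q}$ be a privacy mechanism, and for $y \in \mathcal{Y}$, let $Q_y$ be the $y$-th row of the associated matrix $Q = (Q_{y|x})_{y \in \mathcal{Y}, x \in \mathcal{X}}$. Suppose that for all $y$ we have $Q_y \in \Gamma_\varepsilon$. Then, $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{F})$-RLDP.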
The upshot of this theorem is that we have translated the infinitely many constraints of (39) and (41) into the finitely many linear constraints of (43). This makes optimizing utility considerably easier. We perform this optimization by translating it into a linear programming problem. The key inspiration for this optimization is Theorem 4 of [5], where optimal LDP mechanisms are found by translating the problem of optimizing mutual information into linear programming; we use an analogous approach adapted to RLDP. This approach can be sketched as follows: Let $\hat{\Gamma} = \{ v \in \Gamma_\varepsilon : \sum_x v_x = 1 \}$, i.e., the intersection of $\Gamma_\varepsilon$ with the hyperplane $\sum_x v_x = 1$. This is a polyhedron, and every $\mathcal{Q}$ satisfying the conditions of Theorem 2 has $Q_y = \theta_y v_y$, for some $\theta_y \in \mathbb{R}_{\ge 0}$ and $v_y \in \hat{\Gamma}$. The authors of [5] made a number of key observations that also apply to our situation. The first is that, in this case, we can write
$$I_{\hat{P}}(X;Y) = \sum_{y} \theta_y\, \mu(v_y), \tag{44}$$
where
$$\mu(v) = \sum_{x \in \mathcal{X}} v_x \hat{P}_x \log \frac{v_x}{\sum_{x'} v_{x'} \hat{P}_{x'}}. \tag{45}$$
The second observation is that, in order to maximize (44), one can prove from the convexity of $\mu$ that it is optimal to have each $v_y$ be a vertex of $\hat{\Gamma}$. Thus, once we know the set of vertices $\mathcal{V}$ of $\hat{\Gamma}$, we find the optimal $\mathcal{Q}$ by assigning a weight $\theta_v$ to each $v \in \mathcal{V}$, in such a way that the resulting $Q_y$ form a stochastic matrix and such that (44) is maximized. Since (44) is linear in $\theta$, this is a linear programming problem. This discussion is summarized in the following theorem:
Theorem 3. 
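Let $\hat{\Gamma}$ be the polyhedron given by $\{ v \in \Gamma_\varepsilon : \sum_x v_x = 1 \}$. Let $\mathcal{V}$ be the set of vertices of $\hat{\Gamma}$. Define $\mu$ as in (45). Let $1_{\mathcal{X}} \in \mathbb{R}^{\mathcal{X}}$ be the constant vector of ones. Let $\hat{\theta} \in \mathbb{R}^{\mathcal{V}}_{\ge 0}$ be the solution to the optimization problem
$$\begin{aligned} \underset{\theta}{\text{maximize}} \quad & \sum_{v \in \mathcal{V}} \theta_v\, \mu(v) \\ \text{subject to} \quad & \theta \in \mathbb{R}^{\mathcal{V}}_{\ge 0}, \quad \sum_{v \in \mathcal{V}} \theta_v v = 1_{\mathcal{X}}. \end{aligned} \tag{46}$$
Let the privacy mechanism $\mathcal{Q}$ be given by $\mathcal{Y} = \{ v \in \mathcal{V} : \hat{\theta}_v > 0 \}$ and $Q_{v|x} = \hat{\theta}_v v_x$. Then, the mechanism $\mathcal{Q}$ maximizes $I_{\hat{P}}(X;Y)$ among all mechanisms satisfying the condition of Theorem 2. One has $|\mathcal{Y}| \le a$.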
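The two building blocks of Theorem 3 that do not require an LP solver can be sketched as follows: evaluating $\mu(v)$ and assembling the mechanism from a weight vector. The vertex enumeration and the linear program itself are not shown. The function names are hypothetical, natural logarithms are used, and `mutual_information` is included only to cross-check the decomposition of $I_{\hat{P}}(X;Y)$ into a weighted sum of $\mu$ values.

```python
import math


def mu(v, p_hat):
    """mu(v) = sum_x v_x p_x log(v_x / sum_x' v_x' p_x'), with 0 log 0 = 0."""
    denom = sum(vx * px for vx, px in zip(v, p_hat))
    return sum(vx * px * math.log(vx / denom)
               for vx, px in zip(v, p_hat) if vx > 0.0 and px > 0.0)


def mechanism_from_weights(theta, vertices, tol=1e-12):
    """Rows Q_v = theta_v * v for every vertex with positive weight."""
    return [[t * vx for vx in v] for t, v in zip(theta, vertices) if t > tol]


def mutual_information(q, p):
    """I(X;Y) in nats for the matrix q[y][x] and input distribution p[x]."""
    total = 0.0
    for row in q:
        p_y = sum(qx * px for qx, px in zip(row, p))
        total += sum(px * qx * math.log(qx / p_y)
                     for qx, px in zip(row, p) if qx > 0.0 and px > 0.0)
    return total
```

For any weights and vertices, the sum of `theta_v * mu(v, p_hat)` over the selected vertices agrees with `mutual_information` applied to the assembled matrix, which is the identity that makes the objective linear in $\theta$.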
Together, Theorems 2 and 3 show that if we can solve a vertex enumeration problem, we can find a mechanism Q that maximizes I P ^ ( X ; Y ) among a subset of all ( ε , F ) -RLDP mechanisms; furthermore, we ensure that the output space Y is, at most, the size of the input space X . The proof of Theorem 3 is analogous to the proof of Theorem 4 of [5] and is given in Appendix A.5. Note that the results of [5] do not run into the vertex enumeration problem, because the relevant polyhedron there is [ 1 , e ε ] X , for which the vertices are known.
We remark that a simplex is not the only possible choice for D s . In general, we can make D s closer to F U | s by adding more defining hyperplanes. Doing this allows more Q to satisfy Theorem 2 and in turn increases the utility of the Q we find via Theorem 3. However, since Γ is related to the D s via duality, adding extra constraints to the D s will increase the dimension of Γ through the addition of auxiliary variables. This makes the vertex enumeration problem of Theorem 3 more computationally involved. Thus, we have a trade-off between utility and computational complexity. Even with the given, ‘simple’ choice of D s , the computational complexity is quite high: recall that we defined a = | X | . The polytope Γ ^ is ( a 1 ) -dimensional and is defined by a 2 + a inequalities, thus it has O ( ( a 2 + a ) a 1 2 ) = O ( a a ) vertices [58]. Since this is the dimension of the linear programming problem, we find that the total complexity of finding Q is O ( a ω a log ( a a + 1 / δ ) ) , where ω 2.38 is the exponent of matrix multiplication and δ the relative accuracy [58]. Clearly, this becomes infeasible rather quickly for large a.
It should be noted that, in general, the increasing utility obtained by decreasing $\mathcal{D}_s$ in size does not approach the optimal utility over all $(\varepsilon, \mathcal{F})$-RLDP mechanisms. This is because, as we take increasingly finer $\mathcal{D}_s$, we approach the set of $\mathcal{Q}$ that satisfy (4) for all $P$ in $\mathcal{F}' := \{ P : \forall s\ P_{U|s} \in \mathcal{F}_{U|s} \}$. As discussed in Section 4.1, one has $\mathcal{F} \subsetneq \mathcal{F}'$. As a result, the set of $(\varepsilon, \mathcal{F}')$-RLDP mechanisms is strictly smaller than the set of $(\varepsilon, \mathcal{F})$-RLDP mechanisms.
Example 3. 
We continue Example 2 by taking $\varepsilon = \log 2$. To obtain $\hat{\Gamma}$ in Theorem 3, we need to combine the defining inequalities of $\Gamma_\varepsilon$ in Definition 3 with the defining equality $\sum_x v_x = 1$. Regarding the inequalities, we have $2^4 = 16$ inequalities of the form (43), as well as 4 inequalities of the form $v_x \ge 0$. Together with the equality constraint, we obtain a 3-dimensional polytope in $\mathbb{R}^{\mathcal{X}} = \mathbb{R}^4$. Using a vertex enumeration algorithm, one finds that $\mathcal{V}$ consists of the rows of the matrix $V$ below, where the order of the columns is the order of the rows of Example 1. For each row $v$, we can calculate $\mu(v)$, resulting in the vector $\mu$ below. Solving (46), we obtain the vector $\hat{\theta}$ below:
$$V = \begin{pmatrix} 0.0744 & 0.3227 & 0.5603 & 0.0426 \\ 0.2426 & 0.2426 & 0.4783 & 0.0364 \\ 0.3333 & 0.3333 & 0.1667 & 0.1667 \\ 0.1091 & 0.4737 & 0.2086 & 0.2086 \\ 0.0993 & 0.4310 & 0 & 0.4697 \\ 0.1121 & 0.4864 & 0 & 0.4015 \\ 0.3404 & 0.3404 & 0 & 0.3191 \\ 0.0770 & 0.3343 & 0.2944 & 0.2944 \\ 0.2234 & 0.2234 & 0 & 0.5531 \\ 0.4875 & 0.1434 & 0 & 0.3690 \\ 0.4360 & 0.1283 & 0 & 0.4358 \\ 0.4758 & 0.1400 & 0.1921 & 0.1921 \\ 0.3437 & 0.1011 & 0.2776 & 0.2776 \\ 0.1602 & 0.1602 & 0.6316 & 0.0481 \\ 0.1667 & 0.1667 & 0.3333 & 0.3333 \\ 0.3325 & 0.0978 & 0.5294 & 0.0403 \end{pmatrix}, \quad \mu = \begin{pmatrix} 0.1152 \\ 0.0942 \\ 0.0087 \\ 0.0135 \\ 0.1097 \\ 0.0968 \\ 0.0723 \\ 0.0080 \\ 0.1240 \\ 0.0878 \\ 0.1014 \\ 0.0106 \\ 0.0076 \\ 0.1240 \\ 0.0075 \\ 0.1083 \end{pmatrix}, \quad \hat{\theta} = \begin{pmatrix} 1.1899 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0.7670 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1.4134 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0.6297 \end{pmatrix}.$$
We now obtain the privacy mechanism $\mathcal{Q}^{\mathrm{PolyOpt}}$ as follows: each row of $Q^{\mathrm{PolyOpt}}$ is a row of $V$ with a non-zero coefficient in $\hat{\theta}$, multiplied by that coefficient. Thus, we obtain
$$Q^{\mathrm{PolyOpt}} = \begin{pmatrix} Q_{y_1|s_1,u_1} & Q_{y_1|s_1,u_2} & Q_{y_1|s_2,u_1} & Q_{y_1|s_2,u_2} \\ Q_{y_2|s_1,u_1} & Q_{y_2|s_1,u_2} & Q_{y_2|s_2,u_1} & Q_{y_2|s_2,u_2} \\ Q_{y_3|s_1,u_1} & Q_{y_3|s_1,u_2} & Q_{y_3|s_2,u_1} & Q_{y_3|s_2,u_2} \\ Q_{y_4|s_1,u_1} & Q_{y_4|s_1,u_2} & Q_{y_4|s_2,u_1} & Q_{y_4|s_2,u_2} \end{pmatrix} = \begin{pmatrix} 0.0885 & 0.3840 & 0.6667 & 0.0507 \\ 0.0860 & 0.3731 & 0 & 0.3080 \\ 0.6162 & 0.1813 & 0 & 0.6159 \\ 0.2094 & 0.0616 & 0.3333 & 0.0254 \end{pmatrix}.$$
Note that indeed we have $4 = b \le a = 4$. As for the utility, we have $I_{\hat{P}}(X;Y) = \mu \cdot \hat{\theta} = 0.4228$. However, the true utility is significantly lower, namely $I_{P^*}(X;Y) = 0.2804$.

6. An Optimal Policy for F = P X

As PolyOpt mechanisms are obtained via vertex enumeration in a-dimensional space, this can be computationally infeasible for larger a. Thus, there is a need for methods that, given P ^ and F , can find ( ε , F ) -RLDP mechanisms with reasonable computational complexity.
In this section, we consider the case where $\mathcal{F}$ is maximal, i.e., $\mathcal{F} = \mathcal{P}_{\mathcal{X}}$. By itself, this represents a situation where we want privacy for every possible probability distribution on $\mathcal{X}$. This scenario may not be very relevant in practice, but any protocol that we find in this way is also $(\varepsilon, \mathcal{F})$-RLDP for any $\mathcal{F}$. As we will see below, this allows us to find $(\varepsilon, \mathcal{F})$-RLDP protocols in a computationally efficient manner.
We show that $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP is almost equivalent to LDP. We exploit this to create SRR, the RLDP analogue to GRR [5], the LDP mechanism that is optimal for $\varepsilon \gg 0$. SRR only depends on $\varepsilon$ and $\mathcal{X}$ and not on $\hat{P}$, and as such does not require an optimization procedure to be found; this makes it a good choice when vertex enumeration is computationally infeasible. The downside is that SRR has a stricter privacy requirement than PolyOpt, as it takes $\mathcal{F}$ to be maximal; in Section 8, we investigate numerically to what extent this results in a lower utility.
We start by giving a characterization of ( ε , P X ) -RLDP. Like LDP, this can be defined by an inequality constraint on the matrix Q.
Proposition 3. 
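$\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP if and only if for all $y \in \mathcal{Y}$ and $(s,u), (s',u') \in \mathcal{X}$ with $s \neq s'$ one has
$$\frac{Q_{y|s,u}}{Q_{y|s',u'}} \le e^{\varepsilon}.$$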
Proof. 
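Suppose that $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP. Let $(s,u), (s',u') \in \mathcal{X}$ with $s \neq s'$. Let $P$ be given by
$$P_x = \begin{cases} \tfrac{1}{2}, & \text{if } x \in \{(s,u), (s',u')\}, \\ 0, & \text{otherwise}. \end{cases}$$
Then, $P_{u|s} = 1$ and $P_{u''|s} = 0$ for all $u'' \neq u$; an analogous statement holds for $P_{u'|s'}$. It follows that
$$\begin{aligned} \frac{Q_{y|s,u}}{Q_{y|s',u'}} &= \frac{Q_{y|s,u} P_{u|s}}{Q_{y|s',u'} P_{u'|s'}} && (52) \\ &= \frac{\sum_{u''} Q_{y|s,u''} P_{u''|s}}{\sum_{u''} Q_{y|s',u''} P_{u''|s'}} && (53) \\ &= \frac{\mathbb{P}_{X \sim P}(\mathcal{Q}(X) = y \mid S = s)}{\mathbb{P}_{X \sim P}(\mathcal{Q}(X) = y \mid S = s')} \le e^{\varepsilon}. && (54) \end{aligned}$$
This proves “⇒”. On the other hand, suppose that $\frac{Q_{y|s,u}}{Q_{y|s',u'}} \le e^{\varepsilon}$ for all $s \neq s'$ and $u, u'$. Then, for all $s \neq s'$ and $P$, we have
$$\frac{\mathbb{P}_{X \sim P}(\mathcal{Q}(X) = y \mid S = s)}{\mathbb{P}_{X \sim P}(\mathcal{Q}(X) = y \mid S = s')} = \frac{\sum_{u} Q_{y|s,u} P_{u|s}}{\sum_{u'} Q_{y|s',u'} P_{u'|s'}} \le e^{\varepsilon}.$$
Hence, $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP. □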
The proposition demonstrates that RLDP is very similar to LDP. The difference is that the condition “for all $x, x' \in \mathcal{X}$” from Definition 2 is relaxed to only those $x = (s,u)$ and $x' = (s',u')$ for which $s \neq s'$.
Before moving on and introducing a new mechanism, note that Proposition 3 clearly illustrates the reason that the setting in this paper cannot be modeled using the block-structured approach from [12]. We see that if $u \neq u'$, we still have a privacy constraint, whereas in [12] this is not the case.
Next, we will introduce a mechanism that exploits the difference between LDP and RLDP. Recall that $a = |\mathcal{X}|$; then generalized randomized response [19] is the privacy mechanism $\mathrm{GRR}^{\varepsilon} : \mathcal{X} \to \mathcal{X}$ given by
$$\mathrm{GRR}^{\varepsilon}_{y|x} = \begin{cases} \dfrac{e^{\varepsilon}}{e^{\varepsilon} + a - 1}, & \text{if } x = y, \\[1ex] \dfrac{1}{e^{\varepsilon} + a - 1}, & \text{otherwise}. \end{cases}$$
This mechanism has been designed such that $\frac{\mathrm{GRR}^{\varepsilon}_{y|x}}{\mathrm{GRR}^{\varepsilon}_{y|x'}} = e^{\pm\varepsilon}$ for $x \neq x'$ and $y \in \{x, x'\}$, the maximal fractional difference that $\varepsilon$-LDP allows. We will see that for RLDP we can go up to a difference of $e^{\pm 2\varepsilon}$ if $x = (s,u)$ and $x' = (s,u')$, as we typically only need to satisfy
$$Q_{y|s,u} \le e^{\varepsilon} Q_{y|s',u''} \le e^{2\varepsilon} Q_{y|s,u'}. \tag{57}$$
We capture the intuition from the necessary condition (57) in a new mechanism called secret randomized response (SRR). Recall that $a_1 = |\mathcal{S}|$, $a_2 = |\mathcal{U}|$.
Definition 4. 
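(Secret randomized response (SRR)). Let $\varepsilon > 0$. Then, the privacy mechanism $\mathrm{SRR}^{\varepsilon} : \mathcal{X} \to \mathcal{X}$ is given by
$$\mathrm{SRR}^{\varepsilon}_{s',u'|s,u} = \begin{cases} \dfrac{e^{\varepsilon}}{e^{\varepsilon} + e^{-\varepsilon}(a_2 - 1) + a - a_2}, & \text{if } (s,u) = (s',u'), \\[1ex] \dfrac{e^{-\varepsilon}}{e^{\varepsilon} + e^{-\varepsilon}(a_2 - 1) + a - a_2}, & \text{if } s = s' \text{ and } u \neq u', \\[1ex] \dfrac{1}{e^{\varepsilon} + e^{-\varepsilon}(a_2 - 1) + a - a_2}, & \text{if } s \neq s'. \end{cases}$$
It is clear that $\frac{\mathrm{SRR}^{\varepsilon}_{y|s,u}}{\mathrm{SRR}^{\varepsilon}_{y|s',u'}} \in \{ e^{-2\varepsilon}, e^{-\varepsilon}, 1, e^{\varepsilon}, e^{2\varepsilon} \}$, and the two extreme cases are only possible when $s = s'$. Thus, we can conclude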
Lemma 2. 
SRR satisfies ( ε , P X ) -RLDP.
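The definitions of GRR and SRR, together with the characterization in Proposition 3, translate directly into code. The sketch below builds both matrices with rows indexed by outputs and columns by inputs, and checks the condition of Proposition 3; the function names and the ordering of $\mathcal{X}$ as pairs `(s, u)` are assumptions of this illustration.

```python
import math
from itertools import product


def grr(eps, a):
    """GRR matrix q[y][x] on an alphabet of size a."""
    z = math.exp(eps) + a - 1
    return [[(math.exp(eps) if x == y else 1.0) / z for x in range(a)]
            for y in range(a)]


def srr(eps, a1, a2):
    """SRR matrix q[y][x] on X = S x U with |S| = a1 and |U| = a2."""
    xs = list(product(range(a1), range(a2)))
    z = math.exp(eps) + math.exp(-eps) * (a2 - 1) + a1 * a2 - a2

    def weight(out, inp):
        if out == inp:
            return math.exp(eps)
        if out[0] == inp[0]:        # same s, different u
            return math.exp(-eps)
        return 1.0                  # different s

    return xs, [[weight(y, x) / z for x in xs] for y in xs]


def satisfies_full_rldp(q, xs, eps, tol=1e-12):
    """Condition of Proposition 3: the ratio bound for inputs with s != s'."""
    bound = math.exp(eps)
    for row in q:
        for i, (s, _) in enumerate(xs):
            for j, (s_other, _) in enumerate(xs):
                if s != s_other and row[i] > bound * row[j] + tol:
                    return False
    return True
```

For $\varepsilon = \log 2$ and $a_1 = a_2 = 2$, `srr` produces the entries $4/9$, $1/9$ and $2/9$, and the check of Proposition 3 passes for $\varepsilon = \log 2$ but fails for any smaller privacy parameter.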
Example 4. 
We continue Example 3. Although SRR is closely related to GRR, adopting it can still have a significant impact on utility. For instance, in the setting of Example 3, we obtain
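$$\mathrm{GRR}^{\varepsilon} = \begin{pmatrix} 0.4 & 0.2 & 0.2 & 0.2 \\ 0.2 & 0.4 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.4 & 0.2 \\ 0.2 & 0.2 & 0.2 & 0.4 \end{pmatrix}, \qquad \mathrm{SRR}^{\varepsilon} = \begin{pmatrix} 0.444 & 0.111 & 0.222 & 0.222 \\ 0.111 & 0.444 & 0.222 & 0.222 \\ 0.222 & 0.222 & 0.444 & 0.111 \\ 0.222 & 0.222 & 0.111 & 0.444 \end{pmatrix}.$$
Then,
$$I_{\hat{P}}(X; \mathrm{GRR}^{\varepsilon}(X)) = 0.0419, \qquad I_{\hat{P}}(X; \mathrm{SRR}^{\varepsilon}(X)) = 0.1005,$$
$$I_{P^*}(X; \mathrm{GRR}^{\varepsilon}(X)) = 0.0412, \qquad I_{P^*}(X; \mathrm{SRR}^{\varepsilon}(X)) = 0.0942.$$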
We see that adopting SRR more than doubles the utility. Compared to Example 3, we see that the utility is still significantly lower than that of PolyOpt, but the advantage is that we obtain SRR directly from ε, without having to take P ^ or F into account; this ensures a significantly faster computation.
The power of SRR, beyond slightly improving on GRR, is that we can prove it maximizes I P ( X ; Y ) for sufficiently large ε ; the cutoff point depends on P. This is proven analogously to the result of [5], where GRR is the optimal LDP mechanism for sufficiently large ε .
Theorem 4. 
For every $P$, there is an $\varepsilon_0 \ge 0$ such that for all $\varepsilon \ge \varepsilon_0$, SRR is the $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP mechanism maximizing $I_P(X;Y)$.
The proof of this theorem follows the same lines as the proof of Theorem 14 of [5], in which it is proven that GRR is the optimal LDP mechanism for sufficiently large $\varepsilon$. The proof is presented in Appendix A.6. This solves the problem of finding the optimal $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP mechanism, for sufficiently large $\varepsilon$. The strategy is similar to the proof of Theorem 3: one can show that the rows $Q_y$ of the optimal $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP mechanism $\mathcal{Q}$ correspond to vertices of a polyhedron, and the optimal weights assigned to these vertices are found using a linear programming problem. Unlike in the case of Theorem 3, however, we can give an explicit description of the set of vertices, and we can solve the linear programming problem analytically.
Our result shows that if one wishes to satisfy $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP, then SRR is a solid choice, especially for larger $\varepsilon$, since it maximizes $I_{P^*}(X;Y)$ for sufficiently large $\varepsilon$. Thus, we can optimize $I_{P^*}(X;Y)$ without having to know $P^*$, with the caveat that the cutoff point for ‘large enough’ depends on $P^*$.
In [5], the optimal LDP mechanism in the high-privacy regime (i.e., $\varepsilon \ll 1$) was also found. In principle, we could also do this for $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP, but this would not be of much use, as the optimal mechanism would depend on $P^*$, which we assume to be unknown.

7. Independent Reporting

Section 5 demonstrated the need to find efficiently computable ( ε , F ) -RLDP mechanisms with decent utility. In Section 6, we approach this problem by considering ( ε , P X ) -RLDP instead, allowing us to analytically obtain the optimal mechanism. However, when F is small, this overapproximation might result in a large loss of utility. In this section, we describe independent reporting (IR), a different heuristic that takes the size of F into account, while still being significantly less computationally complex than PolyOpt.
The basis of IR is to apply two separate LDP mechanisms R 1 and R 2 to S and U, respectively, reporting both outputs.
Definition 5. 
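Let $\mathcal{Y}_1, \mathcal{Y}_2$ be sets, and let $\mathcal{Y} = \mathcal{Y}_1 \times \mathcal{Y}_2$. Let $\mathcal{R}_1 : \mathcal{S} \to \mathcal{Y}_1$ and $\mathcal{R}_2 : \mathcal{U} \to \mathcal{Y}_2$ be probabilistic maps. Then, the independent reporting of $\mathcal{R}_1$ and $\mathcal{R}_2$ is the probabilistic map $\mathrm{IR}_{\mathcal{R}_1,\mathcal{R}_2} : \mathcal{X} \to \mathcal{Y}$ given by $\mathrm{IR}_{\mathcal{R}_1,\mathcal{R}_2}(s,u) = (\mathcal{R}_1(s), \mathcal{R}_2(u))$.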
Suppose that R i satisfies ε i -LDP. The composition theorem for differential privacy [59] tells us that I R R 1 , R 2 satisfies ( ε 1 + ε 2 ) -LDP. However, in the RLDP setting, U only indirectly leaks information about S; therefore, we can get away with a higher ε 2 compared to the LDP setting. How much higher depends on the degree of relatedness of S and U, which is captured by the possible values of P in F . The precise statement is given in the following result:
Theorem 5. 
Let $\varepsilon_1, \varepsilon_2 \in \mathbb{R}_{\ge 0}$. For each $s$, let $d_s \in [0, \infty)$ be such that $d_s \ge \operatorname{rad}_s(\mathcal{F})$. Furthermore, define
$$d = \min\left\{ 2,\ \max_s\, (2 d_s) + \max_{s,s'} \| \hat{P}_{U|s} - \hat{P}_{U|s'} \|_1 \right\}.$$
Let $\delta_2 = \log\left( 1 + \frac{2(e^{\varepsilon_2} - 1)}{d} \right)$. Suppose that $\mathcal{R}_1$ is $\varepsilon_1$-LDP and that $\mathcal{R}_2$ is $\delta_2$-LDP. Then, IR is $(\varepsilon_1 + \varepsilon_2, \mathcal{F})$-RLDP.
If $S = U$, then $\| \hat{P}_{U|s} - \hat{P}_{U|s'} \|_1 = 2$ for $s \neq s'$, so $d = 2$ and $\delta_2 = \varepsilon_2$. In this case, Theorem 5 is the RLDP analogue to the well-known composition theorem for local differential privacy [59]. In general, $\delta_2 \ge \varepsilon_2$; this represents the fact that the privacy requirement on $\mathcal{R}_2$ is less strict when $S$ and $U$ are only partially related. At the other extreme, if $S$ and $U$ are independent in our observation, we have $\| \hat{P}_{U|s} - \hat{P}_{U|s'} \|_1 = 0$ for all $s, s'$. Still, we cannot fully disclose $U$, since $S$ and $U$ might be non-independent under $P^*$. The term $d_s$ is present in the definition of $d$ to account for this possibility.
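The quantities $d$ and $\delta_2$ of Theorem 5 can be computed directly from the estimated conditional distributions and the radii bounds $d_s$. The following sketch assumes that `cond_hat[s]` holds the vector $\hat{P}_{U|s}$ and that $d > 0$; the function names are hypothetical.

```python
import math


def l1_distance(p, q):
    """L1 distance between two probability vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def relatedness_bound(cond_hat, d_s):
    """d = min{2, max_s 2 d_s + max_{s,s'} ||P_hat_{U|s} - P_hat_{U|s'}||_1}."""
    worst_gap = max(l1_distance(p, q) for p in cond_hat for q in cond_hat)
    return min(2.0, 2.0 * max(d_s) + worst_gap)


def delta2(eps2, d):
    """LDP parameter allowed for R_2 in Theorem 5 (requires d > 0)."""
    return math.log(1.0 + 2.0 * (math.exp(eps2) - 1.0) / d)
```

As stated above, $d = 2$ gives `delta2(eps2, 2.0) == eps2` up to rounding, and any smaller $d$ yields a strictly larger $\delta_2$ for $\varepsilon_2 > 0$.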
In order to prove Theorem 5, we need the following lemma:
Lemma 3. 
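Let $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ be an $\varepsilon$-LDP mechanism. Then, for all $y \in \mathcal{Y}$ and all $P, P' \in \mathcal{P}_{\mathcal{X}}$ we have
$$\frac{\mathbb{P}_{X \sim P}(\mathcal{Q}(X) = y)}{\mathbb{P}_{X \sim P'}(\mathcal{Q}(X) = y)} \le 1 + \frac{e^{\varepsilon} - 1}{2} \| P - P' \|_1. \tag{63}$$
Proof. 
Fix $y$, and let $Q_y^{\max} = \max_x Q_{y|x}$ and $Q_y^{\min} = \min_x Q_{y|x}$. By the $\varepsilon$-LDP property, it holds that $Q_y^{\max} \le e^{\varepsilon} Q_y^{\min}$. We hence find
$$\begin{aligned} \mathbb{P}_{X \sim P}(\mathcal{Q}(X) = y) - \mathbb{P}_{X \sim P'}(\mathcal{Q}(X) = y) &= \sum_{x \in \mathcal{X}} Q_{y|x} (P_x - P'_x) && (64) \\ &= \sum_{x : P_x \ge P'_x} Q_{y|x} (P_x - P'_x) - \sum_{x : P'_x > P_x} Q_{y|x} (P'_x - P_x) && (65) \\ &\le \frac{Q_y^{\max}}{2} \| P - P' \|_1 - \frac{Q_y^{\min}}{2} \| P - P' \|_1 && (66) \\ &\le \frac{(e^{\varepsilon} - 1)\, Q_y^{\min}}{2} \| P - P' \|_1 && (67) \\ &\le \frac{(e^{\varepsilon} - 1)\, \mathbb{P}_{X \sim P'}(\mathcal{Q}(X) = y)}{2} \| P - P' \|_1, && (68) \end{aligned}$$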
from which the lemma directly follows. □
Proof 
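(Proof of Theorem 5). We start by showing that $d$ is an upper bound for $\| P_{U|s} - P_{U|s'} \|_1$. If $d = 2$, this is certainly the case. Suppose $d = \max_s (2 d_s) + \max_{s,s'} \| \hat{P}_{U|s} - \hat{P}_{U|s'} \|_1$. Then, for all $s, s' \in \mathcal{S}$ and $P \in \mathcal{F}$ we have
$$\begin{aligned} \| P_{U|s} - P_{U|s'} \|_1 &\le \| P_{U|s} - \hat{P}_{U|s} \|_1 + \| \hat{P}_{U|s} - \hat{P}_{U|s'} \|_1 + \| \hat{P}_{U|s'} - P_{U|s'} \|_1 && (69) \\ &\le d_s + d_{s'} + \| \hat{P}_{U|s} - \hat{P}_{U|s'} \|_1 && (70) \\ &\le d. && (71) \end{aligned}$$
Combining Lemma 3 with the fact that $\varepsilon_2 = \log\left( 1 + \frac{d(e^{\delta_2} - 1)}{2} \right)$, it follows that for every $y_2 \in \mathcal{Y}_2$, we have
$$\begin{aligned} \frac{\mathbb{P}_{X \sim P}(\mathcal{R}_2(U) = y_2 \mid S = s)}{\mathbb{P}_{X \sim P}(\mathcal{R}_2(U) = y_2 \mid S = s')} &\le 1 + \frac{e^{\delta_2} - 1}{2} \| P_{U|s} - P_{U|s'} \|_1 && (72) \\ &\le 1 + \frac{d(e^{\delta_2} - 1)}{2} && (73) \\ &= e^{\varepsilon_2}. && (74) \end{aligned}$$
Given $S$, the random variables $\mathcal{R}_1(S)$ and $\mathcal{R}_2(U)$ are independent. It follows that for every $y_1 \in \mathcal{Y}_1$ and every $y_2 \in \mathcal{Y}_2$, we have
$$\begin{aligned} \frac{\mathbb{P}(\mathcal{R}_1(S) = y_1, \mathcal{R}_2(U) = y_2 \mid S = s)}{\mathbb{P}(\mathcal{R}_1(S) = y_1, \mathcal{R}_2(U) = y_2 \mid S = s')} &= \frac{\mathbb{P}(\mathcal{R}_1(S) = y_1 \mid S = s)}{\mathbb{P}(\mathcal{R}_1(S) = y_1 \mid S = s')} \cdot \frac{\mathbb{P}(\mathcal{R}_2(U) = y_2 \mid S = s)}{\mathbb{P}(\mathcal{R}_2(U) = y_2 \mid S = s')} && (75) \\ &\le e^{\varepsilon_1 + \varepsilon_2}, && (76) \end{aligned}$$
where the last inequality holds because of (74) and because $\mathcal{R}_1$ is $\varepsilon_1$-LDP. This shows that $\mathrm{IR}_{\mathcal{R}_1,\mathcal{R}_2}$ is $(\varepsilon_1 + \varepsilon_2, \mathcal{F})$-RLDP. □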
Theorem 5 establishes the privacy of independent reporting. To maximize the utility, we need to determine how to divide the privacy budget ε between ε 1 and ε 2 , and which LDP mechanisms to use for R 1 and R 2 . To answer both these questions, we first need an expression for the utility of IR, which is given by the following theorem:
Theorem 6. 
For any P P X , one has
$$I_P(\mathrm{IR}_{\mathcal{R}_1,\mathcal{R}_2}(X); X) = I_P(\mathcal{R}_1(S); S) + I_P(\mathcal{R}_2(U); U \mid \mathcal{R}_1(S)). \tag{77}$$
Proof. 
Since $\mathcal{R}_1(S)$ and $U$ are independent given $S$, and $\mathcal{R}_2(U)$ and $S$ are independent given $U$ and $\mathcal{R}_1(S)$, we have
$$\begin{aligned} I_P(\mathrm{IR}_{\mathcal{R}_1,\mathcal{R}_2}(X); X) &= I_P(\mathcal{R}_1(S), \mathcal{R}_2(U); U, S) && (78) \\ &= I_P(\mathcal{R}_1(S); U, S) + I_P(\mathcal{R}_2(U); U, S \mid \mathcal{R}_1(S)) && (79) \\ &= I_P(\mathcal{R}_1(S); S) + I_P(\mathcal{R}_2(U); U \mid \mathcal{R}_1(S)). && (80) \end{aligned}$$
□
We use Theorems 5 and 6 to find high-utility IR protocols that satisfy ( ε , F ) -RLDP, given ε and F . To do so, we need to choose R 1 and R 2 , and split the privacy budget between them. Since the expression for the utility of IR in Theorem 6 contains a term I P ( R 1 ( S ) ; S ) , the R 1 that maximizes this is GRR when ε is large enough; thus, we choose R 1 = GRR . The second term in the utility expression is
$$I_P(\mathcal{R}_2(U); U \mid \mathcal{R}_1(S)) = \mathbb{E}_r\, I_{U \sim P_{U \mid \mathcal{R}_1(S) = r}}(\mathcal{R}_2(U); U).$$
This is the expected value of an expression that is maximized for R 2 = GRR , with the caveat that the maximization only holds when ε is large enough, and what ‘large enough’ is depends on the distribution of U. Since this gives us a choice of R 2 independent of the distribution, we ignore this caveat and take R 2 = GRR as well.
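Having chosen $\mathcal{R}_1$ and $\mathcal{R}_2$, we are only left with the division of the privacy budget. If we choose $\varepsilon_2$, then by Theorem 5 the privacy parameters of $\mathcal{R}_1$ and $\mathcal{R}_2$ are $\varepsilon_1 = \varepsilon - \varepsilon_2$ and $\delta_2 = \log\left( 1 + \frac{2(e^{\varepsilon_2} - 1)}{d} \right)$, respectively. It follows that to find a high-utility IR protocol, we have to solve the following optimization problem:
$$\begin{aligned} \underset{\varepsilon_2}{\text{maximize}} \quad & I_P\!\left( \mathrm{GRR}^{\varepsilon - \varepsilon_2}(S),\ \mathrm{GRR}^{\log\left( 1 + \frac{2(e^{\varepsilon_2} - 1)}{d} \right)}(U);\ S, U \right) \\ \text{subject to} \quad & \varepsilon_2 \in [0, \varepsilon]. \end{aligned}$$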
This optimization problem is only 1-dimensional. While it is not straightforward to express the complexity of solving this in O -notation, our experiments in Section 8 show this can be quickly performed numerically, and significantly faster than PolyOpt.
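One simple way to approach this one-dimensional problem is a grid search over $\varepsilon_2$. The sketch below builds the IR matrix as a Kronecker product of two GRR matrices and evaluates the mutual information in nats; the grid search (rather than a dedicated scalar optimizer), the function names, and the indexing of $x = (s, u)$ as `s * a2 + u` are assumptions of this illustration, not the procedure used in the experiments.

```python
import math


def grr(eps, k):
    """GRR matrix q[y][x] on an alphabet of size k."""
    z = math.exp(eps) + k - 1
    return [[(math.exp(eps) if x == y else 1.0) / z for x in range(k)]
            for y in range(k)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[ax * bx for ax in a_row for bx in b_row]
            for a_row in a for b_row in b]


def mutual_information(q, p):
    """I(X;Y) in nats for the matrix q[y][x] and input distribution p[x]."""
    total = 0.0
    for row in q:
        p_y = sum(qx * px for qx, px in zip(row, p))
        total += sum(px * qx * math.log(qx / p_y)
                     for qx, px in zip(row, p) if qx > 0.0 and px > 0.0)
    return total


def ir_matrix(eps2, eps, d, a1, a2):
    """Matrix of IR(GRR^{eps - eps2}, GRR^{delta2}) with delta2 as in Theorem 5."""
    delta2 = math.log(1.0 + 2.0 * (math.exp(eps2) - 1.0) / d)
    return kron(grr(eps - eps2, a1), grr(delta2, a2))


def best_split(p_joint, a1, a2, eps, d, steps=200):
    """Grid search for the eps2 in [0, eps] with the highest utility."""
    best_eps2, best_val = 0.0, -math.inf
    for i in range(steps + 1):
        eps2 = eps * i / steps
        val = mutual_information(ir_matrix(eps2, eps, d, a1, a2), p_joint)
        if val > best_val:
            best_eps2, best_val = eps2, val
    return best_eps2, best_val
```

Each evaluation costs $O(a^2)$ operations, so even a fine grid remains cheap compared with vertex enumeration.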
Example 5. 
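We continue Example 4. Having found $\operatorname{rad}_s(\mathcal{F})$ and $\hat{P}_{U|s_1}, \hat{P}_{U|s_2}$ in Example 2, we conclude that, in Theorem 5, we have
$$d = \min\left\{ 2,\ 2 \cdot \max\{0.6107, 0.3061\} + \left\| \begin{pmatrix} 0.4118 \\ 0.5882 \end{pmatrix} - \begin{pmatrix} 0.3133 \\ 0.6867 \end{pmatrix} \right\|_1 \right\} = 1.4591.$$
It follows that $\delta_2 = \log\left( 1 + \frac{2}{1.4591}(e^{\varepsilon_2} - 1) \right) = \log(1.3707\, e^{\varepsilon_2} - 0.3707)$. For a given value of $\varepsilon_2$, the matrix corresponding to $\mathrm{IR}(\mathrm{GRR}^{\log(2) - \varepsilon_2}, \mathrm{GRR}^{\delta_2})$ is the Kronecker product
$$\begin{pmatrix} \frac{2e^{-\varepsilon_2}}{2e^{-\varepsilon_2} + 1} & \frac{1}{2e^{-\varepsilon_2} + 1} \\ \frac{1}{2e^{-\varepsilon_2} + 1} & \frac{2e^{-\varepsilon_2}}{2e^{-\varepsilon_2} + 1} \end{pmatrix} \otimes \begin{pmatrix} \frac{1.3707 e^{\varepsilon_2} - 0.3707}{1.3707 e^{\varepsilon_2} + 0.6293} & \frac{1}{1.3707 e^{\varepsilon_2} + 0.6293} \\ \frac{1}{1.3707 e^{\varepsilon_2} + 0.6293} & \frac{1.3707 e^{\varepsilon_2} - 0.3707}{1.3707 e^{\varepsilon_2} + 0.6293} \end{pmatrix} = \frac{1}{C} \begin{pmatrix} 2.7414 - 0.7414 e^{-\varepsilon_2} & 2e^{-\varepsilon_2} & 1.3707 e^{\varepsilon_2} - 0.3707 & 1 \\ 2e^{-\varepsilon_2} & 2.7414 - 0.7414 e^{-\varepsilon_2} & 1 & 1.3707 e^{\varepsilon_2} - 0.3707 \\ 1.3707 e^{\varepsilon_2} - 0.3707 & 1 & 2.7414 - 0.7414 e^{-\varepsilon_2} & 2e^{-\varepsilon_2} \\ 1 & 1.3707 e^{\varepsilon_2} - 0.3707 & 2e^{-\varepsilon_2} & 2.7414 - 0.7414 e^{-\varepsilon_2} \end{pmatrix},$$
where $C = (2e^{-\varepsilon_2} + 1)(1.3707 e^{\varepsilon_2} + 0.6293)$. We now wish to optimize its utility, i.e., find the $\varepsilon_2 \in [0, \log 2]$ that maximizes $I_{\hat{P}}(X;Y)$. The optimum occurs at the boundary $\varepsilon_2 = \log(2)$, for which $I_{\hat{P}}(X;Y) = 0.0755$. Notice that now $\varepsilon_1 = 0$, so $\mathcal{R}_1 = \mathrm{GRR}^0$ is completely random: its output does not depend on the input. In other words, the optimal IR protocol in this case does not transmit any direct information about $S$ at all, only indirectly through $\mathrm{GRR}^{\delta_2}(U)$. In this case, we have
$$Q^{\mathrm{IR}} = \begin{pmatrix} 0.3517 & 0.1483 & 0.3517 & 0.1483 \\ 0.1483 & 0.3517 & 0.1483 & 0.3517 \\ 0.3517 & 0.1483 & 0.3517 & 0.1483 \\ 0.1483 & 0.3517 & 0.1483 & 0.3517 \end{pmatrix}.$$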
Regarding the ‘true’ utility, we have I P * ( X ; Y ) = 0.0718 . Interestingly, Q I R yields less utility than SRR. As we will see in Section 8, this is typical for small S and U .

8. Experiments

In order to gain insight into the behavior of the different mechanisms, we performed several experiments, both on synthetic and real data. We compared the three mechanisms introduced in this paper (PolyOpt, SRR, and IR). Throughout, we let F be a confidence set for a χ 2 -test, i.e., for a Rényi divergence with α = 2 . We used the results of Section 4 to find explicit expressions for L u | s ( F ) and (an upper bound for) rad s ( F ) . Recall from Section 3 that
$$\mathcal{F} = \left\{ P \in \mathcal{P}_{\mathcal{X}} : D_2(\hat{P} \,\|\, P) \le \log\left( 1 + \frac{F^{-1}_{\chi^2, a-1}(1 - \beta)}{n} \right) \right\},$$
where $F_{\chi^2, a-1}$ is the cumulative distribution function of the $\chi^2$-distribution with $a - 1$ degrees of freedom, and $\beta \in (0,1)$ is a chosen significance level. Throughout the experiments, we took $\beta = 0.05$, unless otherwise specified.
We used I P ^ ( X ; Y ) as a utility metric, divided by H ( X ) to obtain the normalized mutual information (NMI). We used this rather than I P * ( X ; Y ) , as the aggregator only has access to the former. In fact, while P * is known for the synthetic data, this is not the case for real data, so we cannot even use I P * ( X ; Y ) as a utility metric.
We compared our methods to two existing approaches, each with a slightly different privacy model. First, we compared to an LDP mechanism, to see to what extent the RLDP framework offered a utility improvement over regular LDP. As the LDP mechanism, we chose GRR, because it optimizes $I_P(X;Y)$, our utility metric, in the low-privacy regime [5]. Second, we compared to the non-robust optimal mechanism of [3]. This mechanism is obtained in a manner similar to PolyOpt, and is the optimal mechanism that satisfies (in our notation) $(\varepsilon, \{\hat{P}\})$-RLDP. In other words, it is optimal in the scenario where one knows $P^*$ precisely. We shall refer to this mechanism as NR (non-robust). Typically, we would expect NR to have a higher utility than our RLDP mechanisms (because it only needs to satisfy privacy with respect to one distribution) and GRR to have a worse utility (because LDP is stricter than RLDP).

8.1. Adult Data Set

We performed numerical experiments on the adult data set (n = 32,561) [60], which contains demographic data from the 1994 US census. Some examples, where we used different categorical attributes from the data set as S and U, are depicted in Figure 2. We omitted PolyOpt from the larger two experiments, as the space complexity became infeasible: for occupation vs. education, the polyhedron $\hat{\Gamma}$ was 240-dimensional and was defined by 57,840 inequality constraints; to find its set of vertices, Matlab needed to operate on a 57,840 × 57,840 matrix, whose size (24.7 GB) exceeded Matlab’s maximum array size.
We can see that PolyOpt clearly outperformed IR and SRR in the first two experiments, especially in the high-privacy regime (low $\varepsilon$). Similarly, IR outperformed SRR in the high-privacy regime, but was slightly overtaken for high $\varepsilon$. This is interesting, since SRR satisfies a stronger privacy guarantee, as it provides privacy for all adversary assumptions, so we expected it to offer less utility than IR. An explanation for this is that IR is forced to transmit S and U separately, and so it can be less efficient than SRR, which does not have this restriction. At any rate, the difference between IR and SRR in the low-privacy regime was only marginal compared to the advantage of PolyOpt over both. In the last two experiments, where PolyOpt was infeasible, we can see that IR clearly outperformed SRR. Overall, we see that, especially in the high-privacy regime, PolyOpt was the preferable RLDP mechanism, followed by IR and SRR. Furthermore, we can see that, in all experiments, GRR performed the worst, and the best RLDP mechanism significantly outperformed GRR. This shows that adopting RLDP as a privacy metric results in significantly better utility than LDP. Conversely, NR outperformed the RLDP methods, although the difference between NR and PolyOpt was marginal for higher $\varepsilon$. As with PolyOpt, NR was computationally out of reach for larger $|\mathcal{X}|$.

8.2. Synthetic Data

To study the robustness of our method with respect to utility (Section 8.4) and privacy (Section 8.3), we also needed experiments in which $P^*$ was known. For this, we considered experiments on synthetic data. We first randomly created a probability distribution $P^*$ on $\mathcal{X}$, where $\mathcal{X}$ was the same as in the experiments on the adult data set. The distribution $P^*$ was drawn from the Jeffreys prior on $\mathcal{P}_{\mathcal{X}}$, i.e., the symmetric Dirichlet distribution with parameter $\tfrac{1}{2}$. From $P^*$, we then drew $n = 32{,}561$ elements of $\mathcal{X}$, which we used to obtain the estimate $\hat{P}$; this estimate was then used to create the privacy mechanisms. We carried this out 100 times, and we averaged the NMI over these 100 distributions. The results are shown in Figure 3. The results were similar to those of the experiments on the adult data set: PolyOpt outperformed IR, which outperformed SRR, although for small $|\mathcal{X}|$ SRR could overtake IR in the low-privacy regime. Furthermore, GRR was the worst overall, while NR was the best overall, but only by a small margin.
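The sampling procedure can be sketched with the Python standard library alone. This is an illustrative reconstruction, not the code used for the experiments: the function names are ours, and the symmetric Dirichlet draw uses the standard construction of normalizing independent Gamma$(\tfrac{1}{2}, 1)$ variables.

```python
import random
from collections import Counter

def sample_jeffreys(a, rng):
    """Draw a distribution on a symbols from the symmetric Dirichlet(1/2)."""
    g = [rng.gammavariate(0.5, 1.0) for _ in range(a)]
    total = sum(g)
    return [x / total for x in g]

def empirical_estimate(p_star, n, rng):
    """Draw n samples from p_star and return the empirical frequencies."""
    draws = rng.choices(range(len(p_star)), weights=p_star, k=n)
    counts = Counter(draws)
    return [counts[x] / n for x in range(len(p_star))]

rng = random.Random(0)  # fixed seed, chosen only for reproducibility of the sketch
p_star = sample_jeffreys(6, rng)
p_hat = empirical_estimate(p_star, 32561, rng)
print(p_star)
print(p_hat)
```

Repeating this for 100 independent draws of $P^*$ and averaging the resulting NMI values mirrors the setup described above.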

8.3. Realized Privacy Parameter

In the previous subsections, we saw that NR had a (marginally) better utility than PolyOpt. However, this is not a completely fair comparison, since NR was only designed to give privacy for $X \sim \hat{P}$ and might result in a larger privacy leakage for $X \sim P^*$. For the synthetic data, $P^*$ was known, and we could measure the true privacy leakage. For a protocol $\mathcal{Q}$, we defined the realized privacy parameter $\varepsilon^*$ as
$$\varepsilon^* = \max_{y \in \mathcal{Y},\, s_1, s_2 \in \mathcal{S}} \log \frac{\mathbb{P}_{X \sim P^*}(Y = y \mid S = s_1)}{\mathbb{P}_{X \sim P^*}(Y = y \mid S = s_2)} = \max_{y \in \mathcal{Y},\, s_1, s_2 \in \mathcal{S}} \log \frac{\sum_u Q_{y|s_1,u} P^*_{u|s_1}}{\sum_u Q_{y|s_2,u} P^*_{u|s_2}}.$$
Note that this becomes $\infty$ when there exist $s, y$ such that $\mathbb{P}_{X \sim P^*}(Y = y \mid S = s) = 0$. We compared $\varepsilon^*$ for NR and PolyOpt: the results are shown in Figure 4, where we give the 25% and 75% quantiles for both protocols, out of 100 considered distributions. As one can see, NR’s $\varepsilon^*$ was consistently greater than $\varepsilon$, while PolyOpt’s $\varepsilon^*$ was consistently smaller. This is what we expected, as NR does not give privacy guarantees for $P^*$, but PolyOpt does when $P^* \in \mathcal{F}$, which happens with 95% probability. Note that the privacy leakage was especially bad for low $\varepsilon$: at $\varepsilon = 0.075$, the lowest value of $\varepsilon$ we tested, the 75%-quantile of $\varepsilon^*$ of NR was $0.3897$, which is more than 5 times the desired privacy parameter. Overall, we can conclude that NR gave marginally better utility, but this came at quite a privacy cost.
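A minimal sketch of how the realized privacy parameter could be evaluated is given below. The data layout is our assumption: a mechanism is a list of dictionaries `Q[y][(s, u)]` $= Q_{y|s,u}$, the true distribution is a dictionary `p_star[(s, u)]`, and every value of $s$ is assumed to have positive probability. The parameter is taken on the logarithmic scale, so that it is directly comparable to $\varepsilon$.

```python
import math

def realized_epsilon(Q, p_star):
    """Largest log-ratio of Pr(Y=y | S=s1) to Pr(Y=y | S=s2) under p_star."""
    sens = sorted({s for s, _ in p_star})
    worst = 0.0
    for row in Q:
        cond = []
        for s in sens:
            p_s = sum(p for (s2, _), p in p_star.items() if s2 == s)
            joint = sum(row[(s2, u)] * p for (s2, u), p in p_star.items() if s2 == s)
            cond.append(joint / p_s)  # Pr(Y=y | S=s)
        hi, lo = max(cond), min(cond)
        if hi > 0:  # outputs that never occur are skipped
            worst = max(worst, math.inf if lo == 0 else math.log(hi / lo))
    return worst

# Toy example (hypothetical numbers): U is correlated with S, so releasing U leaks S.
p_star = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
release_u = [{(s, u): 1.0 if y == u else 0.0 for s in (0, 1) for u in (0, 1)}
             for y in (0, 1)]
print(realized_epsilon(release_u, p_star))
```

In this toy example the mechanism never looks at $S$, yet the realized leakage is positive because $U$ is correlated with $S$ under $P^*$.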

8.4. Utility Robustness

For the synthetic data sets (where we knew $P^*$), we also investigated the normalized difference in mutual information $\frac{I_{\hat{P}}(X;Y) - I_{P^*}(X;Y)}{I_{\hat{P}}(X;Y)}$, to see to what extent we could use $I_{\hat{P}}(X;Y)$ as a utility metric in lieu of the true utility $I_{P^*}(X;Y)$. This is shown for the three methods in Figure 5, at $\varepsilon = 1.5$. Overall, we can see that the difference was quite minor: for all three methods, the difference in NMI, even at its most extreme, was less than 3% of the NMI value. Furthermore, the differences were very symmetric, with the difference being positive and negative approximately equally often. We can conclude that we were justified in using $I_{\hat{P}}(X;Y)$ as a utility metric in the other experiments.

8.5. Impact of β

We also considered the impact of $\beta$ on utility for synthetic data (fixing $\varepsilon = 1.5$). The results are shown in Table 2, which are averages over 100 runs. Note that SRR does not depend on $\beta$, since it assumes $\mathcal{F} = \mathcal{P}_{\mathcal{X}}$. Interestingly, we can see that the impact of $\beta$ was quite limited; changing $\beta$ by a factor of 100 had at most about 4% impact on NMI. This impact was less for PolyOpt than for IR, and less for larger $\mathcal{X}$. Overall, we can conclude that by choosing $\beta$ closer to 0, we can significantly increase the robustness of privacy without making a considerable impact on utility.

9. Conclusions and Future Work

In this paper, we presented a number of algorithms that, given a desired privacy level $\varepsilon$, an estimated distribution $\hat{P}$, and a bound on the Rényi divergence $D_\alpha(\hat{P} \,\|\, P)$, return privacy mechanisms that satisfy a differential privacy-like privacy constraint for the part of the data that is considered sensitive, for all distributions $P$ within the divergence bound. The first class of privacy mechanisms, PolyOpt, offers high utility, but is computationally complex, as it relies on vertex enumeration. The second class, SRR, satisfies a stronger privacy requirement and is optimal in the low-privacy regime with reference to this requirement, but as a result has less utility than mechanisms that do not satisfy this stronger privacy requirement. The third class, IR, is a general framework for releasing the sensitive and non-sensitive part of the data independently, and the optimal division of the privacy budget between these can be found via 1-dimensional optimization; thus, the optimal IR mechanism can be found quickly, while still offering decent utility. Furthermore, taking RLDP rather than LDP as a privacy constraint, i.e., protecting only the part of the data that is sensitive, significantly improves utility. In particular, we showed that the utility of PolyOpt is close to the utility of the optimal non-robust privacy mechanism. In other words, asking for robustness in privacy comes at only a small performance penalty in utility. At the same time, we showed that not asking for robustness comes at a substantial privacy cost.
There are various interesting directions for future research to build upon the results in this paper. One direction is to find analytical bounds on the performance gap between PolyOpt and optimal mechanisms, in particular on the gap with reference to either the non-robust optimal mechanism from [3] or with reference to an optimal robust mechanism. Note, however, that for the moment we do not have any results on optimal robust mechanisms. Another direction is to improve the performance of the low-complexity algorithms that have been proposed. For instance, in independent reporting, one could change the underlying LDP mechanism from GRR to an optimal mechanism. Since GRR is only optimal in the low-privacy regime, we expect that there would be room for improvement in the high-privacy regime. Incorporating optimal mechanisms along the lines of [5] poses a significant challenge, however, since these mechanisms depend on $P^*$, which is inaccessible in the RLDP framework. Yet another interesting direction would be to incorporate robustness in utility in addition to robustness in privacy. This would require finding a mechanism that maximizes $\min_{P \in \mathcal{F}} I_P(X;Y)$. The challenge in this is that $I_P(X;Y)$ is concave in $P$, which makes minimizing it over $\mathcal{F}$ difficult. Finally, it would be interesting to apply the RLDP framework to other models. In this work, we studied the model where $X$ splits into a sensitive part $S$ and a non-sensitive part $U$. It would be interesting to also study the more general case where $X$ is correlated with the sensitive data $S$, or to apply RLDP to the models that are studied in [12].

Author Contributions

Conceptualization, M.L.-Z. and J.G.; Formal analysis, M.L.-Z. and J.G.; Investigation, M.L.-Z. and J.G.; Methodology, M.L.-Z. and J.G.; Software, M.L.-Z.; Writing, M.L.-Z. and J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Netherlands Organisation for Scientific Research (NWO) grant 628.001.026, ERC Consolidator grant 864075 CAESAR and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101008233.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs

Appendix A.1. Proof of Theorem 1

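This follows from the following four lemmas, where the RHS of (24) is denoted $\bar{\mathcal{F}}_{U|s}$:
Lemma A1. 
If $\alpha \neq 1$, then $\mathcal{F}_{U|s} \subset \bar{\mathcal{F}}_{U|s}$.
Proof. 
Assume $\alpha < 1$; the case $\alpha > 1$ is handled analogously, with the inequalities between the sums below reversed. Then, we rewrite $D_\alpha(\hat{P} \,\|\, P) \le B$ as
$$\sum_x \frac{\hat{P}_x^{\alpha}}{P_x^{\alpha-1}} \ge e^{B(\alpha-1)}. \tag{A1}$$
Let $C = e^{B(\alpha-1)}$. Then,
\begin{align}
\frac{\hat{P}_s^{\alpha}}{P_s^{\alpha-1}} \sum_u \frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha-1}} &= \sum_u \frac{\hat{P}_{s,u}^{\alpha}}{P_{s,u}^{\alpha-1}} \tag{A2}\\
&\ge C - \sum_{s' \neq s} \sum_u \frac{\hat{P}_{s',u}^{\alpha}}{P_{s',u}^{\alpha-1}}. \tag{A3}
\end{align}
For $s' \in \mathcal{S} \setminus \{s\}$ and $u \in \mathcal{U}$, define $P_{s',u|\neg s} = \frac{P_{u,s'}}{1 - P_s}$ and $\hat{P}_{s',u|\neg s} = \frac{\hat{P}_{u,s'}}{1 - \hat{P}_s}$. Then, (A3) can be written as
$$\frac{\hat{P}_s^{\alpha}}{P_s^{\alpha-1}} \sum_u \frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha-1}} \ge C - \frac{(1-\hat{P}_s)^{\alpha}}{(1-P_s)^{\alpha-1}} \sum_{s' \neq s} \sum_u \frac{\hat{P}_{s',u|\neg s}^{\alpha}}{P_{s',u|\neg s}^{\alpha-1}}. \tag{A4}$$
Furthermore, $P_{|\neg s} = (P_{s',u|\neg s})_{s' \in \mathcal{S} \setminus \{s\}, u \in \mathcal{U}}$ and $\hat{P}_{|\neg s} = (\hat{P}_{s',u|\neg s})_{s' \in \mathcal{S} \setminus \{s\}, u \in \mathcal{U}}$ form probability distributions on $(\mathcal{S} \setminus \{s\}) \times \mathcal{U}$. As such, we have
$$\sum_{s' \neq s} \sum_u \frac{\hat{P}_{s',u|\neg s}^{\alpha}}{P_{s',u|\neg s}^{\alpha-1}} = e^{(\alpha-1) D_\alpha(\hat{P}_{|\neg s} \,\|\, P_{|\neg s})} \le 1. \tag{A5}$$
Applying this to (A4), we obtain
$$\frac{\hat{P}_s^{\alpha}}{P_s^{\alpha-1}} \sum_u \frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha-1}} \ge C - \frac{(1-\hat{P}_s)^{\alpha}}{(1-P_s)^{\alpha-1}} \tag{A6}$$
or
$$\sum_u \frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha-1}} \ge \frac{P_s^{\alpha-1}}{\hat{P}_s^{\alpha}} \left( C - \frac{(1-\hat{P}_s)^{\alpha}}{(1-P_s)^{\alpha-1}} \right). \tag{A7}$$
To find the bound on $\sum_u \hat{P}_{u|s}^{\alpha} / P_{u|s}^{\alpha-1}$, we have to minimize the RHS of this inequality. The only unknown on the right is $P_s$. We find the minimum value of the right-hand side by differentiating with respect to $P_s$, for which we obtain
$$\frac{(\alpha-1) P_s^{\alpha-2} (1-\hat{P}_s)^{\alpha}}{\hat{P}_s^{\alpha}} \left( \frac{C}{(1-\hat{P}_s)^{\alpha}} - \frac{1}{(1-P_s)^{\alpha}} \right). \tag{A8}$$
Setting this equal to 0, we find $P_s = 1 - C^{-1/\alpha}(1 - \hat{P}_s)$. Substituting this into (A7), we obtain
$$\sum_u \frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha-1}} \ge \frac{\left( C^{1/\alpha} - (1-\hat{P}_s) \right)^{\alpha}}{\hat{P}_s^{\alpha}}, \tag{A9}$$
which can be written as
$$D_\alpha(\hat{P}_{U|s} \,\|\, P_{U|s}) \le \frac{\alpha}{\alpha-1} \log \frac{e^{(\alpha-1)B/\alpha} - (1-\hat{P}_s)}{\hat{P}_s}, \tag{A10}$$
showing that $P_{U|s} \in \bar{\mathcal{F}}_{U|s}$. Since $P$ was chosen arbitrarily, we can conclude $\mathcal{F}_{U|s} \subset \bar{\mathcal{F}}_{U|s}$. □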
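Lemma A2. 
If $\alpha \neq 1$, then $\bar{\mathcal{F}}_{U|s} \subset \mathcal{F}_{U|s}$.
Proof. 
Again we assume $\alpha < 1$. Suppose that $R \in \mathcal{P}_{\mathcal{U}}$ satisfies $D_\alpha(\hat{P}_{U|s} \,\|\, R) \le B_s$. Let $C$ be as in (A1) and define $\gamma = 1 - C^{-1/\alpha}(1 - \hat{P}_s)$; then,
\begin{align}
\frac{1}{\alpha-1} \log \sum_u \frac{\hat{P}_{u|s}^{\alpha}}{R_u^{\alpha-1}} &= D_\alpha(\hat{P}_{U|s} \,\|\, R) \tag{A11}\\
&\le B_s \tag{A12}\\
&= \frac{\alpha}{\alpha-1} \log \frac{e^{(\alpha-1)B/\alpha} - (1-\hat{P}_s)}{\hat{P}_s} \tag{A13}\\
&= \frac{\alpha}{\alpha-1} \log \frac{C^{1/\alpha} \gamma}{\hat{P}_s}, \tag{A14}
\end{align}
which we can express as
$$\sum_u \frac{\hat{P}_{u|s}^{\alpha}}{R_u^{\alpha-1}} \ge \frac{C \gamma^{\alpha}}{\hat{P}_s^{\alpha}}. \tag{A15}$$
Define $P \in \mathcal{P}_{\mathcal{X}}$ by
$$P_{u,s'} = \begin{cases} \gamma R_u, & \text{if } s' = s, \\ C^{-1/\alpha} \hat{P}_{u,s'}, & \text{otherwise}. \end{cases} \tag{A16}$$
Then, $P_{U|s} = R$, and
\begin{align}
\sum_{u,s'} \frac{\hat{P}_{u,s'}^{\alpha}}{P_{u,s'}^{\alpha-1}} &= \sum_u \frac{\hat{P}_{u,s}^{\alpha}}{\gamma^{\alpha-1} R_u^{\alpha-1}} + \sum_u \sum_{s' \neq s} C^{(\alpha-1)/\alpha} \hat{P}_{u,s'} \tag{A17}\\
&= \frac{\hat{P}_s^{\alpha}}{\gamma^{\alpha-1}} \sum_u \frac{\hat{P}_{u|s}^{\alpha}}{R_u^{\alpha-1}} + C^{(\alpha-1)/\alpha} (1 - \hat{P}_s) \tag{A18}\\
&\ge \gamma C + C^{(\alpha-1)/\alpha} (1 - \hat{P}_s) \tag{A19}\\
&= C. \tag{A20}
\end{align}
As in the proof of Lemma A1, the condition $\sum_{u,s'} \hat{P}_{u,s'}^{\alpha} / P_{u,s'}^{\alpha-1} \ge C$ is equivalent to $D_\alpha(\hat{P} \,\|\, P) \le B$. Thus, we can conclude that $P \in \mathcal{F}$ and so $R = P_{U|s} \in \mathcal{F}_{U|s}$. Since $R$ was chosen arbitrarily, this shows $\bar{\mathcal{F}}_{U|s} \subset \mathcal{F}_{U|s}$. □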
Lemma A3. 
If $\alpha = 1$, then $\mathcal{F}_{U|s} \subset \bar{\mathcal{F}}_{U|s}$.
Proof. 
Let $P \in \mathcal{F}$, and define $P_{|\neg s}$, $\hat{P}_{|\neg s}$ as in the proof of Lemma A1. Then,
\begin{align}
D_1(\hat{P} \,\|\, P) &= \hat{P}_s \sum_u \hat{P}_{u|s} \log \frac{\hat{P}_s \hat{P}_{u|s}}{P_s P_{u|s}} + (1-\hat{P}_s) \sum_{s',u} \hat{P}_{u,s'|\neg s} \log \frac{(1-\hat{P}_s) \hat{P}_{u,s'|\neg s}}{(1-P_s) P_{u,s'|\neg s}} \tag{A21}\\
&= \hat{P}_s D_1(\hat{P}_{U|s} \,\|\, P_{U|s}) + (1-\hat{P}_s) D_1(\hat{P}_{|\neg s} \,\|\, P_{|\neg s}) + \hat{P}_s \log \frac{\hat{P}_s}{P_s} + (1-\hat{P}_s) \log \frac{1-\hat{P}_s}{1-P_s} \tag{A22}\\
&= \hat{P}_s D_1(\hat{P}_{U|s} \,\|\, P_{U|s}) + (1-\hat{P}_s) D_1(\hat{P}_{|\neg s} \,\|\, P_{|\neg s}) + D_1(V_{\hat{P}_s} \,\|\, V_{P_s}), \tag{A23}
\end{align}
where for $p \in [0,1]$, the random variable $V_p$ is defined to follow a Bernoulli distribution with $\mathbb{P}(V_p = 1) = p$. Since $D_1$ is non-negative and $D_1(\hat{P} \,\|\, P) \le B$, we find
\begin{align}
D_1(\hat{P}_{U|s} \,\|\, P_{U|s}) &= \frac{1}{\hat{P}_s} \left( D_1(\hat{P} \,\|\, P) - (1-\hat{P}_s) D_1(\hat{P}_{|\neg s} \,\|\, P_{|\neg s}) - D_1(V_{\hat{P}_s} \,\|\, V_{P_s}) \right) \tag{A24}\\
&\le \frac{D_1(\hat{P} \,\|\, P)}{\hat{P}_s} \tag{A25}\\
&\le \frac{B}{\hat{P}_s}. \tag{A26}
\end{align}
Thus, $P_{U|s} \in \bar{\mathcal{F}}_{U|s}$; since $P \in \mathcal{F}$ was chosen arbitrarily, we can conclude $\mathcal{F}_{U|s} \subset \bar{\mathcal{F}}_{U|s}$. □
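Lemma A4. 
If $\alpha = 1$, then $\bar{\mathcal{F}}_{U|s} \subset \mathcal{F}_{U|s}$.
Proof. 
Let $R \in \mathcal{P}_{\mathcal{U}}$ be such that $D_1(\hat{P}_{U|s} \,\|\, R) \le \frac{B}{\hat{P}_s}$. Define $P \in \mathcal{P}_{\mathcal{X}}$ by
$$P_{u,s'} = \begin{cases} \hat{P}_s R_u, & \text{if } s' = s, \\ \hat{P}_{u,s'}, & \text{if } s' \neq s. \end{cases} \tag{A27}$$
Then $P_{U|s} = R$. Furthermore, in (A23) one has $D_1(\hat{P}_{|\neg s} \,\|\, P_{|\neg s}) = D_1(V_{\hat{P}_s} \,\|\, V_{P_s}) = 0$, and so $D_1(\hat{P} \,\|\, P) \le B$. This shows that $P \in \mathcal{F}$, and so $R = P_{U|s} \in \mathcal{F}_{U|s}$. Since $R$ was chosen arbitrarily, we can conclude that $\bar{\mathcal{F}}_{U|s} \subset \mathcal{F}_{U|s}$. □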
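The four lemmas together describe the projected set through a single radius $B_s$, namely the right-hand side of (A10) for $\alpha \neq 1$ and $B / \hat{P}_s$ for $\alpha = 1$. A small sketch of this computation follows; the function name is ours, and returning infinity when the argument of the logarithm is non-positive (which can only happen for $\alpha < 1$) is our own convention for the case in which the constraint is vacuous.

```python
import math

def conditional_bound(B, alpha, p_hat_s):
    """Radius B_s of the projected divergence ball, given B, alpha and P_hat(S=s)."""
    if alpha == 1:
        return B / p_hat_s
    inner = (math.exp((alpha - 1) * B / alpha) - (1 - p_hat_s)) / p_hat_s
    if inner <= 0:
        # Assumption of this sketch: treat the constraint as vacuous in this case.
        return math.inf
    return alpha / (alpha - 1) * math.log(inner)

# Illustrative values only.
for p_hat_s in (1.0, 0.5, 0.1):
    print(p_hat_s, conditional_bound(0.1, 2, p_hat_s), conditional_bound(0.1, 1, p_hat_s))
```

The smaller $\hat{P}_s$ is, the larger $B_s$ becomes: fewer observations with $S = s$ mean more uncertainty about $P_{U|s}$.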

Appendix A.2. Proof of Proposition 1

We first prove the following two auxiliary lemmas. We only prove these for $\alpha > 1$; the other cases are handled analogously.
Lemma A5. 
Let $x \in \mathcal{X}$, and define
\begin{align}
\xi_-(\rho) &= \inf \{ \xi \in (0,1] : E_B(\rho, \xi) \le 0 \}, \tag{A28}\\
\xi_+(\rho) &= \sup \{ \xi \in [1, \rho^{-1}) : E_B(\rho, \xi) \le 0 \}, \tag{A29}
\end{align}
where $E_B$ is as in Proposition 1. Then, $\min_{P \in \mathcal{F}} P_x = \hat{P}_x \xi_-(\hat{P}_x)$ and $\max_{P \in \mathcal{F}} P_x = \hat{P}_x \xi_+(\hat{P}_x)$.
Proof. 
As in the proof of Lemma A1, define $C = e^{(\alpha-1)B}$; thus
$$\mathcal{F} = \left\{ P \in \mathcal{P}_{\mathcal{X}} : \sum_{x' \in \mathcal{X}} \frac{\hat{P}_{x'}^{\alpha}}{P_{x'}^{\alpha-1}} \le C \right\}. \tag{A30}$$
Furthermore, define a function $F$ by
$$F(\rho, \xi) = \rho \xi^{1-\alpha} + (1-\rho) \left( \frac{1 - \rho\xi}{1-\rho} \right)^{1-\alpha} - C, \tag{A31}$$
with $F(1, \xi) = \xi^{1-\alpha} - C$ the limit as $\rho \to 1$. Then, $E_B(\rho, \xi) = \frac{1}{\alpha-1} \log \left( F(\rho, \xi) + e^{(\alpha-1)B} \right) - B$, so $F(\rho, \xi) \le 0 \iff E_B(\rho, \xi) \le 0$. Thus, $\xi_-(\rho) = \inf \{ \xi \in [0,1] : F(\rho, \xi) \le 0 \}$ and the analogous statement holds for $\xi_+(\rho)$. The $P$ that yield the extremal $P_x$ lie on the boundary of $\mathcal{F}$; hence, they either satisfy $P_x \in \{0, 1\}$, or the equality
$$\sum_{x' \in \mathcal{X}} \frac{\hat{P}_{x'}^{\alpha}}{P_{x'}^{\alpha-1}} = C. \tag{A32}$$
In the latter case, the extremal values of $P_x$ have to be stationary points of the Lagrangian expression
$$P_x + \lambda \left( \sum_{x'} \frac{\hat{P}_{x'}^{\alpha}}{P_{x'}^{\alpha-1}} - C \right) + \mu \left( \sum_{x'} P_{x'} - 1 \right) = 0. \tag{A33}$$
Taking derivatives with respect to all $P_{x'}$, we find
\begin{align}
1 + (1-\alpha) \lambda \hat{P}_x^{\alpha} P_x^{-\alpha} + \mu &= 0, \tag{A34}\\
\forall x' \neq x: \quad (1-\alpha) \lambda \hat{P}_{x'}^{\alpha} P_{x'}^{-\alpha} + \mu &= 0. \tag{A35}
\end{align}
It follows that $P_x = \left( \frac{(\alpha-1)\lambda}{\mu+1} \right)^{1/\alpha} \hat{P}_x =: \xi \hat{P}_x$ and $P_{x'} = \left( \frac{(\alpha-1)\lambda}{\mu} \right)^{1/\alpha} \hat{P}_{x'} =: \psi \hat{P}_{x'}$ for all $x' \neq x$, where $\xi$ and $\psi$ do not depend on $x$ or $x'$. We can find $\xi, \psi \in \mathbb{R}_{\ge 0}$ by solving the joint set of equations
\begin{align}
C &= \sum_{x'} \frac{\hat{P}_{x'}^{\alpha}}{P_{x'}^{\alpha-1}} \tag{A36}\\
&= \frac{\hat{P}_x^{\alpha}}{P_x^{\alpha-1}} + \sum_{x' \neq x} \frac{\hat{P}_{x'}^{\alpha}}{P_{x'}^{\alpha-1}} \tag{A37}\\
&= \hat{P}_x \xi^{1-\alpha} + (1 - \hat{P}_x) \psi^{1-\alpha}, \tag{A38}\\
1 &= \sum_{x'} P_{x'} \tag{A39}\\
&= \hat{P}_x \xi + (1 - \hat{P}_x) \psi. \tag{A40}
\end{align}
Define $\rho = \hat{P}_x$. Then, (A40) implies $\psi = \frac{1 - \rho\xi}{1-\rho}$, and the condition $\psi \ge 0$ is equivalent to $\xi \le \rho^{-1}$. Substituting this into (A38) shows that we find $\xi$ by solving $F(\rho, \xi) = 0$ for $\xi \in (0, \rho^{-1})$. Since $F(\rho, 1) = 1 - C < 0$ and $F$ is strictly convex in $\xi$, there exists at most one solution in $(0, 1]$ and at most one in $[1, \rho^{-1})$. It follows that (A33) has at most two stationary points, which must correspond to the minimal and maximal value of $P_x$. If the solution in $(0, 1]$ exists, it is equal to $\xi_-(\rho)$, and this stationary point of (A33) corresponds to the minimal value of $P_x$, which is then equal to $\hat{P}_x \xi_-(\hat{P}_x)$. If the solution in $(0, 1]$ does not exist, then the minimal value of $P_x$ is not attained on the boundary and is equal to 0, which then is also equal to $\hat{P}_x \xi_-(\hat{P}_x)$. Either way, we find
$$\min_{P \in \mathcal{F}} P_x = \hat{P}_x \xi_-(\hat{P}_x). \tag{A41}$$
The proof for the maximal value of $P_x$ is analogous. □
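Since $F(\rho, 1) < 0$ and $F$ is strictly convex in $\xi$, the two roots can be located by bisection on either side of $\xi = 1$. The following is a sketch under the assumptions $\alpha > 1$, $B > 0$ and $0 < \rho < 1$ (so that $F$ blows up at both ends of $(0, \rho^{-1})$ and both roots exist); the function names and the small offset `tiny` are ours.

```python
import math

def F(rho, xi, alpha, C):
    """F(rho, xi) from (A31); requires 0 < rho < 1 and 0 < xi < 1/rho."""
    psi = (1.0 - rho * xi) / (1.0 - rho)
    return rho * xi ** (1 - alpha) + (1 - rho) * psi ** (1 - alpha) - C

def _bisect(f, neg, pos, iters=200):
    """Root of f between a point `neg` where f < 0 and a point `pos` where f > 0."""
    for _ in range(iters):
        mid = 0.5 * (neg + pos)
        if f(mid) < 0:
            neg = mid
        else:
            pos = mid
    return 0.5 * (neg + pos)

def xi_bounds(rho, alpha, B):
    """Return (xi_minus, xi_plus) for alpha > 1, B > 0 and 0 < rho < 1."""
    C = math.exp((alpha - 1) * B)
    f = lambda xi: F(rho, xi, alpha, C)
    tiny = 1e-12  # keeps the bracket strictly inside the domain of F
    xi_minus = _bisect(f, 1.0, tiny)
    xi_plus = _bisect(f, 1.0, (1.0 - tiny) / rho)
    return xi_minus, xi_plus

xi_minus, xi_plus = xi_bounds(0.1, 2, math.log(1.01))
print(0.1 * xi_minus, 0.1 * xi_plus)  # extremal values of P_x when P_hat_x = 0.1
```

For $\alpha = 2$ the output can be compared against the closed form of Lemma A7.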
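Lemma A6. 
For $\mathcal{X}_1 \subset \mathcal{X}$ define $\hat{P}_{\mathcal{X}_1} := \sum_{x \in \mathcal{X}_1} \hat{P}_x$. Then,
$$\sup_{P \in \mathcal{F}} \| P - \hat{P} \|_1 = 2 \max_{\mathcal{X}_1 \subset \mathcal{X} :\, \mathcal{X}_1 \neq \emptyset} \hat{P}_{\mathcal{X}_1} \left( \xi_+(\hat{P}_{\mathcal{X}_1}) - 1 \right). \tag{A42}$$
Proof. 
For a given $P$, define $\mathcal{X}_1 = \{ x \in \mathcal{X} : P_x \ge \hat{P}_x \}$ and $\mathcal{X}_2 = \{ x \in \mathcal{X} : P_x < \hat{P}_x \}$. To find the maximal value of $\| P - \hat{P} \|_1$, we first maximize it for a given partition $\mathcal{X}_1, \mathcal{X}_2$ of $\mathcal{X}$, and then we maximize over all partitions. Note that $\mathcal{X}_1 = \emptyset$ is impossible, and for $\mathcal{X}_1 = \mathcal{X}$, we have $P = \hat{P}$, which is certainly not optimal. Given $\mathcal{X}_1, \mathcal{X}_2$, one has
$$\| P - \hat{P} \|_1 = \sum_{x \in \mathcal{X}_1} (P_x - \hat{P}_x) + \sum_{x \in \mathcal{X}_2} (\hat{P}_x - P_x). \tag{A43}$$
As before, the $P$ maximizing this lies either on the boundary of the probability simplex or it satisfies (A32). For the latter case, we have the Lagrangian expression
$$\sum_{x \in \mathcal{X}_1} (P_x - \hat{P}_x) + \sum_{x \in \mathcal{X}_2} (\hat{P}_x - P_x) + \lambda \left( \sum_x \frac{\hat{P}_x^{\alpha}}{P_x^{\alpha-1}} - C \right) + \mu \left( \sum_x P_x - 1 \right) = 0. \tag{A44}$$
Taking derivatives, we find, analogously to (A34)–(A35), that there exist $\xi, \psi$ such that $P_x = \xi \hat{P}_x$ for all $x \in \mathcal{X}_1$ and $P_x = \psi \hat{P}_x$ for all $x \in \mathcal{X}_2$. By definition of $\mathcal{X}_1$ and $\mathcal{X}_2$, we have $\xi \ge 1$ and $0 \le \psi < 1$. Analogously to (A36)–(A40), these have to satisfy
\begin{align}
\hat{P}_{\mathcal{X}_1} \xi^{1-\alpha} + (1 - \hat{P}_{\mathcal{X}_1}) \psi^{1-\alpha} &= C, \tag{A45}\\
\hat{P}_{\mathcal{X}_1} \xi + (1 - \hat{P}_{\mathcal{X}_1}) \psi &= 1. \tag{A46}
\end{align}
From this point onward, this proof is analogous to that of Lemma A5. Let $\rho = \hat{P}_{\mathcal{X}_1}$. Expressing $\psi$ in terms of $\xi$ and substituting this means that to find $\xi$ we have to solve $F(\rho, \xi) = 0$ for $\xi \in [1, \rho^{-1})$, where $F$ is as in the proof of Lemma A5. As before, at most one such solution exists, and when it does, it corresponds to the maximal value of $\| P - \hat{P} \|_1$ (given $\mathcal{X}_1$). If it does not exist, then the maximal value of $\| P - \hat{P} \|_1$ is obtained at the boundary where $\sum_{x \in \mathcal{X}_1} P_x = 1$. Either way the maximum is obtained when $\xi = \xi_+(\rho)$, which means that
\begin{align}
\| P - \hat{P} \|_1 &= \sum_{x \in \mathcal{X}_1} (P_x - \hat{P}_x) + \sum_{x \in \mathcal{X}_2} (\hat{P}_x - P_x) \tag{A47}\\
&= \sum_{x \in \mathcal{X}_1} \hat{P}_x (\xi_+(\rho) - 1) + \sum_{x \in \mathcal{X}_2} \hat{P}_x \left( 1 - \frac{1 - \rho \xi_+(\rho)}{1 - \rho} \right) \tag{A48}\\
&= \rho (\xi_+(\rho) - 1) + (1-\rho) \left( 1 - \frac{1 - \rho \xi_+(\rho)}{1 - \rho} \right) \tag{A49}\\
&= 2 \rho (\xi_+(\rho) - 1). \tag{A50}
\end{align}
This is the maximal value of $\| P - \hat{P} \|_1$ given $\mathcal{X}_1$; we now find the overall maximum by maximizing over all non-empty $\mathcal{X}_1$. □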
Proof of Proposition 1. 
In Lemmas A5 and A6, take $\mathcal{U}$ instead of $\mathcal{X}$, $\hat{P}_{U|s}$ instead of $\hat{P}$, and $B_s$ instead of $B$. Then, by Theorem 1, the role of $\mathcal{F}$ is taken by $\mathcal{F}_{U|s}$. Thus, applying Lemmas A5 and A6 gives us Proposition 1 directly. □

Appendix A.3. Proof of Lemma 1 and Proposition 2

As in the proof of Proposition 1, since by Theorem 1 the projected set $\mathcal{F}_{U|s}$ is defined by a Rényi divergence as is $\mathcal{F}$, it suffices to prove the analogous statements about $\mathcal{F}$ rather than $\mathcal{F}_{U|s}$. Concretely, we prove the following:
Lemma A7. 
Suppose $\alpha = 2$ and define $\tilde{B} = e^{B} - 1$; let $\xi_\pm$ be as in Lemma A5. Then,
$$\xi_\pm(\rho) = \frac{\tilde{B} + 2\rho \pm \sqrt{\tilde{B}^2 + 4\rho\tilde{B} - 4\tilde{B}\rho^2}}{2\rho(\tilde{B}+1)}. \tag{A51}$$
Furthermore, the following hold:
1. 
Let $x_{\min} = \arg\min_{x \in \mathcal{X}} \hat{P}_x$. If $\tilde{B} \ge (1 - 2\hat{P}_{x_{\min}})^2$, then the maximum in (A42) is attained at $\mathcal{X}_1 = \{ x_{\min} \}$.
2. 
If $\tilde{B} < (1 - 2\hat{P}_{x_{\min}})^2$ one has $\sup_{P \in \mathcal{F}} \| P - \hat{P} \|_1 \le \sqrt{\tilde{B}}$.
The formulas here look slightly different from those in Lemma 1 and Proposition 2. We use this form because it makes the proof more convenient: replacing $\tilde{B}$ with $e^{B} - 1$ throughout yields exactly the results of Lemma 1 and Proposition 2 for $\mathcal{F}$ instead of $\mathcal{F}_{U|s}$.
Proof. 
Consider the function $F(\rho, \xi)$ from (A31) for $\alpha = 2$ and $C = \tilde{B} + 1$, i.e.,
$$F(\rho, \xi) = \frac{\rho}{\xi} + \frac{(1-\rho)^2}{1 - \rho\xi} - \tilde{B} - 1. \tag{A52}$$
Then, $F(\rho, \xi) = 0$ can be rewritten to a quadratic equation in $\xi$. Its two roots are $\xi_\pm(\rho)$, and with some rewriting they can be expressed as in (31). For points 1 and 2, we note that
$$2\rho(\xi_+(\rho) - 1) = \frac{\tilde{B} - 2\tilde{B}\rho + \sqrt{\tilde{B}^2 + 4\tilde{B}\rho - 4\tilde{B}\rho^2}}{\tilde{B} + 1}. \tag{A53}$$
We can find its extremal values with respect to $\rho$ by taking the derivative and setting it to 0, i.e., by solving
$$-\frac{2\tilde{B}}{\tilde{B}+1} + \frac{2\tilde{B} - 4\tilde{B}\rho}{(\tilde{B}+1)\sqrt{\tilde{B}^2 + 4\tilde{B}\rho - 4\tilde{B}\rho^2}} = 0, \tag{A54}$$
which has a single solution $\rho_{\mathrm{opt}} = \frac{1 - \sqrt{\tilde{B}}}{2}$. Since (A53) is concave in $\rho$, this means that this unique extremal value is a maximum. If $\tilde{B} \ge (1 - 2\hat{P}_{x_{\min}})^2$, then $\rho_{\mathrm{opt}} \le \hat{P}_{x_{\min}}$, and $\rho(\xi_+(\rho) - 1)$ is decreasing in $\rho$ on $[\hat{P}_{x_{\min}}, 1]$. Since all possible values of $\hat{P}_{\mathcal{X}_1}$ lie in this interval, it is optimal to take $\mathcal{X}_1$ such that $\hat{P}_{\mathcal{X}_1}$ is minimized, i.e., $\mathcal{X}_1 = \{ x_{\min} \}$; this proves point 1. For point 2 we have (and also for general $B$)
\begin{align}
\sup_{P \in \mathcal{F}} \| P - \hat{P} \|_1 &= 2 \max_{\mathcal{X}_1 \subset \mathcal{X} :\, \mathcal{X}_1 \neq \emptyset} \hat{P}_{\mathcal{X}_1} \left( \xi_+(\hat{P}_{\mathcal{X}_1}) - 1 \right) \tag{A55}\\
&\le 2 \rho_{\mathrm{opt}} (\xi_+(\rho_{\mathrm{opt}}) - 1) \tag{A56}\\
&= \frac{\tilde{B} - 2\tilde{B} \frac{1 - \sqrt{\tilde{B}}}{2} + \sqrt{\tilde{B}^2 + 4\tilde{B} \frac{1 - \sqrt{\tilde{B}}}{2} - 4\tilde{B} \frac{(1 - \sqrt{\tilde{B}})^2}{4}}}{\tilde{B} + 1} \tag{A57}\\
&= \sqrt{\tilde{B}}. \tag{A58}
\end{align}
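The closed form for $\alpha = 2$ makes both points of Lemma A7 easy to check numerically. The sketch below evaluates $\xi_\pm$ and computes the $\ell_1$-radius of (A42) by brute force over all non-empty proper subsets $\mathcal{X}_1$ (excluding $\mathcal{X}_1 = \mathcal{X}$ is our choice, in line with the remark in the proof of Lemma A6 that this case is never optimal). The names and the example values are illustrative assumptions.

```python
import math
from itertools import combinations

def xi_pm(rho, B_tilde):
    """Closed-form roots (xi_minus, xi_plus) of F(rho, xi) = 0 for alpha = 2."""
    disc = math.sqrt(B_tilde ** 2 + 4 * rho * B_tilde - 4 * B_tilde * rho ** 2)
    denom = 2 * rho * (B_tilde + 1)
    return (B_tilde + 2 * rho - disc) / denom, (B_tilde + 2 * rho + disc) / denom

def l1_term(rho, B_tilde):
    """2 * rho * (xi_plus(rho) - 1), the quantity maximized in (A42)."""
    return 2 * rho * (xi_pm(rho, B_tilde)[1] - 1)

def l1_radius(p_hat, B_tilde):
    """Brute-force maximum of l1_term over non-empty proper subsets of the alphabet."""
    best = 0.0
    for r in range(1, len(p_hat)):
        for subset in combinations(p_hat, r):
            best = max(best, l1_term(sum(subset), B_tilde))
    return best

p_hat = [0.1, 0.2, 0.3, 0.4]
print(l1_radius(p_hat, 0.7), l1_term(0.1, 0.7))    # point 1: maximum at {x_min}
print(l1_radius(p_hat, 0.01), math.sqrt(0.01))     # point 2: radius at most sqrt(B_tilde)
```

The brute-force search is exponential in $|\mathcal{X}|$, which is precisely what the two points of the lemma allow one to avoid.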

Appendix A.4. Proof of Theorem 2

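Let $\mathcal{D} = \prod_{s \in \mathcal{S}} \mathcal{D}_s \subset \mathbb{R}^{\mathcal{X}}$. Thus, an element $t \in \mathcal{D}$ is of the form $t = (t_{s,u})_{(s,u) \in \mathcal{X}}$, and for any $s$, we have $(t_{s,u})_{u \in \mathcal{U}} \in \mathcal{D}_s$. For $s_1, s_2 \in \mathcal{S}$, let $B^{s_1,s_2} \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ be the matrix given by
$$B^{s_1,s_2}_{(s,u);(s',u')} = \begin{cases} 1, & \text{if } u = u' \text{ and } s = s' = s_1, \\ -e^{\varepsilon}, & \text{if } u = u' \text{ and } s = s' = s_2, \\ 0, & \text{otherwise}. \end{cases} \tag{A59}$$
Then, we can rewrite (41) as
$$\forall y, s_1, s_2: \quad \max_{t \in \mathcal{D}} \left( (B^{s_1,s_2})^{T} Q_y \right)^{T} t \le 0. \tag{A60}$$
Recall that for each $s$, we have $\mathcal{D}_s = \{ R \in \mathcal{P}_{\mathcal{U}} : R_u \ge L_{u|s} \}$. Since $\mathcal{D} = \prod_s \mathcal{D}_s$, we can write
\begin{align}
\mathcal{D} &= \left\{ t \in \mathbb{R}^{\mathcal{X}} : \forall s, u: t_{s,u} \ge L_{u|s}, \ \forall s: \sum_u t_{s,u} = 1 \right\} \tag{A61}\\
&= \{ t \in \mathbb{R}^{\mathcal{X}} : \Phi t + \phi \ge 0, \ \Psi t + \psi = 0 \}, \tag{A62}
\end{align}
where $\Phi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$, $\phi \in \mathbb{R}^{\mathcal{X}}$, $\Psi \in \mathbb{R}^{\mathcal{S} \times \mathcal{X}}$ and $\psi \in \mathbb{R}^{\mathcal{S}}$ are given, for $s, s' \in \mathcal{S}$ and $u \in \mathcal{U}$, by
\begin{align}
\Phi &= \mathrm{id}_{\mathcal{X}}, \tag{A63}\\
\phi_{s,u} &= -L_{u|s}, \tag{A64}\\
\Psi_{s;(s',u)} &= \begin{cases} 1, & \text{if } s = s', \\ 0, & \text{otherwise}, \end{cases} \tag{A65}\\
\psi_s &= -1. \tag{A66}
\end{align}
Combining this with (A60), we find that $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{F})$-RLDP whenever
$$\forall y, s_1, s_2: \quad \max_{t \in \mathbb{R}^{\mathcal{X}} :\, \Phi t + \phi \ge 0, \ \Psi t + \psi = 0} \left( (B^{s_1,s_2})^{T} Q_y \right)^{T} t \le 0. \tag{A67}$$
Now fix $y, s_1, s_2$, and consider the linear programming problem that forms the LHS of (A67). From the duality of linear programming, we know
$$\max_{t \in \mathbb{R}^{\mathcal{X}} :\, \Phi t + \phi \ge 0, \ \Psi t + \psi = 0} \left( (B^{s_1,s_2})^{T} Q_y \right)^{T} t = \min_{z \in \mathbb{R}^{\mathcal{X}}, w \in \mathbb{R}^{\mathcal{S}} :\, \Phi^{T} z + \Psi^{T} w = -(B^{s_1,s_2})^{T} Q_y, \ z \ge 0} \phi^{T} z + \psi^{T} w. \tag{A68}$$
We focus on the linear programming problem of the RHS. The terms of this problem are given by
\begin{align}
\Phi^{T} z &= z, \tag{A69}\\
(\Psi^{T} w)_{s,u} &= w_s, \tag{A70}\\
\left( (B^{s_1,s_2})^{T} Q_y \right)_{s,u} &= \begin{cases} Q_{y|s_1,u}, & \text{if } s = s_1, \\ -e^{\varepsilon} Q_{y|s_2,u}, & \text{if } s = s_2, \\ 0, & \text{otherwise}, \end{cases} \tag{A71}\\
\phi^{T} z &= -\sum_{s,u} L_{u|s} z_{s,u}, \tag{A72}\\
\psi^{T} w &= -\sum_s w_s. \tag{A73}
\end{align}
The equation $\Phi^{T} z + \Psi^{T} w = -(B^{s_1,s_2})^{T} Q_y$ can now be rewritten as
$$z_{s,u} = \begin{cases} -Q_{y|s_1,u} - w_{s_1}, & \text{if } s = s_1, \\ e^{\varepsilon} Q_{y|s_2,u} - w_{s_2}, & \text{if } s = s_2, \\ -w_s, & \text{otherwise}. \end{cases} \tag{A74}$$
Thus, the restriction $z \ge 0$ translates to
$$w_{s_1} \le -\max_{u \in \mathcal{U}} Q_{y|s_1,u}, \qquad w_{s_2} \le e^{\varepsilon} \min_{u \in \mathcal{U}} Q_{y|s_2,u}, \qquad \forall s \neq s_1, s_2: w_s \le 0. \tag{A75}$$
Furthermore, the objective function $\phi^{T} z + \psi^{T} w$ becomes
$$-\sum_s \left( 1 - \sum_u L_{u|s} \right) w_s + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2}.$$
Combining this with (A67) and (A68), we see that a sufficient condition for $\mathcal{Q}$ to be $(\varepsilon, \mathcal{F})$-RLDP is if there exists a $w \in \mathbb{R}^{\mathcal{S}}$ such that
\begin{align}
-\sum_s \left( 1 - \sum_u L_{u|s} \right) w_s + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2} &\le 0, \tag{A76}\\
w_{s_1} &\le -\max_{u \in \mathcal{U}} Q_{y|s_1,u}, \tag{A77}\\
w_{s_2} &\le e^{\varepsilon} \min_{u \in \mathcal{U}} Q_{y|s_2,u}, \tag{A78}\\
\forall s \neq s_1, s_2: \quad w_s &\le 0. \tag{A79}
\end{align}
Since $\sum_u L_{u|s} \le 1$ for all $s$, it follows that the left-hand side of (A76) is minimal if each $w_s$ attains its maximal value, subject to the constraints (A77)–(A79). Substituting this, we find that the minimum of the left-hand side is equal to
\begin{align}
&\left( 1 - \sum_u L_{u|s_1} \right) \max_{u_1} Q_{y|u_1,s_1} - e^{\varepsilon} \left( 1 - \sum_u L_{u|s_2} \right) \min_{u_2} Q_{y|u_2,s_2} + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2} \tag{A80}\\
&= \max_{u_1, u_2 \in \mathcal{U}} \left( 1 - \sum_u L_{u|s_1} \right) Q_{y|u_1,s_1} - e^{\varepsilon} \left( 1 - \sum_u L_{u|s_2} \right) Q_{y|u_2,s_2} + \sum_u Q_{y|s_1,u} L_{u|s_1} - e^{\varepsilon} \sum_u Q_{y|s_2,u} L_{u|s_2} \tag{A81}\\
&= \max_{u_1, u_2 \in \mathcal{U}} Q_{y|u_1,s_1} - e^{\varepsilon} Q_{y|u_2,s_2} + \sum_u L_{u|s_1} (Q_{y|s_1,u} - Q_{y|s_1,u_1}) - e^{\varepsilon} \sum_u L_{u|s_2} (Q_{y|s_2,u} - Q_{y|s_2,u_2}). \tag{A82}
\end{align}
This has to be nonpositive for all choices of $u_1, u_2, s_1, s_2, y$; but this is true precisely if $Q_y \in \Gamma_{L,\varepsilon}$ for all $y$.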
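The resulting condition is a finite set of linear inequalities in the entries of $Q_y$, so it can be checked directly. The sketch below evaluates (A80) for every output $y$ and every ordered pair $s_1 \neq s_2$; restricting to $s_1 \neq s_2$, the dictionary-based data layout, the function name and the numerical tolerance are assumptions of this illustration.

```python
import math

def satisfies_gamma(Q, L, eps, S, U, tol=1e-12):
    """Check (A80) <= 0 for all outputs y and all pairs s1 != s2.

    Q[y][(s, u)] is the mechanism, L[(s, u)] the lower bound L_{u|s}.
    """
    e = math.exp(eps)
    for row in Q:
        for s1 in S:
            for s2 in S:
                if s1 == s2:
                    continue
                slack1 = 1 - sum(L[(s1, u)] for u in U)
                slack2 = 1 - sum(L[(s2, u)] for u in U)
                upper = slack1 * max(row[(s1, u)] for u in U) \
                    + sum(L[(s1, u)] * row[(s1, u)] for u in U)
                lower = slack2 * min(row[(s2, u)] for u in U) \
                    + sum(L[(s2, u)] * row[(s2, u)] for u in U)
                if upper - e * lower > tol:
                    return False
    return True

S, U = (0, 1), (0, 1)
release_u = [{(s, u): 1.0 if y == u else 0.0 for s in S for u in U} for y in U]
no_info = {(s, u): 0.0 for s in S for u in U}   # nothing known about P_{U|s}
pinned = {(s, u): 0.5 for s in S for u in U}    # P_{U|s} known to be uniform for both s
print(satisfies_gamma(release_u, no_info, 1.0, S, U))
print(satisfies_gamma(release_u, pinned, 1.0, S, U))
```

In this toy case, releasing $U$ unchanged fails the check when nothing is known about $P_{U|s}$, but passes once the lower bounds pin $P_{U|s}$ to the same distribution for every $s$.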

Appendix A.5. Proof of Theorem 3

This is essentially analogous to the proof of Theorem 4 in [5]; the main difference is that in [5] the equivalent of $\hat{\Gamma}$ is a hypercube, for which a vertex enumeration step is not needed. Let $\mathcal{Q}$ be a mechanism such that $Q_y \in \Gamma$ for all $y$; then there exist $\alpha_y \in \mathbb{R}_{\ge 0}$, $\gamma_y \in \hat{\Gamma}$ such that $Q_y = \alpha_y \gamma_y$. One has
$$I_{\hat{P}}(X;Y) = \sum_y \mu(Q_y) = \sum_y \alpha_y \mu(\gamma_y). \tag{A83}$$
Since $\hat{\Gamma}$ is the convex hull of $\mathcal{V}$, we can write $\gamma_y = \sum_v \lambda_{y,v} v$ for suitable constants $\lambda_{y,v}$. Define $\theta \in \mathbb{R}_{\ge 0}^{\mathcal{V}}$ by $\theta_v = \sum_y \lambda_{y,v} \alpha_y$. Then,
$$\sum_v \theta_v v = \sum_y Q_y = 1_{\mathcal{X}}. \tag{A84}$$
As such, the matrix $Q' \in \mathbb{R}^{\mathcal{V} \times \mathcal{X}}$ defined by $Q'_v = \theta_v v$ defines a privacy mechanism $\mathcal{Q}'$. One has
\begin{align}
I_{\hat{P}}(X; \mathcal{Q}'(X)) &= \sum_v \mu(Q'_v) \tag{A85}\\
&= \sum_v \theta_v \mu(v) \tag{A86}\\
&= \sum_y \alpha_y \sum_v \lambda_{y,v} \mu(v) \tag{A87}\\
&\ge \sum_y \alpha_y \mu\left( \sum_v \lambda_{y,v} v \right) \tag{A88}\\
&= I_{\hat{P}}(X; \mathcal{Q}(X)), \tag{A89}
\end{align}
where we use the fact that $\mu$ is convex. This shows that the $Q_y$ of the optimal mechanism satisfying Theorem 2 are all of the form $\theta_v \cdot v$; hence, (46) yields the optimal mechanism. To see that $|\mathcal{Y}| \le a$, observe that the polyhedron described in (46) is defined by $a$ equality constraints, and $|\mathcal{V}|$ inequality constraints of the form $\theta_v \ge 0$. Hence, any vertex of this polyhedron has at most $a$ nonzero coefficients. Since the optimal mechanism corresponds to such a vertex, and its output space $\mathcal{Y}$ corresponds to its nonzero coefficients, we conclude that $|\mathcal{Y}| \le a$. □

Appendix A.6. Proof of Theorem 4

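We follow the proof of Theorem 14 in [5]; however, we first need the following auxiliary lemma.
Lemma A8. 
Let $\varepsilon > 0$, and let $\mathcal{C} \subset \mathbb{R}_{\ge 0}^{\mathcal{X}}$ be the positive cone defined by
$$\mathcal{C} = \{ C \in \mathbb{R}_{\ge 0}^{\mathcal{X}} : C_{s,u} \le e^{\varepsilon} C_{s',u'} \text{ for all } s \neq s' \in \mathcal{S}, \ u, u' \in \mathcal{U} \}. \tag{A90}$$
Define the sets $\mathcal{V}_1, \mathcal{V}_2, \mathcal{V} \subset \mathbb{R}_{\ge 0}^{\mathcal{X}}$ by
\begin{align}
\mathcal{V}_1 &= \left\{ v \in \mathbb{R}_{\ge 0}^{\mathcal{X}} : \exists s \text{ s.t. } \forall u: v_{s,u} \in \{ e^{-\varepsilon}, e^{\varepsilon} \}; \ \forall s' \neq s, \forall u: v_{s',u} = 1 \right\}, \tag{A91}\\
\mathcal{V}_2 &= \left\{ v \in \mathbb{R}_{\ge 0}^{\mathcal{X}} : \forall x: v_x \in \{ 1, e^{\varepsilon} \}, \ |\{ s : \exists u \text{ s.t. } v_{s,u} = e^{\varepsilon} \}| \ge 2 \right\}, \tag{A92}\\
\mathcal{V} &= \mathcal{V}_1 \cup \mathcal{V}_2. \tag{A93}
\end{align}
Then $\mathcal{V}$ spans $\mathcal{C}$ as a positive cone, i.e.,
$$\mathcal{C} = \left\{ \sum_{v \in \mathcal{V}} \theta_v v : \theta \in \mathbb{R}_{\ge 0}^{\mathcal{V}} \right\}. \tag{A94}$$
Proof. 
For every $s \in \mathcal{S}$ and $u, u' \in \mathcal{U}$, we have
$$C_{s,u} \le e^{\varepsilon} C_{s',u} \le e^{2\varepsilon} C_{s,u'}, \tag{A95}$$
where $s' \in \mathcal{S} \setminus \{s\}$ is arbitrary. Thus, in every $C \in \mathcal{C}$ two coefficients can differ by at most a factor $e^{\varepsilon}$ if they have different $s$, and at most a factor $e^{2\varepsilon}$ if they have the same $s$. On the extremal rays of $\mathcal{C}$, the inequalities become equalities. By rescaling by a positive scalar, if necessary, we see that $\mathcal{C}$ is spanned by vectors of which each coefficient is in the set $\{ e^{-\varepsilon}, 1, e^{\varepsilon} \}$. In other words, if $\mathcal{V}' = \{ e^{-\varepsilon}, 1, e^{\varepsilon} \}^{\mathcal{X}} \cap \mathcal{C}$, then
$$\mathcal{C} = \mathrm{Span}(\mathcal{V}'), \tag{A96}$$
where Span refers to the span as in (A94). To determine $\mathcal{V}'$ we consider two situations: either $v$ contains both $e^{-\varepsilon}$ and $e^{\varepsilon}$ as coefficients, or not.
Suppose $v$ contains $e^{-\varepsilon}$ and $e^{\varepsilon}$, say $v_{s,u} = e^{-\varepsilon}$ and $v_{s',u'} = e^{\varepsilon}$. By (A90), we must have $s = s'$, and by (A95), this means that $v_{s'',u''} = 1$ for $s'' \neq s$ and any $u''$. Thus, we define, for any $s \in \mathcal{S}$, the set
$$\mathcal{V}'_s = \left\{ v \in \{ e^{-\varepsilon}, 1, e^{\varepsilon} \}^{\mathcal{X}} : \forall s' \neq s \ \forall u: v_{s',u} = 1 \right\}. \tag{A97}$$
It is straightforward to show that $\mathcal{V}'_s \subset \mathcal{C}$, and by the discussion above any $v \in \mathcal{V}'$ containing both $e^{-\varepsilon}$ and $e^{\varepsilon}$ is in $\bigcup_s \mathcal{V}'_s$.
Suppose $v$ does not contain both $e^{-\varepsilon}$ and $e^{\varepsilon}$, then $v \in \mathcal{V}'_2 \cup \mathcal{V}'_3$ where
$$\mathcal{V}'_2 = \{ 1, e^{\varepsilon} \}^{\mathcal{X}}, \tag{A98}$$
$$\mathcal{V}'_3 = \{ e^{-\varepsilon}, 1 \}^{\mathcal{X}}. \tag{A99}$$
Furthermore, it is easy to see that $\mathcal{V}'_2 \cup \mathcal{V}'_3 \subset \mathcal{V}'$. Thus, we conclude that
$$\mathcal{V}' = \bigcup_{s \in \mathcal{S}} \mathcal{V}'_s \cup \mathcal{V}'_2 \cup \mathcal{V}'_3. \tag{A100}$$
To get from $\mathcal{V}'$ to $\mathcal{V}$, we throw out some vectors that are not needed to span $\mathcal{C}$. We start with $\mathcal{V}'_s$. Given $s$, define the set
$$\mathcal{V}_s = \left\{ v \in \mathbb{R}_{\ge 0}^{\mathcal{X}} : \forall u: v_{s,u} \in \{ e^{-\varepsilon}, e^{\varepsilon} \}; \ \forall s' \neq s, \forall u: v_{s',u} = 1 \right\}. \tag{A101}$$
It is clear that $\mathcal{V}_s \subset \mathcal{V}'_s$; we claim that
$$\mathrm{Span}(\mathcal{V}_s) = \mathrm{Span}(\mathcal{V}'_s). \tag{A102}$$
To see this, let $v \in \mathcal{V}'_s \setminus \mathcal{V}_s$, and define $v^-, v^+ \in \mathbb{R}_{\ge 0}^{\mathcal{X}}$ by
$$v^+_{s',u} = \begin{cases} e^{-\varepsilon}, & \text{if } s' = s \text{ and } v_{s,u} = e^{-\varepsilon}, \\ 1, & \text{if } s' \neq s, \\ e^{\varepsilon}, & \text{if } s' = s \text{ and } v_{s,u} \in \{ 1, e^{\varepsilon} \}, \end{cases} \tag{A103}$$
$$v^-_{s',u} = \begin{cases} e^{-\varepsilon}, & \text{if } s' = s \text{ and } v_{s,u} \in \{ e^{-\varepsilon}, 1 \}, \\ 1, & \text{if } s' \neq s, \\ e^{\varepsilon}, & \text{if } s' = s \text{ and } v_{s,u} = e^{\varepsilon}. \end{cases} \tag{A104}$$
In other words, $v^\pm$ takes all $s$-coefficients of $v$ that are equal to 1 and changes them to $e^{\pm\varepsilon}$. Then, $v^+, v^- \in \mathcal{V}_s$ and
$$v = \frac{1}{e^{\varepsilon} + 1} v^+ + \frac{e^{\varepsilon}}{e^{\varepsilon} + 1} v^-. \tag{A105}$$
Thus, $v \in \mathrm{Span}(\mathcal{V}_s)$, proving (A102). We now consider $\mathcal{V}'_2$ and $\mathcal{V}'_3$. First note that $\mathcal{V}'_3 = e^{-\varepsilon} \mathcal{V}'_2$, so
$$\mathrm{Span}(\mathcal{V}'_2) = \mathrm{Span}(\mathcal{V}'_3). \tag{A106}$$
We furthermore claim that
$$\mathrm{Span}\left( \mathcal{V}'_2 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}_s \right) = \mathrm{Span}\left( \mathcal{V}_2 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}_s \right), \tag{A107}$$
where $\mathcal{V}_2$ is as in (A92). Note that clearly $\mathcal{V}_2 \subset \mathcal{V}'_2$. To see (A107), let $v \in \mathcal{V}'_2 \setminus \mathcal{V}_2$; this means that there is at most a single $(s,u)$ such that $v_{s,u} = e^{\varepsilon}$. If no such $(s,u)$ exists, then $v = 1_{\mathcal{X}}$, the constant vector with all ones. This implies that $e^{\varepsilon} v \in \mathcal{V}_2$, showing that $v \in \mathrm{Span}(\mathcal{V}_2)$. Now suppose that there is exactly one $(s,u)$ such that $v_{s,u} = e^{\varepsilon}$. Then,
$$v_{s',u'} = \begin{cases} e^{\varepsilon}, & \text{if } s' = s \text{ and } u' = u, \\ 1, & \text{otherwise}. \end{cases} \tag{A108}$$
But then we can construct $v^+$ as in (A103) and $v^-$ as in (A104), and again we find
$$v = \frac{1}{e^{\varepsilon} + 1} v^+ + \frac{e^{\varepsilon}}{e^{\varepsilon} + 1} v^- \in \mathrm{Span}(\mathcal{V}_s). \tag{A109}$$
This proves (A107). Combining (A102), (A106) and (A107) we obtain
\begin{align}
\mathcal{C} &= \mathrm{Span}\left( \mathcal{V}'_2 \cup \mathcal{V}'_3 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}'_s \right) \tag{A110}\\
&= \mathrm{Span}\left( \mathcal{V}'_2 \cup \bigcup_{s \in \mathcal{S}} \mathcal{V}_s \right) \tag{A111}\\
&= \mathrm{Span}(\mathcal{V}_2 \cup \mathcal{V}_1). \tag{A112}
\end{align}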
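Proof of Theorem 4. 
We follow the proof of Theorem 14 in [5]. For $C \in \mathbb{R}_{\ge 0}^{\mathcal{X}}$, define
$$\mu(C) = \sum_x P_x C_x \log \frac{C_x}{\sum_{x'} P_{x'} C_{x'}}. \tag{A113}$$
For $y \in \mathcal{Y}$, let $Q_y = (Q_{y|x})_x \in \mathbb{R}^{\mathcal{X}}$; then the utility of a mechanism $\mathcal{Q} : \mathcal{X} \to \mathcal{Y}$ is given by $I_P(X;Y) = \sum_y \mu(Q_y)$. Furthermore, $\mu$ is a sublinear function in the sense of Definition 1 of [5].
We fix an $\varepsilon > 0$. Furthermore, let $\mathcal{C} \subset \mathbb{R}_{\ge 0}^{\mathcal{X}}$ be as in Lemma A8. Then, a mechanism $\mathcal{Q}$ satisfies $(\varepsilon, \mathcal{P}_{\mathcal{X}})$-RLDP if and only if each $Q_y$ is an element of $\mathcal{C}$. Let $\mathcal{V}$ be the spanning set of $\mathcal{C}$ of Lemma A8, and let $\mathcal{D}$ be the polytope spanned by $\mathcal{V}$. If $\mathcal{Q}$ satisfies $\varepsilon$-SLDP, then every column $Q_y$ is of the form $\theta_y \cdot d_y$, where $d_y \in \mathcal{D}$ and $\theta_y \in \mathbb{R}_{\ge 0}$ are such that $\sum_y \theta_y d_y = 1_{\mathcal{X}}$. Analogously to the proof of Theorems 2 and 4 in Section 7 of [5] (or, for that matter, our proof of Theorem 3), one proves that the optimal $\mathcal{Q}$ is found by taking $b = a$, and taking $d_y \in \mathcal{V}$ for all $y$. Since
$$I(X;Y) = \sum_y \mu(Q_y) = \sum_y \theta_y \mu(d_y), \tag{A114}$$
we can find the optimal $\mathcal{Q}$ by solving the following optimization problem, where $m \in \mathbb{R}^{\mathcal{V}}$ is the vector $(\mu(v))_{v \in \mathcal{V}}$, and where $A \in \mathbb{R}^{\mathcal{X} \times \mathcal{V}}$ is the matrix whose $v$-th column is $v$:
$$\underset{\theta \in \mathbb{R}^{\mathcal{V}}}{\text{maximize}} \quad m \cdot \theta \qquad \text{such that} \quad A \cdot \theta = 1_{\mathcal{X}}, \quad \theta \ge 0. \tag{A115}$$
From here, we follow Section 9.5 of [5]. The dual to the above problem is
$$\underset{\alpha \in \mathbb{R}^{\mathcal{X}}}{\text{minimize}} \quad (1_{\mathcal{X}}) \cdot \alpha \qquad \text{such that} \quad A^{T} \cdot \alpha \ge m, \quad \alpha \ge 0. \tag{A116}$$
By duality, we have $\max_\theta m \cdot \theta = \min_\alpha (1_{\mathcal{X}}) \cdot \alpha$. We describe $\alpha^* \ge 0$ and $\theta^* \ge 0$, depending on $\varepsilon$, such that for sufficiently large $\varepsilon$ one has $A^{T} \cdot \alpha^* \ge m$, such that $m \cdot \theta^* = (1_{\mathcal{X}}) \cdot \alpha^*$ and $A \theta^* = 1_{\mathcal{X}}$, and such that $\theta^*$ corresponds to SRR, i.e., for each $y \in \mathcal{Y} = \mathcal{X}$ there is a $\hat{v}_y \in \mathcal{V}$ such that $\mathrm{SRR}^{\varepsilon}_y = \theta^*_{\hat{v}_y} \hat{v}_y$. Together, this proves that SRR is optimal for $\varepsilon \gg 0$.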
More concretely, for y = ( s , u ) X , define <