Article

On the Lift, Related Privacy Measures, and Applications to Privacy–Utility Trade-Offs †

1 College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2601, Australia
2 School of Computing and Information Systems, University of Melbourne, Parkville, VIC 3010, Australia
3 School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2600, Australia
* Authors to whom correspondence should be addressed.
Preliminary results of this work have been published in part at the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 22–27 May 2022, and IEEE Information Theory Workshop, Mumbai, India, 1–2 November 2022.
Entropy 2023, 25(4), 679; https://doi.org/10.3390/e25040679
Submission received: 20 February 2023 / Revised: 6 April 2023 / Accepted: 14 April 2023 / Published: 18 April 2023
(This article belongs to the Special Issue Information-Theoretic Privacy in Retrieval, Computing, and Learning)

Abstract: This paper investigates lift, the likelihood ratio between the posterior and prior belief about sensitive features in a dataset. Maximum and minimum lifts over sensitive features quantify the adversary's knowledge gain and should be bounded to protect privacy. We demonstrate that max- and min-lifts have a distinct range of values and probability of appearance in the dataset, referred to as lift asymmetry. We propose asymmetric local information privacy (ALIP) as a compatible privacy notion with lift asymmetry, where different bounds can be applied to min- and max-lifts. We use ALIP in the watchdog and optimal random response (ORR) mechanisms, the main methods to achieve lift-based privacy. It is shown that ALIP enhances utility in these methods compared to existing local information privacy, which ensures the same (symmetric) bounds on both max- and min-lifts. We propose subset merging for the watchdog mechanism to improve data utility and subset random response for the ORR to reduce complexity. We then investigate the related lift-based measures, including the $\ell_1$-norm, the $\chi^2$-privacy criterion, and $\alpha$-lift. We reveal that they can only restrict max-lift, resulting in significant min-lift leakage. To overcome this problem, we propose corresponding lift-inverse measures to restrict the min-lift. We apply these lift-based and lift-inverse measures in the watchdog mechanism. We show that they can be considered as relaxations of ALIP, where a higher utility can be achieved by bounding only average max- and min-lifts.

1. Introduction

With the recent emergence of "Big Data", the generation, sharing, and analysis of data have proliferated with the advancement of communication systems and machine learning methods. While sharing datasets is essential to achieve social and economic benefits, it may lead to the leakage of private information, which has raised great concern about the privacy preservation of individuals. The main approach to protecting privacy is perturbing the data via a privacy mechanism. Consider some raw data denoted by the random variable X and some sensitive features denoted by S, which are correlated via a joint distribution $P_{SX} \neq P_S \times P_X$. A privacy mechanism (characterised by the transition probability $P_{Y|X}$) is applied to publish Y as a sanitised version of X to protect S.
The design of a privacy mechanism depends on the privacy measure. Differential privacy (DP) [1,2,3] is a widely used notion of privacy. DP restricts the chance of revealing an individual's presence in a dataset from the outcome of analysis over that dataset [4]. It ensures that neighbouring sensitive features $s$ and $s'$, which differ in only one entry, result in similar output probability distributions, by restricting the ratio between posterior beliefs $P_{Y|S}(y|s)/P_{Y|S}(y|s')$ to below a threshold $e^{\varepsilon}$. The neighbourhood assumption is relaxed in local differential privacy (LDP) [5,6,7,8,9], where the ratio between posterior beliefs is restricted to below $e^{\varepsilon}$ for any two sensitive features $s$ and $s'$, denoted by $\varepsilon$-LDP. The quantity $\varepsilon$ is known as the privacy budget. DP and LDP are considered context-free privacy notions, i.e., they do not take into account the prior distribution $P_S$. In contrast, in information-theoretic (IT) privacy, also known as context-aware privacy [10,11], it is assumed that the distribution of the data, or an estimate of it, is available. Some of the dominant IT privacy measures are mutual information (MI) [10,12,13], maximal leakage [14,15,16], $\alpha$-leakage [17], and local information privacy (LIP) [11,18,19,20,21,22,23,24,25,26,27,28]. A challenge is that while data perturbation restricts privacy leakage, it necessarily reduces data resolution and the usefulness of the dataset. Therefore, a privacy mechanism is desired to deliver a satisfactory level of data utility. Depending on the application, data utility is quantified either by measures of similarity between X and Y, such as f-divergence [7] and MI [7,12], or by measures of dissimilarity and error, such as Hamming distortion [8,9] and mean square error [22], respectively. This tension between privacy and utility is known as the privacy–utility trade-off (PUT).
In this paper, we consider lift, a pivotal element in IT privacy measures, which is the likelihood ratio between the posterior belief $P_{S|Y}(s|y)$ and the prior belief $P_S(s)$ about sensitive features in a dataset:
$$l(s,y) = \frac{P_{S|Y}(s|y)}{P_S(s)} = \frac{P_{SY}(s,y)}{P_S(s)\,P_Y(y)}. \qquad (1)$$
The logarithm of the lift, $i(s,y) = \log l(s,y)$, which we call the log-lift, is the information density [24]. For each y, the more $P_{S|Y}(s|y)$ differs from $P_S(s)$, the more knowledge the adversary gains about s [29]. Consequently, the min-lift and max-lift, denoted by $\min_s l(s,y)$ and $\max_s l(s,y)$, respectively, quantify the highest privacy leakage for each y. The roles of min-lift and max-lift in privacy breaches were identified in [29], based on which information privacy was introduced in [18]. Accordingly, the min-lift is associated with revealing which values are less probable for s after observing y, while the max-lift is associated with the more probable values. In addition, other operational meanings for the max-lift have recently been revealed in guessing frameworks [30] and quantitative information flow [31]. In LIP, the min-lift and max-lift are bounded below and above by the thresholds $e^{-\varepsilon}$ and $e^{\varepsilon}$, respectively, to restrict the adversary's knowledge gain, denoted by $\varepsilon$-LIP. The main privacy mechanisms to achieve $\varepsilon$-LIP are the watchdog mechanism [24,25] and optimal random response (ORR) [28]. The watchdog mechanism bipartitions the alphabet of X into low-risk and high-risk symbols, and only the high-risk ones are randomised. It was proved in [25] that X-invariant randomisation (e.g., merging all high-risk symbols) minimises privacy leakage for the watchdog mechanism. ORR is an optimal mechanism for $\varepsilon$-LIP, which maximises MI as the utility measure.

Contributions

We investigate lift and its related privacy notions, such as LIP. We demonstrate that the min-lift and max-lift have distinct values and probabilities of appearance in the dataset. More specifically, min-lifts have a broader range of values than max-lifts, while max-lifts have a higher likelihood $P_{SY}(s,y)$ of appearing in the dataset. We call this property lift asymmetry. However, $\varepsilon$-LIP allocates symmetric privacy budgets to $\min_s i(s,y)$ and $\max_s i(s,y)$ ($-\varepsilon$ and $\varepsilon$, respectively), which is incompatible with the lift asymmetry. Thus, we propose asymmetric LIP (ALIP) as a privacy notion amenable to the lift properties, where asymmetric privacy budgets can be allocated to $\min_s i(s,y)$ and $\max_s i(s,y)$, denoted by $\varepsilon_l$ and $\varepsilon_u$, respectively. We demonstrate that ALIP implies $\varepsilon$-LDP with $\varepsilon = \varepsilon_l + \varepsilon_u$ and can result in better utility than LIP in the watchdog and ORR mechanisms. Utility increases by relaxing the bound on the min-lift, which has a lower probability of appearance in the dataset.
We propose two randomisation methods to overcome the low utility of the watchdog mechanism and the high complexity of the ORR mechanism. In the watchdog mechanism, X-invariant randomisation perturbs all high-risk symbols together, which deteriorates data resolution and utility. On the other hand, ORR suffers from high complexity, which is exponential in the size of the dataset. To overcome these problems, we propose the subset merging and subset random response (SRR) perturbation methods, which form finer subsets of high-risk symbols and privatise each subset separately. Subset merging enhances utility in the watchdog mechanism by applying X-invariant randomisation to disjoint subsets of high-risk symbols. In addition, SRR relaxes the complexity of ORR for large datasets by applying random response solutions on disjoint subsets of high-risk symbols, which results in near-optimal utility.
Besides LIP, we also consider some recently proposed privacy measures, which we call lift-based measures, including the $\ell_1$-norm [32], $\chi^2$-strong privacy [33], and $\alpha$-lift [34]. They have been proposed as privacy notions stronger than their corresponding average leakages: the total variation distance [35], $\chi^2$-divergence [36], and Sibson MI [16,34], respectively. We clarify that they only bound max-lift leakage and can cause significant min-lift leakage. Therefore, we propose correspondingly modified versions of these measures to restrict min-lift leakage, which we call lift-inverse measures. We apply lift-based and lift-inverse measures in the watchdog mechanism with subset merging randomisation to investigate their PUT. They result in higher utility than ALIP since they are functions of the average lift over sensitive features and, thus, can be considered as relaxations of the max- and min-lift.

2. Preliminaries

2.1. Notation

We use the following notation throughout the paper. Capital letters denote discrete random variables, corresponding capital calligraphic letters denote their finite supports, and lowercase letters denote their realisations. For example, a random variable X has the support $\mathcal{X}$, and its realisation is $x \in \mathcal{X}$. For random variables S and X, we use $P_{SX}$ to indicate their joint probability distribution, $P_{S|X}$ for the conditional distribution of S given X, and $P_S$ and $P_X$ for the marginal distributions. Bold capital and lowercase letters are used for matrices and vectors, respectively, and lowercase letters for the corresponding elements of the vectors, e.g., $\mathbf{v} = [v_1, v_2, \ldots, v_n]^T$. We also use $|\cdot|$ for the cardinality of a set, e.g., $|\mathcal{X}|$. We denote the natural logarithm by $\log$ and the set of integers $\{1, 2, \ldots, n\}$ by $[n]$. The indicator function is denoted by $\mathbb{1}\{f\}$, which is 1 when f is true and zero otherwise.

2.2. System Model and Privacy Measures

Consider some useful data intended for sharing and denoted by the random variable X with alphabet $\mathcal{X}$. It is correlated with some sensitive features S with alphabet $\mathcal{S}$ through a discrete joint distribution $P_{SX}$. To protect the sensitive features, a privacy mechanism is applied to generate a sanitised version of X, denoted by Y with alphabet $\mathcal{Y}$. We assume $P_S$ and $P_X$ have full support, and $P_{Y|X,S}(y|x,s) = P_{Y|X}(y|x)$, which results in the Markov chain $S - X - Y$.
The main privacy measure is the lift given in (1) (since we assume $P_S$ and $P_Y$ have full support, $l(s,y)$ is finite). Lift and its logarithm, the log-lift, quantify the multiplicative information gain on each sensitive feature $s \in \mathcal{S}$ from accessing $y \in \mathcal{Y}$. There are two cases: $l(s,y) > 1 \Leftrightarrow P_{S|Y}(s|y) > P_S(s)$ indicates an increase in the belief about s after releasing y; $l(s,y) \leq 1 \Leftrightarrow P_{S|Y}(s|y) \leq P_S(s)$ means that releasing y decreases the belief. The more the posterior belief deviates from the prior belief, the more an adversary gains knowledge about s. Thus, for each $y \in \mathcal{Y}$, $\max_s l(s,y)$ and $\min_s l(s,y)$ determine the highest knowledge gain on the sensitive features, and they should be restricted to protect privacy. We use the following notation for these quantities:
$$\Psi(y) \triangleq \min_{s \in \mathcal{S}} l(s,y) \quad \text{and} \quad \Lambda(y) \triangleq \max_{s \in \mathcal{S}} l(s,y). \qquad (2)$$
In Appendix A, we explain the operational meaning of Ψ ( y ) and Λ ( y ) in privacy breach based on the work in [29].
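To make these quantities concrete, the following Python/NumPy sketch computes the lift matrix $l(s,y)$ together with $\Psi(y)$ and $\Lambda(y)$ for a given joint distribution $P_{SX}$ and mechanism $P_{Y|X}$. The array names and the small example distribution are illustrative assumptions, not taken from the paper (whose experiments use MATLAB).
```python
import numpy as np

# Illustrative joint distribution P_SX (rows: s, columns: x) and mechanism P_{Y|X}.
P_SX = np.array([[0.10, 0.05, 0.15],
                 [0.20, 0.30, 0.20]])        # shape |S| x |X|, entries sum to 1
P_Y_given_X = np.array([[0.8, 0.2],
                        [0.1, 0.9],
                        [0.5, 0.5]])         # shape |X| x |Y|, rows sum to 1

P_S = P_SX.sum(axis=1)                       # marginal of S
P_SY = P_SX @ P_Y_given_X                    # joint of (S, Y) via the chain S - X - Y
P_Y = P_SY.sum(axis=0)                       # marginal of Y

lift = P_SY / (P_S[:, None] * P_Y[None, :])  # l(s, y) = P_SY(s, y) / (P_S(s) P_Y(y))
Psi = lift.min(axis=0)                       # Psi(y): min-lift per output
Lam = lift.max(axis=0)                       # Lambda(y): max-lift per output
print("min-lift:", Psi, "max-lift:", Lam)
```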
Lift has been applied in local information privacy [24,25,28] to protect sensitive features, which is defined as follows.
Definition 1.
For $\varepsilon \in \mathbb{R}_+$, a privacy mechanism $\mathcal{M}: \mathcal{X} \to \mathcal{Y}$ is ε-local information private, or ε-LIP, with respect to S if, for all $y \in \mathcal{Y}$,
$$e^{-\varepsilon} \leq \Psi(y) \quad \text{and} \quad \Lambda(y) \leq e^{\varepsilon}. \qquad (3)$$
Another instance-wise measure is local differential privacy [5,6,28],
Definition 2.
For $\varepsilon \in \mathbb{R}_+$, a privacy mechanism $\mathcal{M}: \mathcal{X} \to \mathcal{Y}$ is ε-local differential private, or ε-LDP, with respect to S if, for all $s, s' \in \mathcal{S}$ and all $y \in \mathcal{Y}$,
$$\Gamma(y) = \sup_{s, s' \in \mathcal{S}} \frac{P_{Y|S}(y|s)}{P_{Y|S}(y|s')} = \frac{\Lambda(y)}{\Psi(y)} \leq e^{\varepsilon}. \qquad (4)$$
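Using the arrays Psi and Lam from the sketch above, Definitions 1 and 2 reduce to elementwise checks; eps is an assumed example budget.
```python
import numpy as np

def satisfies_lip(Psi, Lam, eps):
    """epsilon-LIP (Definition 1): e^{-eps} <= Psi(y) and Lam(y) <= e^{eps} for every y."""
    return bool(np.all(Psi >= np.exp(-eps)) and np.all(Lam <= np.exp(eps)))

def satisfies_ldp(Psi, Lam, eps):
    """epsilon-LDP (Definition 2): Gamma(y) = Lam(y) / Psi(y) <= e^{eps} for every y."""
    return bool(np.all(Lam / Psi <= np.exp(eps)))
```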

3. Asymmetric Local Information Privacy

According to (3), LIP restricts the decrement of $\log \Psi(y)$ and the increment of $\log \Lambda(y)$ by symmetric bounds. However, we demonstrate that these metrics have distinct ranges of values and probabilities of appearance in the dataset, $P_{SY}(s,y)$. We plot the histograms of $\log \Psi(y)$ and $\log \Lambda(y)$ for $10^3$ randomly generated distributions in Figure 1, where $|\mathcal{X}| = 17$ and $|\mathcal{S}| = 5$. In this figure, the range of $\log \Psi(y)$ is $[-12, -0.06]$, much larger than the range of $\log \Lambda(y)$, $[0.02, 1.64]$. Moreover, the maximum probability of $\log \Psi(y)$ is much lower than the maximum probability of $\log \Lambda(y)$. We refer to these properties as lift asymmetry. Since high values of $|\log \Psi(y)|$ have a significantly lower probability than those of $\log \Lambda(y)$ (for example, in Figure 1, the probability of $|\log \Psi(y)| \geq 6$ is near zero), we can relax the min-lift privacy by allocating a higher privacy budget to it while applying a stricter bound to the max-lift. Thus, we propose asymmetric local information privacy (ALIP), where we consider different privacy budgets $\varepsilon_l$ and $\varepsilon_u$ for $|\log \Psi(y)|$ and $\log \Lambda(y)$, respectively.
This will result in the following notion of privacy, which is more compatible with the lift asymmetry property.
Definition 3.
For $\varepsilon_l, \varepsilon_u \in \mathbb{R}_+$, a privacy mechanism $\mathcal{M}: \mathcal{X} \to \mathcal{Y}$ is $(\varepsilon_l, \varepsilon_u)$-asymmetric local information private, or $(\varepsilon_l, \varepsilon_u)$-ALIP, with respect to S if, for all $y \in \mathcal{Y}$,
$$e^{-\varepsilon_l} \leq \Psi(y) \quad \text{and} \quad \Lambda(y) \leq e^{\varepsilon_u}. \qquad (5)$$
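The sketch below illustrates both the lift-asymmetry experiment behind Figure 1 (random joint distributions and the log min- and max-lifts of the unperturbed data, Y = X) and the ALIP check of Definition 3. The alphabet sizes match the figure; the random seed, trial count, and helper name are assumptions made for illustration only.
```python
import numpy as np

rng = np.random.default_rng(0)
S_size, X_size, trials = 5, 17, 1000
log_min_lifts, log_max_lifts = [], []

for _ in range(trials):
    P_SX = rng.random((S_size, X_size))
    P_SX /= P_SX.sum()                              # random joint distribution
    P_S, P_X = P_SX.sum(axis=1), P_SX.sum(axis=0)
    lift = P_SX / (P_S[:, None] * P_X[None, :])     # Y = X, so l(s, x) plays the role of l(s, y)
    log_min_lifts.extend(np.log(lift.min(axis=0)))
    log_max_lifts.extend(np.log(lift.max(axis=0)))

print("log min-lift range:", min(log_min_lifts), max(log_min_lifts))
print("log max-lift range:", min(log_max_lifts), max(log_max_lifts))

def satisfies_alip(Psi, Lam, eps_l, eps_u):
    """(eps_l, eps_u)-ALIP (Definition 3): e^{-eps_l} <= Psi(y) and Lam(y) <= e^{eps_u}."""
    return bool(np.all(Psi >= np.exp(-eps_l)) and np.all(Lam <= np.exp(eps_u)))
```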
The following proposition indicates how ( ε l , ε u ) -ALIP restricts average privacy leakage measures and LDP.
Proposition 1.
If ( ε l , ε u ) -ALIP is satisfied, then
 1.  $I(S;Y) \leq \varepsilon_u$;
 2.  $T(S;Y) \leq \frac{1}{2}(e^{\varepsilon_u} - 1)$ and $\chi^2(S;Y) \leq (e^{\varepsilon_u} - 1)^2$;
 3.  $I_\alpha^S(S;Y) \leq \frac{\alpha}{\alpha - 1}\varepsilon_u$ and $I_\alpha^A(S;Y) \leq \frac{\alpha}{\alpha - 1}\varepsilon_u$;
 4.  ε-LDP is satisfied, where $\varepsilon = \varepsilon_l + \varepsilon_u$;
where $T(S;Y)$ is the total variation distance, $\chi^2(S;Y)$ is the $\chi^2$-divergence, $I_\alpha^S(S;Y)$ is the Sibson MI, and $I_\alpha^A(S;Y)$ is the Arimoto MI.
Proof. 
The proof is given in Appendix B.    □
Items 1–3 of Proposition 1 demonstrate that the average measures are bounded in terms of the max-lift privacy budget. In Section 3.1, we show that ALIP can enhance utility by relaxing the min-lift, $\varepsilon_l > \varepsilon_u$, so that a smaller upper bound is allocated to the max-lift and to the average measures in Proposition 1. Item 4 of Proposition 1 shows the relationship between $(\varepsilon_l, \varepsilon_u)$-ALIP and ε-LDP. We introduce a variable $\lambda \in (0,1)$ for a convenient representation of this relationship as follows: for an LDP privacy budget ε, if we set $\varepsilon_l = \lambda\varepsilon$ and $\varepsilon_u = (1-\lambda)\varepsilon$, we have $\varepsilon_l + \varepsilon_u = \varepsilon$. Thus, varying λ gives rise to different $(\varepsilon_l, \varepsilon_u)$-ALIP scenarios within the same ε-LDP budget. If $\lambda < 0.5$, we have a relaxation of the max-lift privacy; if $\lambda > 0.5$, we have a relaxation of the min-lift privacy. When $\lambda = 0.5$, we have the symmetric case of $\frac{\varepsilon}{2}$-LIP, where $\varepsilon_l = \varepsilon_u = \frac{\varepsilon}{2}$.
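As a sanity check of Proposition 1, the average leakage measures in items 1 and 2 can be computed directly from a joint distribution and compared against the stated bounds. The sketch below assumes $P_{SY}$ has full support and uses example values of ε and λ.
```python
import numpy as np

def average_leakages(P_SY):
    """I(S;Y), total variation T(S;Y) and chi^2(S;Y); assumes P_SY has full support."""
    P_S = P_SY.sum(axis=1, keepdims=True)
    P_Y = P_SY.sum(axis=0, keepdims=True)
    lift = P_SY / (P_S * P_Y)
    mi = np.sum(P_SY * np.log(lift))                                  # I(S;Y)
    tv = 0.5 * np.sum(P_Y * np.sum(P_S * np.abs(lift - 1), axis=0))   # T(S;Y)
    chi2 = np.sum(P_Y * np.sum(P_S * (lift - 1) ** 2, axis=0))        # chi^2(S;Y)
    return mi, tv, chi2

# Example split of an LDP budget eps into ALIP budgets via lambda.
eps, lam = 2.0, 0.65
eps_l, eps_u = lam * eps, (1 - lam) * eps
print("Proposition 1 bounds:", eps_u, 0.5 * (np.exp(eps_u) - 1), (np.exp(eps_u) - 1) ** 2)
```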

3.1. ALIP Privacy–Utility Trade-Off

In this subsection, we propose a watchdog mechanism based on ALIP and LDP and an asymmetric ORR (AORR) mechanism for ALIP to perturb data and achieve privacy protection. We observe the PUT of ALIP and LDP, where the utility is measured by MI between X and Y, I ( X ; Y ) .

3.1.1. Watchdog Mechanism

The watchdog privacy mechanism bipartitions $\mathcal{X}$ into low-risk and high-risk subsets, denoted by $\mathcal{X}_L$ and $\mathcal{X}_H$, respectively, and only randomises the high-risk symbols. In the existing LIP, $\mathcal{X}_L$ and $\mathcal{X}_H$ are determined by symmetric bounds. We propose to use ALIP to obtain $\mathcal{X}_L$ and $\mathcal{X}_H$:
$$\mathcal{X}_L \triangleq \{x \in \mathcal{X} : e^{-\varepsilon_l} \leq \Psi(x) \ \text{and}\ \Lambda(x) \leq e^{\varepsilon_u}\} \quad \text{and} \quad \mathcal{X}_H = \mathcal{X} \setminus \mathcal{X}_L. \qquad (6)$$
For LDP, $\mathcal{X}_L$ and $\mathcal{X}_H$ are given by
$$\mathcal{X}_L \triangleq \{x \in \mathcal{X} : \Gamma(x) \leq e^{\varepsilon}\} \quad \text{and} \quad \mathcal{X}_H = \mathcal{X} \setminus \mathcal{X}_L. \qquad (7)$$
After obtaining $\mathcal{X}_L$ and $\mathcal{X}_H$, the privacy mechanism is
$$\mathcal{M} = \begin{cases} \mathbb{1}\{x = y\}, & x, y \in \mathcal{X}_L = \mathcal{Y}_L, \\ r(y|x), & x \in \mathcal{X}_H,\ y \in \mathcal{Y}_H, \\ 0, & \text{otherwise}, \end{cases} \qquad (8)$$
where $\mathbb{1}\{x = y\}$ indicates the publication of low-risk symbols without alteration, and $r(y|x)$ is the randomisation of high-risk symbols, with $\sum_{y \in \mathcal{Y}_H} r(y|x) = 1$.
An instance of $r(y|x)$ is X-invariant randomisation, where $r(y|x) = R(y)$ for $x \in \mathcal{X}_H$, $y \in \mathcal{Y}_H$, and $\sum_{y \in \mathcal{Y}_H} R(y) = 1$. An example of $R(y)$ is the uniform randomisation $R(y) = \frac{1}{|\mathcal{Y}_H|}$, with the special case of complete merging, where $|\mathcal{Y}_H| = 1$ and all $x \in \mathcal{X}_H$ are mapped to one super-symbol $y^* \in \mathcal{Y}_H$. It was proved in [25] for LIP that X-invariant randomisation minimises privacy leakage in $\mathcal{X}_H$. Accordingly, if we apply ALIP in the watchdog mechanism, the minimum leakages over $\mathcal{X}_H$ are
$$\bar{\varepsilon}_u := \max_{s \in \mathcal{S}} i(s, \mathcal{X}_H) = \max_{s \in \mathcal{S}} \log l(s, \mathcal{X}_H) = \max_{s \in \mathcal{S}} \log \frac{P(\mathcal{X}_H|s)}{P(\mathcal{X}_H)}, \qquad (9)$$
$$\bar{\varepsilon}_l := \min_{s \in \mathcal{S}} i(s, \mathcal{X}_H) = \min_{s \in \mathcal{S}} \log l(s, \mathcal{X}_H) = \min_{s \in \mathcal{S}} \log \frac{P(\mathcal{X}_H|s)}{P(\mathcal{X}_H)}, \qquad (10)$$
where $P(\mathcal{X}_H|s) = \sum_{x \in \mathcal{X}_H} P_{X|S}(x|s)$ and $P(\mathcal{X}_H) = \sum_{x \in \mathcal{X}_H} P_X(x)$.
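A minimal sketch of the ALIP watchdog with complete merging: it splits $\mathcal{X}$ according to (6), merges all of $\mathcal{X}_H$ into one super-symbol, computes the merged leakages in (9) and (10), and returns the resulting normalised mutual information utility. It assumes $\mathcal{X}_H$ is nonempty; the function name and budgets are illustrative.
```python
import numpy as np

def watchdog_complete_merge(P_SX, eps_l, eps_u):
    """ALIP watchdog with complete merging of the high-risk set; assumes X_H is nonempty."""
    P_S = P_SX.sum(axis=1, keepdims=True)
    P_X = P_SX.sum(axis=0)
    lift = P_SX / (P_S * P_X[None, :])                       # lift of the raw data (Y = X)
    Psi, Lam = lift.min(axis=0), lift.max(axis=0)

    low = (Psi >= np.exp(-eps_l)) & (Lam <= np.exp(eps_u))   # X_L, cf. (6)
    high = ~low                                              # X_H

    P_H_given_S = P_SX[:, high].sum(axis=1) / P_S[:, 0]      # P(X_H | s)
    P_H = P_X[high].sum()                                    # P(X_H)
    eps_bar_u = np.log(P_H_given_S / P_H).max()              # merged max-lift leakage, cf. (9)
    eps_bar_l = np.log(P_H_given_S / P_H).min()              # merged min-lift leakage, cf. (10)

    H_X = -np.sum(P_X * np.log(P_X))                         # H(X)
    I_XY = H_X - np.sum(P_X[high] * np.log(P_H / P_X[high])) # utility after complete merging
    return I_XY / H_X, eps_bar_l, eps_bar_u                  # NMI and merged leakages
```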
X-invariant randomisation is also applicable to LDP, and the following theorem shows that it minimises LDP privacy leakage in $\mathcal{X}_H$.
Theorem 1.
In the LDP watchdog mechanism, where $\mathcal{X}_L$ and $\mathcal{X}_H$ are determined according to (7), X-invariant randomisation minimises the privacy leakage in $\mathcal{X}_H$ measured by $\Gamma(y)$ in (4).
Proof. 
The proof is given in Appendix C.    □
In the watchdog mechanism with X-invariant randomisation, the resulting utility measured by MI between X and Y is given by
$$I(X;Y) = H(X) - \sum_{x \in \mathcal{X}_H} P_X(x) \log \frac{P(\mathcal{X}_H)}{P_X(x)}. \qquad (11)$$
In [25], it was verified that $I(X;Y)$ in (11) is monotonic in $\mathcal{X}_H$: if $\mathcal{X}_H \subset \mathcal{X}'_H$, then $I'(X;Y) < I(X;Y)$, where $I'(X;Y)$ is the resulting utility of $\mathcal{X}'_H$.
Proposition 2.
In the watchdog mechanism with X-invariant randomisation, for a given LDP privacy budget ε, $\lambda \in (0,1)$, and ALIP privacy budgets $\varepsilon_l = \lambda\varepsilon$, $\varepsilon_u = (1-\lambda)\varepsilon$, LDP results in higher utility than ALIP.
Proof. 
Denote the high-risk subset for LDP by $\mathcal{X}_H$ and for ALIP by $\mathcal{X}'_H$. We have
$$\mathcal{X}_H = \Big\{x \in \mathcal{X} : \tfrac{\Lambda(x)}{\Psi(x)} > e^{\varepsilon}\Big\} \quad \text{and} \quad \mathcal{X}'_H = \big\{x \in \mathcal{X} : \Lambda(x) > e^{(1-\lambda)\varepsilon} \ \text{or}\ \Psi(x) < e^{-\lambda\varepsilon}\big\}.$$
Based on the remark following (11), it is enough to prove that $\mathcal{X}_H \subseteq \mathcal{X}'_H$. If $x \in \mathcal{X}_H$, for any given $\lambda \in (0,1)$, there are only two possible cases: either $\Lambda(x) > e^{(1-\lambda)\varepsilon}$ or $\Lambda(x) \leq e^{(1-\lambda)\varepsilon}$. If $\Lambda(x) > e^{(1-\lambda)\varepsilon}$, then $x \in \mathcal{X}'_H$, and our claim is true. If $\Lambda(x) \leq e^{(1-\lambda)\varepsilon}$, we have $x \in \mathcal{X}_H \Rightarrow \frac{\Lambda(x)}{\Psi(x)} > e^{\varepsilon} \Rightarrow \Psi(x) < \Lambda(x) e^{-\varepsilon}$. We assumed that $\Lambda(x) \leq e^{(1-\lambda)\varepsilon}$; therefore, $\Psi(x) < \Lambda(x) e^{-\varepsilon} \leq e^{(1-\lambda)\varepsilon} e^{-\varepsilon} = e^{-\lambda\varepsilon}$. Since $\Psi(x) < e^{-\lambda\varepsilon}$, we have $x \in \mathcal{X}'_H$.  □
This proposition shows the result of applying LDP and ALIP in the watchdog mechanism in terms of the privacy–utility trade-off. While both ε-LDP and $(\lambda\varepsilon, (1-\lambda)\varepsilon)$-ALIP imply the same LDP privacy budget, LDP results in fewer high-risk symbols than ALIP. This needs to be considered when one applies the watchdog mechanism to achieve LDP or ALIP privacy. Although having fewer high-risk symbols provides better utility, it may compromise privacy. In other words, when $\mathcal{X}_H \subseteq \mathcal{X}'_H$, the privacy leakage of the partition $\{\mathcal{X}_L, \mathcal{X}_H\}$ is greater than or equal to the privacy leakage of the partition $\{\mathcal{X}'_L, \mathcal{X}'_H\}$. As a result, it is possible that ε-LDP cannot achieve the desired $(\lambda\varepsilon, (1-\lambda)\varepsilon)$-ALIP privacy level for a given λ.
The watchdog mechanism with X-invariant randomisation is a powerful, low-complexity method that can be easily applied to instance-wise measures. However, it significantly degrades utility [25] because X-invariant randomisation obfuscates all high-risk symbols together to minimise privacy leakage, at the cost of deteriorating data resolution. In Section 4, we propose subset merging randomisation to enhance the utility of the watchdog mechanism.

3.1.2. Asymmetric Optimal Random Response (AORR)

ORR was proposed in [28] as a localised instance-wise replacement of the privacy funnel [12]. It is the solution to the optimal utility problem subject to ε -LIP or ε -LDP constraints. For ALIP, we propose asymmetric optimal random response (AORR), which is defined as
$$\max_{P_{X|Y}, P_Y} I(X;Y) \quad \text{s.t.} \quad S - X - Y, \quad e^{-\varepsilon_l} \leq \Psi(y) \ \text{and}\ \Lambda(y) \leq e^{\varepsilon_u}, \ \forall y \in \mathcal{Y}. \qquad (12)$$
The privacy constraints in this optimisation problem form a closed, bounded, convex polytope [28]. It has been proved that the vertices of this polytope are the feasible candidates that maximise MI and satisfy the privacy constraints [7,28,37]. However, the number of vertices grows exponentially in the dimension of the polyhedron, which is $|\mathcal{X}|(|\mathcal{X}|-1)$ for LDP and $|\mathcal{X}|-1$ for LIP. This makes ORR computationally cumbersome for large $|\mathcal{X}|$. Accordingly, [28] suggests some approaches with lower complexity than ORR to avoid vertex enumeration for larger sizes of $\mathcal{X}$, but this comes at the cost of lower utility.

3.1.3. Numerical Results

Here, we demonstrate the privacy leakage and utility of AORR and the watchdog mechanism under ALIP. For the utility, we use normalised MI (NMI)
$$\mathrm{NMI} = \frac{I(X;Y)}{H(X)} \in [0, 1]. \qquad (13)$$
It is clear that the maximum possible utility is obtained when X is published without randomisation, where $Y = X$ and $I(X;Y) = H(X)$. Thus, $I(X;Y) \leq H(X)$ and $\mathrm{NMI} \leq 1$.
We present numerical results for both synthetic and real datasets using MATLAB. For synthetic data, we randomly generated $10^3$ distributions for the watchdog mechanism and 100 distributions for AORR, where $|\mathcal{X}| = 17$ and $|\mathcal{S}| = 5$. These distributions were generated by normalising the output of the rand function in MATLAB. For real datasets, we used the Adult dataset [38] and set S = {relationship} and X = {occupation}, where $|\mathcal{S}| = 5$ and $|\mathcal{X}| = 15$. In all scenarios, ε varies from 0.25 to 8, and we consider three cases of $(\varepsilon_l, \varepsilon_u)$-ALIP, where $\lambda \in \{0.35, 0.5, 0.65\}$, $\varepsilon_l = \lambda\varepsilon$, and $\varepsilon_u = (1-\lambda)\varepsilon$. The results of the watchdog mechanism are shown in Figure 2 and Figure 3 for synthetic and real data, respectively, while the AORR results are presented in Figure 4 and Figure 5. The figures display NMI, $\log \max_y \Lambda(y)$ (max-lift leakage), and $\log \min_y \Psi(y)$ (min-lift leakage) versus the LDP privacy budget ε for real data, and the mean values of the same quantities for synthetic data.
In Figure 2a and Figure 3a, we observe that, in the watchdog mechanism, LDP provides higher utility and leakage than ALIP for all values of ε and λ, which confirms Proposition 2. Figure 2a, Figure 3a, Figure 4a and Figure 5a demonstrate that the min-lift relaxation, $\lambda = 0.65$, enhances utility in the watchdog and AORR mechanisms for $\varepsilon > 1$. Note that in all figures, $\lambda = 0.5$ refers to $\frac{\varepsilon}{2}$-LIP. On the other hand, $\lambda = 0.35$ results in lower utility. Generally, any value of $\lambda < 0.5$ reduces utility since it strictly bounds the min-lift while relaxing the max-lift. As the min-lift has a wider range of values, achieving this strict bound enlarges the set $\mathcal{X}_H$ and requires randomising more symbols, which reduces utility. Another observation is that AORR achieves significantly higher utility than the watchdog mechanism. For instance, when $\lambda = 0.5$ and $\varepsilon = 2$, the watchdog mechanism results in a utility of 0.52 for synthetic data and 0.73 for real data, while AORR has a utility of 0.94 and 0.96 for the synthetic and real data, respectively. AORR finds the optimal utility, which, due to the PUT, necessarily results in the highest leakage subject to the privacy constraints. However, the watchdog mechanism is a nonoptimal solution that minimises the leakage of high-risk symbols to provide strong privacy protection, which deteriorates utility. To address this drawback of the watchdog mechanism, we propose a subset randomisation method in the following section.

4. Subset Merging in Watchdog Mechanism

The watchdog mechanism with X-invariant randomisation is a low-complexity method that can be easily applied when the privacy measures are symbol-wise. X-invariant randomisation is the optimal privacy protection for the high-risk symbols: it minimises privacy leakage in $\mathcal{X}_H$, but necessarily results in the worst data resolution. Thus, in this section, we propose the subset merging algorithm, which improves data resolution by randomising disjoint subsets of high-risk symbols and thereby enhances utility in the watchdog mechanism. In the following, we show that applying X-invariant randomisation to disjoint subsets of $\mathcal{X}_H$ increases the utility.
Let $\mathcal{G}_{\mathcal{X}_H} = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_g\}$ be a partition of $\mathcal{X}_H$, where for every $i \in [g]$, $\mathcal{X}_i \subseteq \mathcal{X}_H$, $\mathcal{X}_i \cap \mathcal{X}_j = \emptyset$ for $i \neq j$, and $\mathcal{X}_H = \bigcup_{i=1}^{g} \mathcal{X}_i$. We randomise each subset $\mathcal{X}_i \in \mathcal{G}_{\mathcal{X}_H}$ by an X-invariant randomisation $R_{\mathcal{Y}_i}(y)$ for $x \in \mathcal{X}_i$ and $y \in \mathcal{Y}_i$, where $\sum_{y \in \mathcal{Y}_i} R_{\mathcal{Y}_i}(y) = 1$. The resulting MI between X and Y is
$$I(X;Y) = H(X) - \sum_{i=1}^{g} \sum_{x \in \mathcal{X}_i} P_X(x) \log \frac{P(\mathcal{X}_i)}{P_X(x)}. \qquad (14)$$
Definition 4.
Assume two partitions, $\mathcal{G}_{\mathcal{X}_H} = \{\mathcal{X}_1, \ldots, \mathcal{X}_g\}$ and $\mathcal{G}'_{\mathcal{X}_H} = \{\mathcal{X}'_1, \ldots, \mathcal{X}'_{g'}\}$. We say that $\mathcal{G}'_{\mathcal{X}_H}$ is a refinement of $\mathcal{G}_{\mathcal{X}_H}$, or $\mathcal{G}_{\mathcal{X}_H}$ is an aggregation of $\mathcal{G}'_{\mathcal{X}_H}$, if for every $i \in [g]$, $\mathcal{X}_i = \bigcup_{j \in J_i} \mathcal{X}'_j$, where $J_i \subseteq [g']$, and $P(\mathcal{X}_i) = \sum_{j \in J_i} P(\mathcal{X}'_j)$ (this definition is inspired by [39] (Definition 10)).
If $\mathcal{G}'_{\mathcal{X}_H}$ is a refinement of $\mathcal{G}_{\mathcal{X}_H}$, then $I_{\mathcal{G}'_{\mathcal{X}_H}}(X;Y) \geq I_{\mathcal{G}_{\mathcal{X}_H}}(X;Y)$.
Obtaining the optimal $\mathcal{G}_{\mathcal{X}_H}$ that maximises utility and satisfies the privacy constraints is a combinatorial optimisation problem over all possible partitions of $\mathcal{X}_H$, which is cumbersome to solve. Therefore, we propose a heuristic method in the following.

4.1. Greedy Algorithm to Make Refined Subsets of High-Risk Symbols

In Algorithm 1, we propose a bottom-up algorithm that constructs a partition of $\mathcal{X}_H$ by merging high-risk symbols into disjoint subsets. It works based on a leakage risk metric for each $x \in \mathcal{X}_H$: $\omega(x) = \Lambda(x) + \Psi(x)$ for ALIP and $\omega(x) = \Gamma(x)$ for LDP. For LIP, $\omega(x) = \max\{\log \Lambda(x), |\log \Psi(x)|\}$. This metric is used to order the subsets by their privacy risk level. Accordingly, to constitute a subset $\mathcal{X}_i \subseteq \mathcal{X}_H$, Algorithm 1 bootstraps from the highest-risk symbol, $\mathcal{X}_i = \{\arg\max_{x \in \mathcal{X}_H} \omega(x)\}$ (line 5). Then, it merges into $\mathcal{X}_i$ the symbol $x^*$ that minimises $\omega(\mathcal{X}_i \cup \{x^*\})$ (line 7), until the privacy constraints are satisfied in $\mathcal{X}_i$ (line 6). The ALIP privacy constraints for a subset $\mathcal{X}_i$ are given by
$$e^{-\varepsilon_l} \leq \Psi(\mathcal{X}_i) \quad \text{and} \quad \Lambda(\mathcal{X}_i) \leq e^{\varepsilon_u},$$
where $\Psi(\mathcal{X}_i) = \min_{s \in \mathcal{S}} \frac{\sum_{x \in \mathcal{X}_i} P_{X|S}(x|s)}{\sum_{x \in \mathcal{X}_i} P_X(x)}$ and $\Lambda(\mathcal{X}_i) = \max_{s \in \mathcal{S}} \frac{\sum_{x \in \mathcal{X}_i} P_{X|S}(x|s)}{\sum_{x \in \mathcal{X}_i} P_X(x)}$. For the LDP constraint, we have $\Gamma(\mathcal{X}_i) = \frac{\Lambda(\mathcal{X}_i)}{\Psi(\mathcal{X}_i)} \leq e^{\varepsilon}$. In Algorithm 1, we used the ALIP privacy constraints for the while-loop conditions in lines 4, 6, and 12. For LDP, the privacy constraint is changed to $\Gamma(\mathcal{X}_Q) > e^{\varepsilon}$, and $\omega(x)$ for LDP is applied. After the constitution of the partition $\mathcal{G}_{\mathcal{X}_H}$, the last subset $\mathcal{X}_g$ may not meet the privacy constraints. Therefore, the leakage of $\mathcal{X}_g$ is checked (line 12), and if there is a privacy breach, an agglomerated $\mathcal{X}_g$ is constructed by merging other subsets into it so as to minimise the subset risk, $\omega(\mathcal{X}_g) = \Lambda(\mathcal{X}_g) + \Psi(\mathcal{X}_g)$ (lines 13–14), until the privacy constraints are satisfied.
Algorithm 1: Subset merging in the watchdog mechanism.
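Algorithm 1 itself is provided as a figure in the published version; the following Python sketch reproduces its main loop as described above: each subset is bootstrapped from the riskiest remaining symbol and grown until the subset-level ALIP constraints hold. The risk score $\Lambda + \Psi$ follows the description in the text, and the final clean-up of a still-leaking last subset is omitted, so this is an illustrative approximation rather than the exact published algorithm.
```python
import numpy as np

def merged_lift_extremes(P_SX, subset, P_S, P_X):
    """Min- and max-lift of a merged subset: l(s, X_i) = P(X_i | s) / P(X_i)."""
    ratio = P_SX[:, subset].sum(axis=1) / (P_S * P_X[subset].sum())
    return ratio.min(), ratio.max()

def greedy_subset_merging(P_SX, high_idx, eps_l, eps_u):
    """Greedy partition of the high-risk symbols, loosely following Algorithm 1."""
    P_S, P_X = P_SX.sum(axis=1), P_SX.sum(axis=0)
    lift = P_SX / np.outer(P_S, P_X)
    risk = lift.max(axis=0) + lift.min(axis=0)         # per-symbol risk omega(x) = Lambda(x) + Psi(x)

    remaining, partition = list(high_idx), []
    while remaining:
        start = max(remaining, key=lambda x: risk[x])  # bootstrap from the riskiest symbol
        subset = [start]
        remaining.remove(start)
        psi, lam = merged_lift_extremes(P_SX, subset, P_S, P_X)
        while remaining and (psi < np.exp(-eps_l) or lam > np.exp(eps_u)):
            # Merge the symbol whose addition minimises the merged-subset risk Lambda + Psi.
            best = min(remaining,
                       key=lambda x: sum(merged_lift_extremes(P_SX, subset + [x], P_S, P_X)))
            subset.append(best)
            remaining.remove(best)
            psi, lam = merged_lift_extremes(P_SX, subset, P_S, P_X)
        partition.append(subset)
    # The paper's final clean-up (re-merging a last subset that still leaks) is omitted here.
    return partition
```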

4.2. Numerical Results

We show the PUT of ALIP and LDP under subset merging randomisation in Figure 6 and Figure 7 for synthetic and real data, respectively, with the same setup as for the watchdog mechanism in Section 3.1.3. Compared with complete merging (Figure 2 and Figure 3), the utility is enhanced significantly for both LDP and ALIP in all scenarios under the same privacy constraint. For instance, consider the symmetric case $\lambda = 0.5$ when $\varepsilon = 1$, and compare the PUT between subset and complete merging. Figure 6a and Figure 7a demonstrate a utility of around 0.73 for subset merging compared to utilities of 0.17 and 0.28 for complete merging in Figure 2a and Figure 3a, which amounts to almost 320% and 160% utility enhancement, respectively. Moreover, as Figure 6b,c and Figure 7b,c illustrate, the privacy constraints are satisfied in all cases.

5. Subset Random Response

In the previous section, we showed that subset merging significantly enhances utility in the watchdog mechanism. In this section, we propose a method to decrease the complexity of AORR for large datasets: we apply AORR to subsets of $\mathcal{X}_H$, so that random response becomes applicable to alphabets $\mathcal{X}$ that are typically an order of magnitude larger.
The AORR optimisation problem in (12) is equivalent to the following problem:
$$H(X) - \min_{P_{X|Y}, P_Y} H(X|Y) \quad \text{s.t.} \quad S - X - Y, \quad e^{-\varepsilon_l} \leq \Psi(y) \ \text{and}\ \Lambda(y) \leq e^{\varepsilon_u}, \ \forall y \in \mathcal{Y}. \qquad (15)$$
To reduce the complexity of (15), we divide $\mathcal{X}$ into $\mathcal{X}_L$ and $\mathcal{X}_H$, similar to the watchdog mechanism, and make a partition $\mathcal{G}_{\mathcal{X}_H} = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_g\}$ of $\mathcal{X}_H$. We randomise each subset $\mathcal{X}_i \in \mathcal{G}_{\mathcal{X}_H}$, $i \in [g]$, separately by a randomisation pair $(\mathbf{Q}_i, \mathbf{q}_i)$, where $\mathbf{Q}_i$ is a matrix in $\mathbb{R}^{|\mathcal{X}_i| \times |\mathcal{Y}_i|}$ and $\mathbf{q}_i$ is a vector in $\mathbb{R}^{|\mathcal{Y}_i|}$.
The elements of Q i and q i are given by
$$Q_i(x|y) = \Pr[X = x \,|\, Y = y], \quad x \in \mathcal{X}_i,\ y \in \mathcal{Y}_i, \qquad (16)$$
$$q_i(y) = \Pr[Y = y], \quad y \in \mathcal{Y}_i. \qquad (17)$$
For each $y \in \mathcal{Y}_i$, we have $\sum_{x \in \mathcal{X}_i} Q_i(x|y) = 1$. Consequently, $H(X|Y) = \sum_{i \in [g]} H_i(X|Y)$, where
$$H_i(X|Y) = -\sum_{y \in \mathcal{Y}_i} q_i(y) \sum_{x \in \mathcal{X}_i} Q_i(x|y) \log Q_i(x|y), \quad i \in [g]. \qquad (18)$$
This setting turns (15) into g optimisation problems, one for each subset $\mathcal{X}_i \in \mathcal{G}_{\mathcal{X}_H}$, $i \in [g]$, as follows:
$$\min_{\mathbf{Q}_i, \mathbf{q}_i} H_i(X|Y) \qquad (19)$$
$$\text{s.t.} \quad 0 \leq q_i(y), \quad \forall y \in \mathcal{Y}_i, \qquad (20)$$
$$0 \leq Q_i(x|y), \quad \forall x \in \mathcal{X}_i,\ y \in \mathcal{Y}_i, \qquad (21)$$
$$\sum_{x \in \mathcal{X}_i} Q_i(x|y) = 1, \quad \forall y \in \mathcal{Y}_i, \qquad (22)$$
$$\sum_{y \in \mathcal{Y}_i} Q_i(x|y)\, q_i(y) = P_X(x), \quad \forall x \in \mathcal{X}_i, \qquad (23)$$
$$e^{-\varepsilon_l} P_S(s) \leq \sum_{x \in \mathcal{X}_i} P_{S|X}(s|x)\, Q_i(x|y) \leq e^{\varepsilon_u} P_S(s), \quad \forall s \in \mathcal{S},\ y \in \mathcal{Y}_i. \qquad (24)$$
The columns of the randomisation matrix $\mathbf{Q}_i$, $i \in [g]$, can be expressed as members of a convex and bounded polytope $\Pi_i$, which is given by the following constraints:
$$\Pi_i = \Big\{\mathbf{v} \in \mathbb{R}^{|\mathcal{X}_i|} : 0 \leq v_k,\ \forall k \in [|\mathcal{X}_i|],\ \sum_{k=1}^{|\mathcal{X}_i|} v_k = 1,\ e^{-\varepsilon_l} P_S(s) \leq \sum_{x \in \mathcal{X}_i} P_{S|X}(s|x)\, v_x \leq e^{\varepsilon_u} P_S(s),\ \forall s \in \mathcal{S}\Big\}, \qquad (25)$$
where $v_x$ denotes the entry of $\mathbf{v}$ associated with the symbol $x \in \mathcal{X}_i$.
For each $\mathcal{X}_i \in \mathcal{G}_{\mathcal{X}_H}$, $i \in [g]$, let $\mathcal{V}_i = \{\mathbf{v}_1^i, \ldots, \mathbf{v}_M^i\}$ be the vertices of $\Pi_i$ in (25), $H_{\mathbf{v}_k^i}$ be the entropy of each $\mathbf{v}_k^i$ for $k \in [M]$, $\mathbf{P}_{\mathcal{X}_i}$ be the probability vector of $x \in \mathcal{X}_i$, and $\boldsymbol{\beta}^i$ be the solution to the following optimisation:
$$\min_{\boldsymbol{\beta}^i \in \mathbb{R}^M} \sum_{k=1}^{M} H_{\mathbf{v}_k^i}\, \beta_k^i, \quad \text{s.t.} \quad \beta_k^i \geq 0,\ \forall k \in [M], \quad \sum_{k=1}^{M} \mathbf{v}_k^i\, \beta_k^i = \mathbf{P}_{\mathcal{X}_i}. \qquad (26)$$
Then, $\mathcal{Y}_i$ and $(\mathbf{Q}_i, \mathbf{q}_i)$ are given by
$$\mathcal{Y}_i = \{y : \beta_y^i \neq 0\}; \qquad (27)$$
$$q_i(y) = \beta_y^i \quad \text{and} \quad Q_i(\cdot|y) = \mathbf{v}_y^i, \quad \forall y \in \mathcal{Y}_i. \qquad (28)$$
The $(\varepsilon_l, \varepsilon_u)$-ALIP protocol, $\mathcal{M}: \mathcal{X} \to \mathcal{Y}$, is given by the pair $(P_{X|Y}, P_Y)$ as follows:
$$P_{X|Y} = \begin{cases} \mathbb{1}\{x = y\}, & x, y \in \mathcal{X}_L = \mathcal{Y}_L, \\ Q_i(x|y), & x \in \mathcal{X}_i,\ y \in \mathcal{Y}_i,\ i \in [g], \\ 0, & \text{otherwise}; \end{cases} \qquad (29)$$
$$P_Y = \begin{cases} P_X(y), & y \in \mathcal{Y}_L, \\ q_i(y), & y \in \mathcal{Y}_i,\ i \in [g]. \end{cases} \qquad (30)$$
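Given the vertices of $\Pi_i$, the problem in (26) is a linear program in $\boldsymbol{\beta}^i$ (linear objective, linear equality constraints, nonnegativity), so an off-the-shelf LP solver suffices; vertex enumeration itself remains the costly step. The sketch below is only illustrative: it assumes the vertex list is supplied (e.g., by a polytope library) and recovers $\mathcal{Y}_i$, $\mathbf{q}_i$, and $\mathbf{Q}_i$ as in (27) and (28).
```python
import numpy as np
from scipy.optimize import linprog

def subset_random_response(vertices, P_Xi):
    """Solve (26) for one subset. `vertices` is a list of columns of Pi_i (each a distribution
    over X_i); `P_Xi` is the probability vector of the symbols in X_i (not normalised)."""
    V = np.column_stack(vertices)                                       # |X_i| x M
    entropies = np.array([-np.sum(v[v > 0] * np.log(v[v > 0])) for v in vertices])
    res = linprog(c=entropies, A_eq=V, b_eq=P_Xi, bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError("no feasible beta; merge this subset with another one (Algorithm 2)")
    support = res.x > 1e-12                 # Y_i = {y : beta_y != 0}, cf. (27)
    return V[:, support], res.x[support]    # Q_i (columns Q_i(.|y) = v_y) and q_i, cf. (28)
```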

5.1. Algorithm for Subset Random Response

We propose Algorithm 2 to implement AORR on the subsets of $\mathcal{X}_H$, which we call subset random response (SRR). In this algorithm, we first obtain a partition $\mathcal{G}_{\mathcal{X}_H} = \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_g\}$ of $\mathcal{X}_H$ via Algorithm 1. Then, based on $\mathcal{G}_{\mathcal{X}_H}$, we build another partition $\mathcal{O}_{\mathcal{X}_H}$ and find the optimal random response for each subset $\mathcal{X}_i \in \mathcal{O}_{\mathcal{X}_H}$ (line 5). By obtaining the optimal random responses for all subsets, we obtain a pair $(\mathbf{Q}_i, \mathbf{q}_i)$ for each subset $\mathcal{X}_i$, and consequently $P_{X|Y}$ and $P_Y$ by (29) and (30) (lines 20–23). The while loop in lines 7–14 handles the particular cases in which the polytope in (25) is empty for a subset $\mathcal{X}_i$. This may occur under strict privacy conditions, where the privacy budget is very small. Since we reduce the dimension of the original polytope in (15), the possibility that no feasible random response exists increases in some cases. Therefore, in such cases, we take a union with other subsets until we have a nonempty polytope. If $|\mathcal{G}_{\mathcal{X}_H}| > 0$, we take a union with another subset in $\mathcal{G}_{\mathcal{X}_H}$ (line 9). If $|\mathcal{G}_{\mathcal{X}_H}| = 0$, then $\mathcal{X}_g$ is the last subset in $\mathcal{O}_{\mathcal{X}_H}$; therefore, we take a union with previously formed subsets in $\mathcal{O}_{\mathcal{X}_H}$ and update the index g (line 12). The condition in lines 17–18 covers the cases where there is no feasible polytope even after taking the union of all subsets, which means that the problem in (15) cannot be solved. Whenever this occurs, we apply the subset merging mechanism.
The number of polytope vertices in AORR is $n \in O(\exp(|\mathcal{X}| - 1))$, and the time complexity is $O(n)$. As $|\mathcal{X}|$ increases, the complexity of AORR increases exponentially in $|\mathcal{X}| - 1$. In SRR, for each $\mathcal{X}_i \in \mathcal{G}_{\mathcal{X}_H}$, $i \in [g]$, the number of vertices is $n_i \in O(\exp(|\mathcal{X}_i| - 1))$, and the complexity of SRR is $O(\max_i n_i)$.
Algorithm 2: Subset random response.

5.2. Numerical Results

Here, we compare the PUT of AORR with SRR in Algorithm 2 and subset merging in Algorithm 1. Figure 8 depicts the mean values of utility, leakage, and time complexity for 100 randomly generated distributions, where $\lambda = 0.65$ and the simulation setup is the same as that in Section 3.1.3. The result for the Adult dataset is also shown in Figure 9.
Figure 8a–c and Figure 9 demonstrate that SRR results in better utility and higher leakage than subset merging, and its PUT is very close to that of AORR. Figure 8d illustrates the processing time of each mechanism for synthetic data, from which we observe that the complexity of AORR and SRR is much higher than that of subset merging. Running SRR is less complex than AORR for strict privacy constraints ($\varepsilon < 1$) and for $\varepsilon > 2.5$. While SRR shows higher complexity for some privacy budgets, $1 \leq \varepsilon \leq 2.5$, it has the advantage in high-dimensional systems. Figure 10 shows a PUT comparison between SRR and subset merging for synthetic data where $|\mathcal{X}| = 200$, $|\mathcal{S}| = 15$, $\varepsilon \in \{1, 1.25, \ldots, 8\}$, and $\lambda = 0.5$. This experiment shows that both SRR and subset merging are applicable to large datasets. Obviously, SRR provides better utility (Figure 10a) and higher leakage (Figure 10b,c), which is still below the given budgets $\varepsilon_l$ and $\varepsilon_u$.

6. Lift-Based and Lift-Inverse Measures

In this section, we consider some recently proposed privacy measures that quantify the divergence between the posterior and prior beliefs on sensitive features, including the $\ell_1$-norm [32], the strong $\chi^2$-privacy criterion [33], and $\alpha$-lift [34]. They have been proposed as stronger versions of their corresponding average measures, which are the total variation distance [35], $\chi^2$-divergence [40], and Sibson MI [16], respectively. We call them lift-based measures and define them in the following.
Definition 5.
For each $y \in \mathcal{Y}$, the lift-based privacy measures are defined as follows:
  • The $\ell_1$-lift is given by
    $$\Lambda_1(y) \triangleq \sum_{s \in \mathcal{S}} P_S(s)\, |l(s,y) - 1|, \qquad (31)$$
    and the total variation distance is $T(S;Y) = \frac{1}{2}\mathbb{E}_Y[\Lambda_1(Y)]$.
  • The $\chi^2$-lift is given by
    $$\Lambda_{\chi^2}(y) \triangleq \sum_{s \in \mathcal{S}} P_S(s)\, \big(l(s,y) - 1\big)^2, \qquad (32)$$
    and the $\chi^2$-divergence is $\chi^2(S;Y) = \mathbb{E}_Y[\Lambda_{\chi^2}(Y)]$.
  • The α-lift is given by
    $$\Lambda_\alpha^S(y) \triangleq \Big(\sum_{s \in \mathcal{S}} P_S(s)\, l(s,y)^\alpha\Big)^{1/\alpha}, \qquad (33)$$
    and the Sibson MI is $I_\alpha^S(S;Y) = \frac{\alpha}{\alpha - 1} \log \mathbb{E}_Y[\Lambda_\alpha^S(Y)]$.
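All three lift-based measures are weighted norms of the lift over s and can be computed in a few lines. The sketch below assumes a lift matrix with rows indexed by s and columns by y, as in the earlier sketches, and an example value of α.
```python
import numpy as np

def lift_based_measures(lift, P_S, alpha=2.0):
    """ell_1-lift, chi^2-lift and alpha-lift per output y (Definition 5)."""
    w = P_S[:, None]                                              # weights P_S(s)
    lam_1 = np.sum(w * np.abs(lift - 1), axis=0)                  # Lambda_1(y)
    lam_chi2 = np.sum(w * (lift - 1) ** 2, axis=0)                # Lambda_chi2(y)
    lam_alpha = np.sum(w * lift ** alpha, axis=0) ** (1 / alpha)  # Lambda_alpha^S(y)
    return lam_1, lam_chi2, lam_alpha
```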
Here, we reveal their relationship with ALIP.
Proposition 3.
If ( ε l , ε u ) -ALIP is satisfied, then
 1.  $\max_{y \in \mathcal{Y}} \Lambda_1(y) \leq e^{\varepsilon_u} - 1$;
 2.  $\max_{y \in \mathcal{Y}} \Lambda_{\chi^2}(y) \leq (e^{\varepsilon_u} - 1)^2$;
 3.  $\max_{y \in \mathcal{Y}} \Lambda_\alpha^S(y) \leq e^{\varepsilon_u}$.
The proof is given in Appendix D.
Proposition 4.
Let $s_y = \arg\max_{s \in \mathcal{S}} l(s,y)$ and $\bar{y} = \arg\max_{y \in \mathcal{Y}} \Lambda(y)$; then
 1.  if $\max_{y \in \mathcal{Y}} \Lambda_1(y) \leq \varepsilon$, then $\max_{y \in \mathcal{Y}} \Lambda(y) \leq \varepsilon / P_S(s_{\bar{y}}) + 1$;
 2.  if $\max_{y \in \mathcal{Y}} \Lambda_{\chi^2}(y) \leq \varepsilon$, then $\max_{y \in \mathcal{Y}} \Lambda(y) \leq \sqrt{\varepsilon / P_S(s_{\bar{y}})} + 1$;
 3.  if $\max_{y \in \mathcal{Y}} \Lambda_\alpha^S(y) \leq \varepsilon$, then $\max_{y \in \mathcal{Y}} \Lambda(y) \leq \varepsilon / P_S(s_{\bar{y}})^{1/\alpha}$.
Proof. 
The proof is given in Appendix E. □
Proposition 3 shows that lift-based measures, similar to their corresponding average leakages, are upper-bounded by the max-lift bound. Proposition 4 indicates that if we bound lift-based measures, they can only restrict the max-lift leakage. Accordingly, if one only applies a lift-based measure to protect privacy, such as in previous works [32,33,34], it may cause significant leakage on the min-lift. Therefore, in the following, we propose lift-inverse measures to bound the min-lift leakage.

6.1. Lift-Inverse Measures

In Propositions 3 and 4, we showed that lift-based measures only bound the max-lift. In this subsection, we present lift-inverse measures to restrict the min-lift leakage, $\min_{y \in \mathcal{Y}} \Psi(y) = \min_{y \in \mathcal{Y}} [\min_{s \in \mathcal{S}} l(s,y)]$.
Definition 6.
For the lift-based measures in (31) to (33), we replace $l(s,y)$ with $\frac{1}{l(s,y)}$ and call the resulting quantities lift-inverse measures.
  • The $\ell_1$-lift-inverse is given by
    $$\Psi_1(y) \triangleq \sum_{s \in \mathcal{S}} P_S(s)\, \Big|\frac{1}{l(s,y)} - 1\Big|.$$
  • The $\chi^2$-lift-inverse is given by
    $$\Psi_{\chi^2}(y) \triangleq \sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{l(s,y)} - 1\Big)^2.$$
  • The α-lift-inverse is given by
    $$\Psi_\alpha^S(y) \triangleq \Big(\sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{l(s,y)}\Big)^\alpha\Big)^{1/\alpha}.$$
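The lift-inverse measures are the same computations applied to $1/l(s,y)$; a corresponding sketch, under the same assumptions as above:
```python
import numpy as np

def lift_inverse_measures(lift, P_S, alpha=2.0):
    """ell_1, chi^2 and alpha lift-inverse measures per output y (Definition 6)."""
    inv = 1.0 / lift
    w = P_S[:, None]
    psi_1 = np.sum(w * np.abs(inv - 1), axis=0)                   # Psi_1(y)
    psi_chi2 = np.sum(w * (inv - 1) ** 2, axis=0)                 # Psi_chi2(y)
    psi_alpha = np.sum(w * inv ** alpha, axis=0) ** (1 / alpha)   # Psi_alpha^S(y)
    return psi_1, psi_chi2, psi_alpha
```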
In the following propositions, we show the relationship between the lift-inverse measures and $(\varepsilon_l, \varepsilon_u)$-ALIP.
Proposition 5.
If ( ε l , ε u ) -ALIP is achieved, we have
 1.  $\max_{y \in \mathcal{Y}} \Psi_1(y) \leq e^{\varepsilon_l} - 1$;
 2.  $\max_{y \in \mathcal{Y}} \Psi_{\chi^2}(y) \leq (e^{\varepsilon_l} - 1)^2$;
 3.  $\max_{y \in \mathcal{Y}} \Psi_\alpha^S(y) \leq e^{\varepsilon_l}$.
Proof. 
The proof is provided in Appendix F. □
Proposition 6.
Let $s_y = \arg\min_{s \in \mathcal{S}} l(s,y)$ and $\underline{y} = \arg\min_{y \in \mathcal{Y}} \Psi(y)$; then
 1.  if $\max_{y \in \mathcal{Y}} \Psi_1(y) \leq \varepsilon$, then $\min_{y \in \mathcal{Y}} \Psi(y) \geq \frac{P_S(s_{\underline{y}})}{\varepsilon + P_S(s_{\underline{y}})}$;
 2.  if $\max_{y \in \mathcal{Y}} \Psi_{\chi^2}(y) \leq \varepsilon$, then $\min_{y \in \mathcal{Y}} \Psi(y) \geq \frac{\sqrt{P_S(s_{\underline{y}})}}{\sqrt{\varepsilon} + \sqrt{P_S(s_{\underline{y}})}}$;
 3.  if $\max_{y \in \mathcal{Y}} \Psi_\alpha^S(y) \leq \varepsilon$, then $\min_{y \in \mathcal{Y}} \Psi(y) \geq \varepsilon^{-1} P_S(s_{\underline{y}})^{1/\alpha}$.
Proof. 
The proof is provided in Appendix G. □
Propositions 5 and 6 demonstrate that the aforementioned lift-inverse measures are associated with the min-lift, and bounding them can restrict the min-lift leakage. Since lift-based and lift-inverse measures quantify privacy leakage by a function of lift averaged over sensitive features, they can be regarded as more relaxed measures than the min- and max-lifts.

6.2. PUT and Numerical Results

Optimal randomisations for $\ell_1$-lift and $\chi^2$-lift privacy, which maximise MI as the utility measure, were proposed in [32,33], respectively. Note that ORR is not applicable to these measures since their privacy constraints are not convex. However, here, we apply the watchdog mechanism with subset merging randomisation to investigate the PUT of lift-based and lift-inverse measures. This application shows that the watchdog mechanism with X-invariant randomisation is a low-complexity method that can be applied to all the aforementioned measures. Moreover, our subset merging algorithm significantly enhances the utility, which is comparable to the optimal solutions.
To apply lift-based and lift-inverse measures in the subset merging algorithm, we replace $\Lambda(y)$ and $\Psi(y)$ in (6) and Algorithm 1 with the corresponding lift-based and lift-inverse measures in Definitions 5 and 6, respectively. For example, $\mathcal{X}_L$ and $\mathcal{X}_H$ for the α-lift are obtained as
$$\mathcal{X}_L \triangleq \{x \in \mathcal{X} : \Psi_\alpha^S(x) \leq e^{\varepsilon_l} \ \text{and}\ \Lambda_\alpha^S(x) \leq e^{\varepsilon_u}\} \quad \text{and} \quad \mathcal{X}_H = \mathcal{X} \setminus \mathcal{X}_L.$$
The privacy risk measure in Algorithm 1 is then given by $\omega(x) = \Psi_\alpha^S(x) + \Lambda_\alpha^S(x)$. We compare the PUT of lift-based and lift-inverse privacy with $(\varepsilon_l, \varepsilon_u)$-ALIP, where the lift-based and lift-inverse measures are bounded as follows:
  • $\ell_1$-privacy: $\Lambda_1(y) \leq e^{\varepsilon_u} - 1$ and $\Psi_1(y) \leq e^{\varepsilon_l} - 1$, $\forall y \in \mathcal{Y}$.
  • $\chi^2$-privacy: $\Lambda_{\chi^2}(y) \leq (e^{\varepsilon_u} - 1)^2$ and $\Psi_{\chi^2}(y) \leq (e^{\varepsilon_l} - 1)^2$, $\forall y \in \mathcal{Y}$.
  • α-lift-privacy: $\Lambda_\alpha^S(y) \leq e^{\varepsilon_u}$ and $\Psi_\alpha^S(y) \leq e^{\varepsilon_l}$, $\forall y \in \mathcal{Y}$.
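As an illustration of how these per-symbol bounds drive the watchdog partition, the sketch below classifies raw symbols (Y = X) into $\mathcal{X}_L$ and $\mathcal{X}_H$ under the α-lift and α-lift-inverse constraints listed above; the function name and default α are assumptions.
```python
import numpy as np

def alpha_lift_partition(P_SX, eps_l, eps_u, alpha=2.0):
    """Split X into low- and high-risk sets using the alpha-lift and alpha-lift-inverse bounds."""
    P_S = P_SX.sum(axis=1, keepdims=True)
    P_X = P_SX.sum(axis=0, keepdims=True)
    lift = P_SX / (P_S * P_X)                                               # l(s, x), i.e. Y = X
    lam_alpha = np.sum(P_S * lift ** alpha, axis=0) ** (1 / alpha)          # Lambda_alpha^S(x)
    psi_alpha = np.sum(P_S * (1.0 / lift) ** alpha, axis=0) ** (1 / alpha)  # Psi_alpha^S(x)
    low = (lam_alpha <= np.exp(eps_u)) & (psi_alpha <= np.exp(eps_l))
    return np.where(low)[0], np.where(~low)[0]                              # indices of X_L and X_H
```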
We apply the subset merging mechanism with the simulation setup of Section 3.1.3. Figure 11 and Figure 12 demonstrate the PUT of ALIP for $\lambda = 0.5$, and of $\ell_1$ and $\chi^2$ privacy for $\lambda \in \{0.5, 0.65\}$, for the synthetic and Adult datasets, respectively. When $\lambda = 0.5$, $\ell_1$ and $\chi^2$ privacy result in higher utility than ALIP for all values of ε, since lift-based and lift-inverse measures are relaxations of the max- and min-lift. To observe the effect of the asymmetric scenario, we depict $\ell_1$ and $\chi^2$ privacy for $\lambda = 0.65$. From Figure 11a and Figure 12a, we observe that the lift-inverse relaxation ($\lambda = 0.65$) enhances utility significantly for $\varepsilon > 1$, but worsens utility for $\varepsilon < 1$. The reason is that when $\varepsilon < 1$, the lift-inverse privacy constraint in the asymmetric scenario is strict, which requires more symbols to be merged in each subset and causes larger subsets and utility degradation.
A comparison between α-lift privacy and ALIP for synthetic data is shown in Figure 13 for $\alpha \in \{2, 10, 100\}$. α-lift privacy is tunable: when $\alpha = \infty$, it is equivalent to ALIP, and when $\alpha < \infty$, it results in a relaxation scenario. We observe this tunable property in Figure 13, where $\alpha = 2$ has the highest utility. Moreover, as α increases, the PUT of α-lift privacy becomes closer to that of ALIP.

7. Conclusions

In this paper, we studied lift, the likelihood ratio between the posterior and prior beliefs about sensitive features in a dataset. We demonstrated the distinction between the min- and max-lifts in terms of data privacy concerns. We proposed ALIP as a generalised version of LIP to obtain a notion of privacy more compatible with lift asymmetry. ALIP can enhance utility in the watchdog and ORR mechanisms, the two main approaches to achieving lift-based privacy. We proposed two subset randomisation methods to enhance the utility of the watchdog mechanism and to reduce the complexity of ORR for large datasets. We also investigated the existing lift-based measures, showing that they can incur significant leakage on the min-lift. Thus, we proposed lift-inverse measures to restrict the min-lift leakage. Finally, we applied the watchdog mechanism to study the PUT of lift-based and lift-inverse measures. For future work, one can consider further operational meanings of the min-lift and max-lift. Subset randomisation can also be applied to decrease the complexity and enhance the utility of other privacy mechanisms. Moreover, the optimal randomisation for α-lift is still unknown and could be investigated.

Author Contributions

Conceptualization, N.D. and P.S.; Methodology, N.D. and P.S.; Validation, P.S.; Formal analysis, M.A.Z.; Writing—original draft, M.A.Z.; Writing—review & editing, N.D. and P.S.; Visualization, M.A.Z.; Supervision, N.D. and P.S.; Project administration, P.S.; Funding acquisition, P.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work of P. Sadeghi and M.A. Zarrabian is supported by the ARC Future Fellowship FT190100429 and partly by the Data61 CRP: IT-PPUB.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

For real-world data simulations, we used the public Adult dataset from the UCI machine learning repository: https://archive-beta.ics.uci.edu/dataset/2/adult.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A

In this appendix, we present privacy breaches in terms of the min-lift and max-lift based on the work of Evfimievski et al. [29], who proposed the following definition of a privacy breach and provided a detailed example [29] (Example 1).
Definition A1.
For $\rho_1, \rho_2 \in \mathbb{R}_+$ with $0 < \rho_1 < \rho_2 < 1$, we say that there is a straight or upward $\rho_1$-to-$\rho_2$ privacy breach with respect to a property $Q(S)$ if, for some $y \in \mathcal{Y}$,
$$\Pr[Q(S)] \leq \rho_1, \quad \Pr[Q(S) \,|\, Y = y] > \rho_2.$$
We say that there is a downward $\rho_2$-to-$\rho_1$ privacy breach with respect to $Q(S)$ if, for some $y \in \mathcal{Y}$,
$$\Pr[Q(S)] \geq \rho_2, \quad \Pr[Q(S) \,|\, Y = y] < \rho_1.$$
Considering the complement property $Q'(S) = \neg Q(S)$, we have
$$\Pr[Q'(S)] \leq 1 - \rho_2, \quad \Pr[Q'(S) \,|\, Y = y] > 1 - \rho_1.$$
Therefore, a downward $\rho_2$-to-$\rho_1$ privacy breach implies an upward $(1-\rho_2)$-to-$(1-\rho_1)$ privacy breach for the complement event $Q'(S)$. In the original definition, the constraint on $\Pr[Q(S)|Y=y]$ includes equality; here, we remove the equality for consistency with the context. It was proved in [29] that in the ε-LDP scenario, a sufficient condition to prevent both an upward $\rho_1$-to-$\rho_2$ and a downward $\rho_2$-to-$\rho_1$ privacy breach is
$$e^{\varepsilon} \leq \frac{\rho_2}{\rho_1} \cdot \frac{1 - \rho_1}{1 - \rho_2}.$$
Here, we provide sufficient conditions to prevent upward $\rho_1$-to-$\rho_2$ and downward $\rho_2$-to-$\rho_1$ privacy breaches in terms of the min- and max-lift as follows:
Proposition A1.
To prevent upward and downward privacy breaches, it is sufficient to have
$$\frac{1 - \rho_2}{1 - \rho_1} \leq \Psi(y), \quad \Lambda(y) \leq \frac{\rho_2}{\rho_1}, \quad \forall y \in \mathcal{Y}.$$
Proof. 
We prove this for an upward privacy breach on a property $Q(S)$ and a downward privacy breach on its complement $\neg Q(S)$. For any property $Q(S)$ with $\Pr[Q(S)] \leq \rho_1$, we have
$$\Pr[Q(S)|Y=y] = \sum_{s \in Q(S)} P_{S|Y}(s|y) = \sum_{s \in Q(S)} \frac{P_S(s)\, P_{Y|S}(y|s)}{P_Y(y)} \leq \sum_{s \in Q(S)} P_S(s)\, \frac{\max_s P_{Y|S}(y|s)}{P_Y(y)} = \Lambda(y)\, \Pr[Q(S)].$$
Since $\Pr[Q(S)] \leq \rho_1$, if $\Lambda(y) \leq \frac{\rho_2}{\rho_1}$, then $\Pr[Q(S)|Y=y] \leq \rho_2$ and there is no upward privacy breach for $Q(S)$. Similarly, for a downward privacy breach on $\neg Q(S)$, we have
$$\Pr[\neg Q(S)|Y=y] = \sum_{s \in \neg Q(S)} P_{S|Y}(s|y) = \sum_{s \in \neg Q(S)} \frac{P_S(s)\, P_{Y|S}(y|s)}{P_Y(y)} \geq \sum_{s \in \neg Q(S)} P_S(s)\, \frac{\min_s P_{Y|S}(y|s)}{P_Y(y)} = \Psi(y)\, \Pr[\neg Q(S)].$$
Since $\Pr[\neg Q(S)] \geq 1 - \rho_1$, if $\Psi(y) \geq \frac{1 - \rho_2}{1 - \rho_1}$, then $\Pr[\neg Q(S)|Y=y] \geq 1 - \rho_2$ and there is no downward privacy breach for $\neg Q(S)$. □

Appendix B

  • For MI, we have
    $$I(S;Y) = \mathbb{E}_{P_{SY}}[i(S,Y)] \leq \mathbb{E}_{P_{SY}}[\varepsilon_u] = \varepsilon_u.$$
  • For the total variation distance, we have
    $$T(S;Y) = \frac{1}{2} \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{s \in \mathcal{S}} P_S(s)\, |l(s,y) - 1| \leq \frac{1}{2} \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{s \in \mathcal{S}} P_S(s)\, |\Lambda(y) - 1| = \frac{1}{2} \sum_{y \in \mathcal{Y}} P_Y(y)\, |\Lambda(y) - 1| \leq \frac{1}{2} \sum_{y \in \mathcal{Y}} P_Y(y)\, |e^{\varepsilon_u} - 1| = \frac{1}{2}(e^{\varepsilon_u} - 1).$$
  • For the $\chi^2$-divergence, we have
    $$\chi^2(S;Y) = \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{s \in \mathcal{S}} P_S(s)\, (l(s,y) - 1)^2 \leq \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{s \in \mathcal{S}} P_S(s)\, (\Lambda(y) - 1)^2 = \sum_{y \in \mathcal{Y}} P_Y(y)\, (\Lambda(y) - 1)^2 \leq (e^{\varepsilon_u} - 1)^2.$$
  • For the Sibson MI, we have
    $$I_\alpha^S(S;Y) = \frac{\alpha}{\alpha - 1} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Big(\sum_{s \in \mathcal{S}} P_S(s)\, l(s,y)^\alpha\Big)^{1/\alpha} \leq \frac{\alpha}{\alpha - 1} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Big(\sum_{s \in \mathcal{S}} P_S(s)\, \Lambda(y)^\alpha\Big)^{1/\alpha} \leq \frac{\alpha}{\alpha - 1} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Big(\sum_{s \in \mathcal{S}} P_S(s)\, e^{\varepsilon_u \alpha}\Big)^{1/\alpha} = \varepsilon_u \frac{\alpha}{\alpha - 1}.$$
  • For the Arimoto MI, we have
    $$I_\alpha^A(S;Y) = \frac{\alpha}{\alpha - 1} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Big(\sum_{s \in \mathcal{S}} P_S^\alpha(s)\, l(s,y)^\alpha\Big)^{1/\alpha} \leq \frac{\alpha}{\alpha - 1} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Big(\sum_{s \in \mathcal{S}} P_S^\alpha(s)\, \Lambda(y)^\alpha\Big)^{1/\alpha} \leq \frac{\alpha}{\alpha - 1} \log \sum_{y \in \mathcal{Y}} P_Y(y) \Big(\sum_{s \in \mathcal{S}} P_S^\alpha(s)\, e^{\varepsilon_u \alpha}\Big)^{1/\alpha} = \varepsilon_u \frac{\alpha}{\alpha - 1},$$
    where $P_S^\alpha(s) = \frac{P_S(s)^\alpha}{\sum_{s' \in \mathcal{S}} P_S(s')^\alpha}$.
  • For LDP, for all $y \in \mathcal{Y}$, we have
    $$\Gamma(y) = \sup_{s, s' \in \mathcal{S}} \frac{P_{Y|S}(y|s)}{P_{Y|S}(y|s')} = \frac{\max_{s \in \mathcal{S}} P_{Y|S}(y|s)}{\min_{s \in \mathcal{S}} P_{Y|S}(y|s)} = \frac{\max_{s \in \mathcal{S}} P_{Y|S}(y|s) / P_Y(y)}{\min_{s \in \mathcal{S}} P_{Y|S}(y|s) / P_Y(y)} = \frac{\Lambda(y)}{\Psi(y)} \leq \frac{e^{\varepsilon_u}}{e^{-\varepsilon_l}} = e^{\varepsilon_l + \varepsilon_u}.$$

Appendix C

Here, we prove that X-invariant randomisation minimises privacy leakage in $\mathcal{X}_H$ for LDP.
Proposition A2.
A randomisation $r(y|x)$, $x \in \mathcal{X}_H$, $y \in \mathcal{Y}_H$, can attain $(\varepsilon, \mathcal{X}_H)$-LDP if and only if
$$\Gamma_{\mathrm{LDP}}(\mathcal{X}_H) = \frac{\max_{s \in \mathcal{S}} P(\mathcal{X}_H|s)}{\min_{s \in \mathcal{S}} P(\mathcal{X}_H|s)} \leq e^{\varepsilon}. \qquad (A2)$$
Proof. 
Sufficient condition: Consider an X-invariant randomisation where $r(y|x) = R(y)$, $\forall x \in \mathcal{X}_H$ and $y \in \mathcal{Y}_H$. If (A2) holds, then for all $s, s' \in \mathcal{S}$, we have
$$\frac{P(\mathcal{X}_H|s)}{P(\mathcal{X}_H|s')} \leq \frac{\max_{s \in \mathcal{S}} P(\mathcal{X}_H|s)}{\min_{s \in \mathcal{S}} P(\mathcal{X}_H|s)} \leq e^{\varepsilon} \;\Rightarrow\; R(y)\, P(\mathcal{X}_H|s) \leq e^{\varepsilon} R(y)\, P(\mathcal{X}_H|s') \;\Rightarrow\; \sum_{x \in \mathcal{X}_H} r(y|x) P_{X|S}(x|s) \leq e^{\varepsilon} \sum_{x \in \mathcal{X}_H} r(y|x) P_{X|S}(x|s') \;\Rightarrow\; P_{Y|S}(y|s) \leq e^{\varepsilon} P_{Y|S}(y|s').$$
For the necessary condition, note that for all $s, s' \in \mathcal{S}$ and $y \in \mathcal{Y}_H$, we have
$$P_{Y|S}(y|s) \leq e^{\varepsilon} P_{Y|S}(y|s') \;\Rightarrow\; \sum_{x \in \mathcal{X}_H} r(y|x) P_{X|S}(x|s) \leq e^{\varepsilon} \sum_{x \in \mathcal{X}_H} r(y|x) P_{X|S}(x|s'),$$
and then, by summing over all $y \in \mathcal{Y}_H$ on both sides, we obtain
$$\sum_{x \in \mathcal{X}_H} P_{X|S}(x|s) \underbrace{\sum_{y \in \mathcal{Y}_H} r(y|x)}_{=1} \leq e^{\varepsilon} \sum_{x \in \mathcal{X}_H} P_{X|S}(x|s') \underbrace{\sum_{y \in \mathcal{Y}_H} r(y|x)}_{=1} \;\Rightarrow\; P(\mathcal{X}_H|s) \leq e^{\varepsilon} P(\mathcal{X}_H|s') \;\Rightarrow\; \frac{P(\mathcal{X}_H|s)}{P(\mathcal{X}_H|s')} \leq e^{\varepsilon}. \qquad (A3)$$
Because (A3) holds for all $s, s' \in \mathcal{S}$, we have
$$\max_{s, s' \in \mathcal{S}} \frac{P(\mathcal{X}_H|s)}{P(\mathcal{X}_H|s')} = \frac{\max_{s \in \mathcal{S}} P(\mathcal{X}_H|s)}{\min_{s \in \mathcal{S}} P(\mathcal{X}_H|s)} \leq e^{\varepsilon}. \qquad \square$$

Appendix D

  • For the $\ell_1$-lift, we have
    $$\Lambda_1(y) = \sum_{s \in \mathcal{S}} P_S(s)\, |l(s,y) - 1| \leq \sum_{s \in \mathcal{S}} P_S(s)\, |\Lambda(y) - 1| = \Lambda(y) - 1 \leq e^{\varepsilon_u} - 1.$$
  • For the $\chi^2$-lift, we have
    $$\Lambda_{\chi^2}(y) = \sum_{s \in \mathcal{S}} P_S(s)\, (l(s,y) - 1)^2 \leq \sum_{s \in \mathcal{S}} P_S(s)\, (\Lambda(y) - 1)^2 = (\Lambda(y) - 1)^2 \leq (e^{\varepsilon_u} - 1)^2.$$
  • For the α-lift, we have
    $$\Lambda_\alpha^S(y) = \Big(\sum_{s \in \mathcal{S}} P_S(s)\, l(s,y)^\alpha\Big)^{1/\alpha} \leq \Big(\sum_{s \in \mathcal{S}} P_S(s)\, \Lambda(y)^\alpha\Big)^{1/\alpha} = \Lambda(y) \leq e^{\varepsilon_u}.$$

Appendix E

If $s_y = \arg\max_{s \in \mathcal{S}} l(s,y)$, then $\Lambda(y) = l(s_y, y)$. Recall that $\bar{y} = \arg\max_{y \in \mathcal{Y}} \Lambda(y)$.
  • When $\max_{y \in \mathcal{Y}} \Lambda_1(y) \leq \varepsilon$, for all $y \in \mathcal{Y}$, we have
    $$\Lambda_1(y) = \sum_{s \in \mathcal{S}} P_S(s)\, |l(s,y) - 1| \leq \varepsilon \;\Rightarrow\; P_S(s_y)\, |l(s_y,y) - 1| = P_S(s_y)\, (\Lambda(y) - 1) \leq \varepsilon,$$
    which results in
    $$\max_{y \in \mathcal{Y}} \Lambda(y) \leq \frac{\varepsilon}{P_S(s_{\bar{y}})} + 1.$$
  • When $\max_{y \in \mathcal{Y}} \Lambda_{\chi^2}(y) \leq \varepsilon$, for all $y \in \mathcal{Y}$, we have
    $$\Lambda_{\chi^2}(y) = \sum_{s \in \mathcal{S}} P_S(s)\, (l(s,y) - 1)^2 \leq \varepsilon \;\Rightarrow\; P_S(s_y)\, (l(s_y,y) - 1)^2 = P_S(s_y)\, (\Lambda(y) - 1)^2 \leq \varepsilon,$$
    which results in
    $$\max_{y \in \mathcal{Y}} \Lambda(y) \leq \sqrt{\frac{\varepsilon}{P_S(s_{\bar{y}})}} + 1.$$
  • When $\max_{y \in \mathcal{Y}} \Lambda_\alpha^S(y) \leq \varepsilon$, for all $y \in \mathcal{Y}$, we have
    $$\Lambda_\alpha^S(y) = \Big(\sum_{s \in \mathcal{S}} P_S(s)\, l(s,y)^\alpha\Big)^{1/\alpha} \leq \varepsilon \;\Rightarrow\; P_S(s_y)\, l(s_y,y)^\alpha = P_S(s_y)\, \Lambda(y)^\alpha \leq \varepsilon^\alpha,$$
    which results in
    $$\max_{y \in \mathcal{Y}} \Lambda(y) \leq \frac{\varepsilon}{P_S(s_{\bar{y}})^{1/\alpha}}.$$

Appendix F

Since $(\varepsilon_l, \varepsilon_u)$-ALIP is satisfied, for all $y \in \mathcal{Y}$, we have $e^{-\varepsilon_u} \leq \frac{1}{l(s,y)} \leq e^{\varepsilon_l}$ and $\max_{s \in \mathcal{S}} \frac{1}{l(s,y)} = \frac{1}{\Psi(y)}$.
  • For the $\ell_1$-lift-inverse, we have
    $$\Psi_1(y) = \sum_{s \in \mathcal{S}} P_S(s)\, \Big|\frac{1}{l(s,y)} - 1\Big| \leq \sum_{s \in \mathcal{S}} P_S(s)\, \Big|\frac{1}{\Psi(y)} - 1\Big| = \frac{1}{\Psi(y)} - 1 \leq e^{\varepsilon_l} - 1.$$
  • For the $\chi^2$-lift-inverse, we have
    $$\Psi_{\chi^2}(y) = \sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{l(s,y)} - 1\Big)^2 \leq \sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{\Psi(y)} - 1\Big)^2 = \Big(\frac{1}{\Psi(y)} - 1\Big)^2 \leq (e^{\varepsilon_l} - 1)^2.$$
  • For the α-lift-inverse, we have
    $$\Psi_\alpha^S(y) = \Big(\sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{l(s,y)}\Big)^\alpha\Big)^{1/\alpha} \leq \Big(\sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{\Psi(y)}\Big)^\alpha\Big)^{1/\alpha} = \frac{1}{\Psi(y)} \leq e^{\varepsilon_l}.$$

Appendix G

If $s_y = \arg\min_{s \in \mathcal{S}} l(s,y)$, then $\Psi(y) = l(s_y, y)$. Recall that $\underline{y} = \arg\min_{y \in \mathcal{Y}} \Psi(y)$.
  • When $\max_{y \in \mathcal{Y}} \Psi_1(y) \leq \varepsilon$, for all $y \in \mathcal{Y}$, we have
    $$\Psi_1(y) = \sum_{s \in \mathcal{S}} P_S(s)\, \Big|\frac{1}{l(s,y)} - 1\Big| \leq \varepsilon \;\Rightarrow\; P_S(s_y)\, \Big(\frac{1}{l(s_y,y)} - 1\Big) = P_S(s_y)\, \Big(\frac{1}{\Psi(y)} - 1\Big) \leq \varepsilon,$$
    which results in
    $$\min_{y \in \mathcal{Y}} \Psi(y) \geq \frac{P_S(s_{\underline{y}})}{\varepsilon + P_S(s_{\underline{y}})}.$$
  • When $\max_{y \in \mathcal{Y}} \Psi_{\chi^2}(y) \leq \varepsilon$, for all $y \in \mathcal{Y}$, we have
    $$\Psi_{\chi^2}(y) = \sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{l(s,y)} - 1\Big)^2 \leq \varepsilon \;\Rightarrow\; P_S(s_y)\, \Big(\frac{1}{\Psi(y)} - 1\Big)^2 \leq \varepsilon,$$
    which results in
    $$\min_{y \in \mathcal{Y}} \Psi(y) \geq \frac{\sqrt{P_S(s_{\underline{y}})}}{\sqrt{\varepsilon} + \sqrt{P_S(s_{\underline{y}})}}.$$
  • When $\max_{y \in \mathcal{Y}} \Psi_\alpha^S(y) \leq \varepsilon$, for all $y \in \mathcal{Y}$, we have
    $$\Psi_\alpha^S(y) = \Big(\sum_{s \in \mathcal{S}} P_S(s)\, \Big(\frac{1}{l(s,y)}\Big)^\alpha\Big)^{1/\alpha} \leq \varepsilon \;\Rightarrow\; P_S(s_y)\, \Big(\frac{1}{l(s_y,y)}\Big)^\alpha = P_S(s_y)\, \Big(\frac{1}{\Psi(y)}\Big)^\alpha \leq \varepsilon^\alpha,$$
    which results in
    $$\min_{y \in \mathcal{Y}} \Psi(y) \geq \varepsilon^{-1} P_S(s_{\underline{y}})^{1/\alpha}.$$

References

  1. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography; Halevi, S., Rabin, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284. [Google Scholar]
  2. Dwork, C. Differential Privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), Venice, Italy, 10–14 July 2006; Volume 4052, pp. 1–12. [Google Scholar]
  3. Dwork, C. Differential privacy. In Encyclopedia of Cryptography and Security; Springer: Boston, MA, USA, 2011; pp. 338–340. [Google Scholar] [CrossRef]
  4. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  5. Kasiviswanathan, S.P.; Lee, H.K.; Nissim, K.; Raskhodnikova, S.; Smith, A. What can we learn privately? SIAM J. Comput. 2011, 40, 793–826. [Google Scholar] [CrossRef]
  6. Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local Privacy and Statistical Minimax Rates. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 26–29 October 2013; pp. 429–438. [Google Scholar] [CrossRef]
  7. Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. Adv. Neural Inf. Process. Syst. 2014, 4, 2879–2887. [Google Scholar]
  8. Sarwate, A.D.; Sankar, L. A rate-disortion perspective on local differential privacy. In Proceedings of the 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 30 September–3 October 2014; pp. 903–908. [Google Scholar] [CrossRef]
Figure 1. Histogram of log Ψ(y) = min_s i(s, y) and log Λ(y) = max_s i(s, y) for 10^3 randomly generated distributions, where |X| = 17 and |S| = 5.
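As a concrete illustration of how the min- and max-lift values in Figure 1 can be obtained, the following Python sketch samples random joint distributions and records log Ψ and log Λ for each data symbol. It is not the authors' code: the Dirichlet sampling of P_SX, the random seed, and the reading that the histogram is taken over the raw data symbols (no mechanism applied, so the published outcome coincides with the data symbol) are our assumptions.

```python
# A minimal sketch (not the authors' code) of how a histogram like Figure 1 could be
# produced: sample random joint pmfs P_SX and record, for every data symbol, the log of
# the minimum and maximum lift i(s, y) = P(s|y) / P(s). Dirichlet sampling, the seed,
# and computing lifts on the raw symbols (no mechanism applied) are our assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, X = 5, 17                              # |S| = 5, |X| = 17 as in Figure 1
log_min_lifts, log_max_lifts = [], []

for _ in range(1000):                     # 10^3 randomly generated distributions
    P_SX = rng.dirichlet(np.ones(S * X)).reshape(S, X)   # joint pmf of (S, X)
    P_S = P_SX.sum(axis=1, keepdims=True)                # prior on the sensitive feature
    P_X = P_SX.sum(axis=0, keepdims=True)                # marginal on the data
    lift = P_SX / (P_S @ P_X)             # i(s, x) = P(s|x) / P(s)
    log_min_lifts.append(np.log(lift.min(axis=0)))       # log Psi per symbol
    log_max_lifts.append(np.log(lift.max(axis=0)))       # log Lambda per symbol
```

Pooling the two lists across trials and histogramming them reproduces the kind of lift asymmetry discussed in the paper, with the min- and max-lift values occupying visibly different ranges.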
Figure 2. Privacy–utility trade-off of the watchdog mechanism with complete merging randomisation for synthetic data under ε-LDP and (ε_l, ε_u)-ALIP, where |X| = 17, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.35, 0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
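The (ε_l, ε_u)-ALIP curves in Figures 2–9 split a total budget ε into ε_l = λε and ε_u = (1 − λ)ε. The sketch below is a simplified illustration rather than Algorithm 1 or 2 of the paper: it shows how such asymmetric bounds can drive a watchdog-style partition with complete merging, where symbols whose log-lift stays within [−ε_l, ε_u] for every s are published unchanged and all remaining symbols are merged into one super-symbol. Function and variable names are ours, and verifying that the merged super-symbol itself satisfies the bounds is a separate step not shown here.

```python
# A simplified sketch (not Algorithm 1 or 2 from the paper) of a watchdog-style
# partition under (eps_l, eps_u)-ALIP with complete merging of the risky symbols.
# The budget split eps_l = lam*eps, eps_u = (1 - lam)*eps mirrors the figure captions.
import numpy as np

def alip_complete_merging(P_SX, eps, lam=0.65):
    eps_l, eps_u = lam * eps, (1 - lam) * eps
    P_S = P_SX.sum(axis=1, keepdims=True)          # marginal of the sensitive feature
    P_X = P_SX.sum(axis=0, keepdims=True)          # marginal of the data
    log_lift = np.log(P_SX / (P_S @ P_X))          # log i(s, x) for every pair (s, x)
    # a symbol is "safe" only if its lift stays inside [e^{-eps_l}, e^{eps_u}] for all s
    safe = (log_lift.min(axis=0) >= -eps_l) & (log_lift.max(axis=0) <= eps_u)
    n = P_SX.shape[1]
    # safe symbols keep their own label; every risky symbol maps to one merged label n
    mapping = np.where(safe, np.arange(n), n)
    return mapping, safe

# Example: which symbols get merged for a random joint pmf at a total budget eps = 2
rng = np.random.default_rng(1)
P_SX = rng.dirichlet(np.ones(5 * 17)).reshape(5, 17)
mapping, safe = alip_complete_merging(P_SX, eps=2.0)
print("risky symbols merged into one super-symbol:", np.where(~safe)[0])
```

Varying λ changes how much of the total budget is assigned to the min-lift bound ε_l versus the max-lift bound ε_u, which is exactly the asymmetry swept over in Figures 2–9.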
Figure 3. Privacy–utility trade-off of the watchdog mechanism with complete merging randomisation for the Adult dataset under ε-LDP and (ε_l, ε_u)-ALIP, where S = {relationship}, X = {occupation}, |X| = 15, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.35, 0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 4. Privacy–utility trade-off of AORR for synthetic data, where |X| = 17, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.35, 0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 5. Privacy–utility trade-off of AORR for the Adult dataset, where S = {relationship}, X = {occupation}, |X| = 15, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.35, 0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 6. Privacy–utility trade-off of subset merging randomisation under ε-LDP and (ε_l, ε_u)-ALIP, where |X| = 17, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.35, 0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 7. Privacy–utility trade-off of subset merging randomisation for the Adult dataset under ε-LDP and (ε_l, ε_u)-ALIP, where S = {relationship}, X = {occupation}, |X| = 15, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.35, 0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 8. Comparison of privacy–utility trade-off and time complexity between AORR, SRR (Algorithm 2), and subset merging (Algorithm 1) for synthetic data, where |X| = 17, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ = 0.65, ε_l = λε_LDP, and ε_u = (1 − λ)ε_LDP.
Figure 9. Comparison of privacy–utility trade-off between AORR, SRR (Algorithm 2), and subset merging (Algorithm 1) for the Adult dataset, where S = {relationship}, X = {occupation}, |X| = 15, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ = 0.65, ε_l = λε_LDP, and ε_u = (1 − λ)ε_LDP.
Figure 10. Comparison of privacy–utility trade-off and time complexity between SRR (Algorithm 2) and subset merging (Algorithm 1) for synthetic data, where |X| = 200, |S| = 15, ε_LDP ∈ {1, 1.25, 1.5, 1.75, …, 8}, and ε_l = ε_u = ε_LDP/2.
Figure 11. Comparison of privacy–utility trade-off between ALIP, ℓ1-privacy, and χ²-privacy for synthetic data, where |X| = 17, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 12. Comparison of privacy–utility trade-off between ALIP, ℓ1-privacy, and χ²-privacy for the Adult dataset, where S = {relationship}, X = {occupation}, |X| = 15, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, λ ∈ {0.5, 0.65}, ε_l = λε, and ε_u = (1 − λ)ε.
Figure 13. Comparison of privacy–utility trade-off between ALIP and α-lift privacy, where |X| = 17, |S| = 5, ε_LDP ∈ {0.25, 0.5, 0.75, …, 8}, ε_l = ε_u = ε_LDP/2, and α ∈ {2, 10, 100}.
