Article

Differentially Private Block Coordinate Descent for Linear Regression on Vertically Partitioned Data

Jins de Jong, Bart Kamphorst and Shannon Kroes
TNO, Unit ICT, Eemsgolaan 3, 9727DW Groningen, The Netherlands
* Author to whom correspondence should be addressed.
Alphabetically ordered list of authors.
J. Cybersecur. Priv. 2022, 2(4), 862-881; https://doi.org/10.3390/jcp2040044
Submission received: 12 September 2022 / Revised: 21 October 2022 / Accepted: 3 November 2022 / Published: 9 November 2022
(This article belongs to the Section Privacy)

Abstract

We present a differentially private extension of the block coordinate descent algorithm by means of objective perturbation. The algorithm iteratively performs linear regression in a federated setting on vertically partitioned data. In addition to a privacy guarantee, we derive a utility guarantee; a tolerance parameter indicates how much the differentially private regression may deviate from the analysis without differential privacy. The algorithm’s performance is compared with that of the standard block coordinate descent algorithm on both artificial test data and real-world data. We find that the algorithm is fast and able to generate practical predictions with single-digit privacy budgets, albeit with some accuracy loss.

1. Introduction

There are many circumstances where organizations need to use each other’s data to perform tasks, such as data analysis or prediction [1,2,3]. For example, different parties may own different data sets that can be combined for improved predictive performance or inference. When these data contain personal information, the required data exchange can be problematic. In these situations, federated learning can be used to facilitate such collaborations. It is a privacy-preserving technique that keeps the data local during the analysis and ensures that no other party gains access to it. Within the research field dedicated to federated learning, there is increasing attention towards solutions for vertically partitioned data. One speaks of vertically partitioned data when different parties own different attributes of the same subjects.
An analysis that researchers often seek to perform on vertically partitioned data is regression analysis. Recently, an approach has been presented for this scenario: Block coordinate descent (BCD) [4]. BCD is a promising and fast way to perform federated learning for generalized linear models. One of its strengths is that it avoids computationally expensive cryptographic operations to secure the computations. Although no raw data is shared during BCD, the intermediate results that are exchanged can still leak information, as no privacy guarantees are in place. There are several examples of this in the context of federated learning [5,6]. To limit the possibility of information leakage, we supplement BCD with differential privacy (DP).
Differential privacy was introduced by [7] and is widely referred to as the state-of-the-art approach to privacy preservation. Essentially, it involves adding uncertainty to the data analysis such that similar data sets will likely lead to similar results. Since its introduction, differential privacy has seen some application, but not yet widespread adoption. One of the reasons for this is that the DP parameters quantifying the privacy guarantees often cannot be made as small as hoped while preserving utility. The result is a noisy learning algorithm with reduced performance that theoretically could reveal information about data in the data set with considerable certainty.
The motivation for this project is twofold. The first is to extend BCD with privacy guarantees to make it applicable in a wider range of use cases. The second motivation is to improve the practicality of DP in realistic use cases. To do so, we make some optimistic choices in our set-up. This means that less noise has to be added and a better performance is obtained. This clearly reduces the amount of protection DP offers. However, we believe that in this way we provide more meaningful privacy guarantees that correspond better to the data analyst’s practice.
  • Related Work
Multiple approaches have been presented to add differential privacy to a linear regression problem in the centralized setting (i.e., with one party) [8,9,10,11]. For a federated setting, there has been more focus on horizontally partitioned data [12,13]. For vertically partitioned data, only a few solutions are known [14,15]. A problem similar to ours is treated in [15], where it is approached using techniques from multi-party computation. Although the algorithm performs well, it does not provide a utility guarantee or expectation.
Utility expectations of differentially private learning have been presented for other applications. Examples can be found in differentially private empirical risk minimization [8,16,17,18]. Since such utility bounds are typically asymptotic, large numbers of iterations are required for such bounds to become reliable. For learning techniques with a small number of iterations, such as ours, these are not practical.
The approaches above apply differential privacy over the entire universe of data sets. The concept of locally sensitive differential privacy has been studied before [19,20,21], albeit under various names.
  • Our Contributions
We introduce DP-BCD, a slightly reformulated version of the block coordinate descent algorithm [4] that has been made differentially private (DP) using objective perturbation [8,22,23,24]. It iteratively performs linear regression on vertically partitioned data. To make this implementation as practical as possible, we use local sensitivity parameters in a particularly small universe of possible data sets, instead of using global upper bounds on some large set of unseen data sets. Furthermore, we introduce a new parameter γ that gives the maximally allowable performance decay per iteration. Before the analysis, the parties agree on a loss scaling, fixing the amount of performance they are willing to sacrifice for more privacy. As a consequence of this, theoretical performance guarantees can be derived. To evaluate its performance, we compare our algorithm with the standard BCD algorithm without differential privacy on synthetic test data, the forest fires data set [25] and the garments industry data [26].
  • Outline
In Section 2, we formulate the federated setting and introduce the fundamental results from regression analysis and differential privacy that we need for our construction. The construction of the DP-BCD algorithm with the main result, Theorem 1, can be found in Section 3.1. In Section 3.2, we compare the performance of DP-BCD with standard BCD and linear regression in the centralized setting. In Section 4, we discuss its performance and some improvements of the algorithm, and we conclude with Section 5.

2. Materials and Methods

This section elaborates on the federated context that motivates and scopes our research. Thereafter, we present the BCD algorithm for training a simple linear regression model and highlight the potential privacy issues. Finally, as a stepping stone for improving the BCD algorithm, we formally introduce differential privacy.

2.1. Federated Context

Federated learning with k parties involves local data sets {X^(j) | 1 ≤ j ≤ k} that jointly form a federated data set (X^(1), …, X^(k)). These data sets are used to jointly train a model, which in this case consists of the joint weights (β_a, β_b). The essence of federated learning is that the local data set of any party is only accessed by the party itself, ensuring that no other party processes it. Federated learning can thus be chosen to provide more data confidentiality. However, information about the local data set may still be deduced from the outcomes of the local computation.
Our setup assumes that the data sets and the list of participants are fixed for the entire runtime of the algorithm. Nonetheless, modifications that allow the addition of new subjects or objects to the data sets are conceivable. It may also be possible to add a participant during the protocol; this participant will simply have missed the first few iterations and will not have contributed anything there. Participants cannot stop during the protocol without publishing their result so far. Such extensions are out of scope for this work. It should be clear that the results here all assume a fixed list of participants and data sets.
In the rest of the article, we will work in the two-party setup. The algorithms and results can be generalized to the k-party setting in a straightforward manner; the utility results do depend on the number of parties. We assume that two entities, named Alice and Bob, intend to perform an analysis on their joint tabular data. These entities could be researchers, analysts or some organizations. The data is vertically partitioned, meaning that Alice and Bob have complementary data on the same subjects. More specifically, the data X_a of Alice and the data X_b of Bob, with respective dimensions N × m_a and N × m_b, both contain N observations that are ordered in the same way. Alice knows the first m_a attributes of each observation and Bob knows the other m_b = m − m_a attributes. In this setting, the methods for linear regression in the centralized setting [8] provide good differentially private linear regression algorithms.

2.2. Linear Regression

We consider simple linear regression, which is the problem of finding β*, such that
$$L(\beta^*) = \min_{\beta} L(\beta), \tag{1}$$
where the loss L on the data set X with labels y is given by
$$L(\beta) := \lVert X\beta - y \rVert_2^2. \tag{2}$$
The optimal solution to this problem is found by differentiating with respect to β and determining the root of the derivative, yielding
$$\beta^* = (X^T X)^{-1} X^T y.$$
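As a concrete illustration of this closed-form solution, the following minimal NumPy sketch (our own, not taken from the paper's code repository) computes β* via the normal equations.

```python
import numpy as np

def ols_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return beta* = (X^T X)^{-1} X^T y, the minimizer of ||X beta - y||_2^2."""
    # Solving the normal equations is numerically preferable to forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Small usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)
print(ols_weights(X, y))  # close to [1.0, -2.0, 0.5]
```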

2.3. Block Coordinate Descent

The starting point is the block coordinate descent (BCD) algorithm introduced by [4]. It can be used to train a generalized linear model in a federated setting. In the standard approach, all parties know the label y. The first party tries to create a linear model that predicts as much of y as possible from its own data. It hands its prediction to the next player, who tries to improve the prediction as much as possible. This continues until the stopping criterion is met.
A simple modification of the original algorithm communicates the missing parts rather than the parties' own predictions. This has the advantage that only a single party needs to know the true label, which is a common situation in many joint learning problems. For this reason, the single-label-owner variant of BCD, Algorithm 1, will be used here.
Algorithm 1 Incremental 2-party block coordinate descent algorithm. The subscript a is for Alice and b for Bob.
1: Alice and Bob initiate β_a^(0) ← 0 and β_b^(0) ← 0, respectively
2: Alice initiates v_b ← y
3: i ← 0
4: while stopping criterion is not met do
5:     player Alice do
6:         β̃_a ← (X_a^T X_a)^(−1) X_a^T v_b
7:         β_a ← β_a + β̃_a
8:         v_a ← v_b − X_a β̃_a
9:         send v_a to Bob
10:    end player
11:    player Bob do
12:        β̃_b ← (X_b^T X_b)^(−1) X_b^T v_a
13:        β_b ← β_b + β̃_b
14:        v_b ← v_a − X_b β̃_b
15:        send v_b to Alice
16:    end player
17:    i ← i + 1
18: end while
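To make the flow of Algorithm 1 concrete, the following Python sketch simulates the two-party protocol in a single process. It is our own illustration under simplifying assumptions (both blocks live in one script and there is no real communication); names such as bcd_two_party are ours, and the implementation in the linked repository may differ.

```python
import numpy as np

def local_fit(X: np.ndarray, v: np.ndarray) -> np.ndarray:
    """One party's update: regress the received remainder v on its own columns."""
    return np.linalg.solve(X.T @ X, X.T @ v)

def bcd_two_party(X_a, X_b, y, n_iter=5):
    """Simulate Algorithm 1: Alice and Bob alternately explain the remaining part of y."""
    beta_a = np.zeros(X_a.shape[1])
    beta_b = np.zeros(X_b.shape[1])
    v = y.copy()                       # Alice initiates v_b <- y
    for _ in range(n_iter):
        inc_a = local_fit(X_a, v)      # Alice's increment
        beta_a += inc_a
        v = v - X_a @ inc_a            # remainder sent to Bob
        inc_b = local_fit(X_b, v)      # Bob's increment
        beta_b += inc_b
        v = v - X_b @ inc_b            # remainder sent back to Alice
    return beta_a, beta_b

# Example: a synthetic data set split vertically over the two parties.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
y = X @ rng.standard_normal(6) + 0.1 * rng.standard_normal(200)
print(bcd_two_party(X[:, :3], X[:, 3:], y))
```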

2.4. Data Reconstruction

Block coordinate descent is an efficient federated learning algorithm, but it can leak information about the used data set. In [4], it is explained that an attacker may reconstruct the used data set up to a rotation. From discussions with the authors of [4], we have learned that the data is better protected than by a rotation. The original data can be approximated within a quantifiable margin of error, depending on the amount of shared intermediate results. Earlier reconstruction attacks suggest that an external attacker with supplementary information might be able to mimic this approach even without access to the intermediate results. Although the design, feasibility and success of such an attack are merely hypothetical, the fact is that at this point, we cannot say to what extent the approach in [4] protects the processed data. This is one of the reasons to study a differentially private version of BCD.
This is an example of the broader problem of data privacy. It is hard, if not impossible, to measure. The reason for this is that typically no optimal attack exists. Since it is hard to know how much some optimized approach may uncover, evaluating the ‘privacy’ of data processing is hard. This is one of the reasons to work with theoretical upper bounds on the amount of information that is leaked. Differential privacy does precisely this. It bounds the certainty an attacker may obtain from the results of any study, regardless of the extra information or computational power the attacker may have.
Example 1.
Assume that an attacker has obtained a small list of possible data sets {(X^(i), y^(i)) | 1 ≤ i ≤ n}, of which one is used in a deterministic study, meaning that the outcome is a function of the data set and the label. The result of this study is a vector β of weights belonging to a linear model, solving (1). The attacker can simply test all possible data sets to see which one generates the optimal weight vector β. In this way, the attacker may determine which data set was used. This shows that deterministic methods cannot provide sufficient privacy guarantees.

2.5. Differential Privacy

We begin with the standard definition of differential privacy and a localized variant [19,20,21]. An algorithm is (ε, δ)-DP if it finds similar results for similar data sets with large probability 1 − δ. The similarity of the results is described by the privacy budget ε. In practice, this means that an attacker who sees a certain result from the algorithm cannot decide which data set was used to generate the result. This implies that records in the data set remain hidden.
Definition 1
(Differential privacy). A randomized mechanism A provides (ε, δ)-differential privacy if, for all pairs of data sets x₁, x₂ ∈ 𝒳 at distance d(x₁, x₂) = 1 and for any outcome y,
$$P[A(x_1) = y] \le e^{\varepsilon}\, P[A(x_2) = y] + \delta.$$
This definition provides guarantees that are unconditional on the knowledge or capabilities of the attacker. Furthermore, the parameters ε and δ can be bounded from above by a variety of composition laws. The most common of these will be discussed in Section 2.6. This allows the data owner to keep track of the maximum amount of data leakage a data set has suffered.
Example 2.
Continuing with Example 1, assume that the list consists of two possible data sets, X₁ and X₂. Since there is no additional information, they are equally likely to be used. The weight vector β is computed using an (ε, 0)-DP algorithm, where both data sets are in the universe of possible data sets. From Definition 1 of differential privacy, it follows that
$$P[A(X_1) = \beta] \le e^{\varepsilon}\, P[A(X_2) = \beta].$$
This implies that the likelihood that X₁ is used is at most
$$\frac{P[A(X_1) = \beta]}{P[A(X_1) = \beta] + P[A(X_2) = \beta]} \le \frac{e^{\varepsilon}}{1 + e^{\varepsilon}}.$$
The attacker cannot learn the used data set with certainty, regardless of his computational power and additional information.
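As a numerical illustration of this bound (our own arithmetic, not taken from the paper): for ε = 1 the attacker's certainty is at most
$$\frac{e^{\varepsilon}}{1 + e^{\varepsilon}} = \frac{e}{1 + e} \approx 0.73,$$
so even an unbounded attacker cannot do much better than a 73% guess between the two candidate data sets; for ε = 0.1 the bound drops to roughly 0.52.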
Definition 2
(Locally sensitive differential privacy). A randomized mechanism A provides (ε, δ)-locally sensitive differential privacy in the data set x₁ ∈ 𝒳 if, for all data sets x₂ ∈ 𝒳 at distance d(x₁, x₂) = 1 and for any outcome y,
$$P[A(x_1) = y] \le e^{\varepsilon}\, P[A(x_2) = y] + \delta.$$
Definition 1 holds for all pairs of data sets in the universe 𝒳 at distance 1 of each other. This implies that the amount of noise added to an analysis of our data set X may stem from data sets D and D′ at distance 1 of each other, which are completely different from X and its neighbourhood. In this way, a lot of noise is added to hide the difference between D and D′ while studying X. Thus, a lot of noise has to be added to hide an absent data point, resulting in a large privacy budget with weak guarantees. Therefore, we choose to sacrifice group composition in order to obtain a closer link between the performed data analysis and the privacy budget. This results in a universe of possible data sets that is chosen with local sensitivity in mind.

Definition of a Distance

Definitions 1 and 2 make it clear that some distance on the universe of data sets must be defined. It is preferable to use concepts that make sense both in the local and the federated context. We use the following definition here. Two data sets are at distance 1 if the sets of subjects they have data on differ by one. It thus requires the suppression of an entire row of the data set. Since the data matrices should be of the same dimensions, this corresponds to having no information on someone and filling an entire row in X^(i) with zeros. This can be interpreted in the federated view too: it means that both parties remove their information about this subject from their local data. In this case, if both parties train ε-DP locally, this corresponds, by simple composition (see Lemma 1), to 2ε-DP in the federated setting.
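For illustration, a neighbouring data set in the sense of this distance can be produced as follows (a minimal sketch, our own; the function name is hypothetical).

```python
import numpy as np

def neighbour(X: np.ndarray, subject: int) -> np.ndarray:
    """Data set at distance 1: all information on one subject is suppressed by zeroing its row."""
    X_neigh = X.copy()
    X_neigh[subject, :] = 0.0
    return X_neigh
```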

2.6. Composition Mechanisms

The learning algorithm described in Section 3.1 consumes a privacy budget of ε with δ = 0 in every iteration of the learning phase. Using either simple composition [27,28] or advanced composition [29], it is possible to determine the consumed privacy budget for an entire protocol run.
Lemma 1
(Simple composition). Let M_i be an (ε_i, δ_i)-differentially private algorithm. The sequence of algorithms
$$A(x) = (M_1(x), \ldots, M_T(x))$$
is $\bigl(\sum_{i=1}^{T} \varepsilon_i, \sum_{i=1}^{T} \delta_i\bigr)$-differentially private.
Lemma 2
(Advanced composition). For every ε > 0, δ ≥ 0, δ′ > 0 and T ∈ ℕ, the class of (ε, δ)-differentially private mechanisms is (ε′, Tδ + δ′)-differentially private under T-fold adaptive composition, for
$$\varepsilon' = \varepsilon \sqrt{2 T \log(1/\delta')} + T \varepsilon \,(e^{\varepsilon} - 1).$$
For sufficiently small δ′, this means that advanced composition yields better results if
$$\sqrt{2 T \log(1/\delta')} < T (2 - e^{\varepsilon}),$$
which leads to
$$T \log(N) < T \log(1/\delta') < \frac{T^2}{2} (2 - e^{\varepsilon})^2,$$
since δ′ < 1/N. This means that advanced composition is only beneficial if a protocol with many iterations and a small privacy budget per iteration is used and ε < log(2). For example, with T = 5 iterations and ε = 0.2, the data set may consist of at most 4 data points for advanced composition to be the better choice. Since BCD does not function with a tiny privacy budget per round, this means that we will only use simple composition.
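The comparison between the two composition rules is easy to check numerically. The sketch below (our own, with illustrative parameter choices such as delta_prime) evaluates both bounds for settings in the range discussed above.

```python
import math

def simple_composition(eps: float, T: int) -> float:
    """Total budget after T-fold composition of eps-DP mechanisms (Lemma 1, delta = 0)."""
    return T * eps

def advanced_composition(eps: float, T: int, delta_prime: float) -> float:
    """epsilon' from Lemma 2 for T-fold adaptive composition with slack delta_prime."""
    return eps * math.sqrt(2 * T * math.log(1 / delta_prime)) + T * eps * (math.exp(eps) - 1)

# With T = 5 rounds and per-round budgets in the range used in this paper,
# simple composition gives the smaller total budget, matching the discussion above.
for eps in (0.2, 1.0):
    print(eps, simple_composition(eps, 5), advanced_composition(eps, 5, delta_prime=1e-5))
```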

2.7. Convergence

Since Algorithm 1 is iterative, an end point must be chosen. Typically, one would let the algorithm run until the result has converged, where the standard definition of convergence requires any single player to find a remainder v^(t) in iteration t that is sufficiently close to a remainder observed before,
$$\lVert v^{(t)} - v^{(s)} \rVert \le B_0, \quad \text{for } 1 \le s < t. \tag{5}$$
This method demands the weights to converge. However, the optimal weights may depend heavily on a single data point. It is precisely this dependence that DP tries to cap. Furthermore, when adding noise in each round, the weights will absorb some of this noise, which could lead to a series of increasing remainders, so that convergence may never occur. For these reasons (5) is not an ideal convergence definition.
At each iteration, the loss L(β) is minimized. At iteration t, a remainder v^(t) = v^(t−1) − Xβ_(t) with minimum length is passed on to the next player. However, after a certain number of iterations, the benefit of an additional round will become very small. One may define that convergence is reached when the squared length of the remainder,
$$\lVert v^{(t)} \rVert_2^2 - \lVert v^{(t-1)} \rVert_2^2 \ge -B_C,$$
hardly decreases or even increases. The bound B_C for this would be defined at the initialization of the training. This definition is not very sophisticated, but it has the added advantage that it is directly related to the loss function, which is the objective of the training algorithm. Furthermore, it is applicable in virtually all situations. For example, it will also work in the case of increasing remainders, which may occur in a differentially private algorithm.
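A minimal sketch of this convergence check (our own; the bound and the variable names are illustrative) could look as follows.

```python
import numpy as np

def loss_converged(v_new: np.ndarray, v_old: np.ndarray, bound_c: float) -> bool:
    """Convergence: the squared length of the remainder hardly decreases (or even increases)."""
    decrease = float(np.sum(v_old ** 2) - np.sum(v_new ** 2))
    return decrease <= bound_c
```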
Rather than using convergence as a stopping criterion, the experiments described here use a fixed number of T = 5 iterations. This makes the analysis of the algorithm and its performance simpler. Five iterations are far fewer than typically used in BCD. The reason for this is that extra iterations are expensive in differential privacy.

2.8. Code

The code used in this project is available at https://github.com/JDJ847879/dp-bcd (accessed on 12 September 2022).

3. Results

3.1. Construction of DP-BCD

If an attacker knows what function or (deterministic) computation has been performed on a data set, he may derive information about this data from the outcome. This may allow him to exclude certain data points from the data set, include other specific points or deduce relations that the data set fulfils. One of the options to limit this possibility is to hide precisely which computation has been performed. In objective perturbation [8,22,23,24], it is the loss function that is perturbed, preventing the attacker from knowing what computation was performed.
The algorithm presented here consists of two phases. In the first phase, all parties train a linear model on their local data set. The labels they use for this are the parts missing from the joint prediction. In the second phase, the local linear models are put together to form a linear model in the federated setting. This linear model can then be published. There are two potential groups of attackers in this setting. During the first phase, it is the group of all other participants. At publication, it is the outside world that receives the jointly trained model. Since the group of all other participants is also part of the outside world, we will only consider the first group when proving our privacy guarantees.
In this study, we use locally sensitive differential privacy (LSDP), as defined in Definition 2. Based on this, only data sets at distance 1 of a party’s own data set are considered. Besides that, we use a small universe of possible data sets X . It consists only of the actual data set and all data sets obtained by removing one record. We do not include possible data sets with one record more than our data set. In fact, for such a small universe, the conditions of Definitions 1 and 2 coincide.
One may argue that using the small universe based on local sensitivity to reduce the amount of noise while lowering the privacy budget is in vain. This is not the case: in the transition, the privacy guarantee is shifted from absent data points with a high privacy budget to the actual data with a low privacy budget. The privacy budget is the explicit security guarantee that (LS)DP offers and, as such, is what users look at.
The ambition is to minimize the following 2-party loss function in both an iterative and a federated manner:
$$L = \Bigl\lVert y - \sum_{i=1}^{2} X^{(i)} \beta^{(i)} \Bigr\rVert_2^2 + \sum_{i=1}^{2} (\beta^{(i)})^T (X^{(i)})^T b^{(i)}. \tag{7}$$
This is the 2-party form of (2) with a perturbation term added. A ridge regression term is omitted to allow a cleaner comparison with the original BCD algorithm; however, nothing prevents adding such a term. In (7), each party's loss function is perturbed by the dot product of the prediction and a secret vector b^(i), known only to party i.
For each party, we write X^(i) ∈ M_{N×m_i}, so there are N observations of m_i attributes in this party's data. It follows from our data assumption that N > m_i.
If the vectors b^(i) were sampled from a normal distribution such as (8), the perturbation term would have the added benefit that the local and federated perturbation terms are of the same form, i.e., a similar perturbation term would appear in the federated and local objective functions. To avoid dimensionality problems, a different distribution is used, as explained in Remark 1.
Remark 1.
The vector b ∈ ℝ^N could be sampled from a normal distribution with density
$$p_{\mathrm{naive}}(b) = \Bigl(\frac{\varepsilon}{2\pi\xi^2}\Bigr)^{N/2} \exp\Bigl[-\frac{\varepsilon \lVert b \rVert_2^2}{2\xi^2}\Bigr]. \tag{8}$$
It is clear that the direction of the vector is uniformly sampled from the surface of the N-dimensional sphere. For its length, we want to solve for R
$$\frac{2}{\Gamma(N/2)} \int_0^{R/(\sqrt{2}\sigma)} \mathrm{d}r\, r^{N-1} e^{-r^2} = p, \quad \text{with } p \in (0,1),$$
which transforms into
$$\frac{1}{\Gamma(N/2)} \int_0^{R^2/(2\sigma^2)} \mathrm{d}t\, t^{N/2-1} e^{-t} = p$$
and is solved by the inverse lower incomplete gamma function. This is problematic. The high dimension pushes the vector outwards, so that the noise vectors tend to get bigger with an increasing number of observations. This leads to noise vectors overwhelming the data and a remainder that is larger than the input label.
As explained before, only the first party needs to know the labels. Afterwards, during iteration t, party j obtains the remainder
$$v_t^{(j)} = y - \sum_{i=1}^{2} X^{(i)} \beta_t^{(i)}, \quad \text{where } \beta_t^{(i)} = \begin{cases} \sum_{s=1}^{t} \beta_{(s)}^{(i)}, & \text{for } i < j, \\ \sum_{s=1}^{t-1} \beta_{(s)}^{(i)}, & \text{for } i \ge j, \end{cases}$$
of the label that is not yet explained by party j − 1. From now on, we will suppress the sub- and superscripts when possible. The local solution is given by
$$0 = X^T (v - X\beta^* - b) \;\Longrightarrow\; \beta^* = (X^T X)^{-1} X^T (v - b). \tag{9}$$
There are two algorithms in use in the protocol. The first is used during the learning phase to communicate the missing part of the labels. It is given by
$$A_l(X) = v - X\beta^*.$$
The second is used in the revealing phase and is defined by
$$A_r(X) = \sum_{t=1}^{T} \beta_{(t)}^*,$$
where β* is in both cases defined in (9). In the special case of unperturbed learning, i.e., b = 0, we call this solution β.
We start with the privacy of the learning algorithm A_l. We sample b = l · s with s ∈ S^(N−1) uniform and l with density $2\sqrt{\varepsilon/(2\pi\xi^2)}\,\exp[-\varepsilon l^2/(2\xi^2)]$, so that
$$p_{\xi,\varepsilon}(b) = \frac{\Gamma(N/2)}{\pi^{N/2}} \sqrt{\frac{\varepsilon}{2\pi\xi^2}} \, \exp\Bigl[-\frac{\varepsilon \lVert b \rVert_2^2}{2\xi^2}\Bigr].$$
Thus, the length of the perturbation vector is normally distributed and its direction is uniformly distributed. This ensures that the length of the perturbation vector is independent of the number of observations. The parameter ξ is the largest allowed value of ‖v_out‖₂ for a successful protocol run.
The standard deviation of the length of the perturbation vector is given by ξ/√ε, where ε is the privacy budget for the round and
$$\xi = \gamma \,\lVert v - X\beta \rVert_2.$$
The parameter γ > 1 gives the maximally allowable deterioration in performance compared to the unperturbed case. It is a new parameter introduced here. It must be chosen big enough to satisfy, in every iteration t,
$$\bigl\lVert (v_{\mathrm{in},(t)} - X_i \beta_{(t)}^*)_{\perp i} \bigr\rVert_2^2 \le \bigl\lVert v_{\mathrm{in},(t)} - X_i \beta_{(t)}^* \bigr\rVert_2^2 \le \xi_{(t)}^2. \tag{13}$$
This implies that in each iteration of the protocol, the loss scaling parameter must satisfy
$$\gamma \ge \sup_{\substack{d(X, \tilde{X}) = 1 \\ m = \mathrm{rk}(X) = \mathrm{rk}(\tilde{X})}} \max\left\{ \frac{\lVert v - X\beta \rVert_2}{\lVert v - \tilde{X}\tilde{\beta} \rVert_2}, \; \frac{\lVert v - \tilde{X}\tilde{\beta} \rVert_2}{\lVert v - X\beta \rVert_2} \right\}.$$
Thus, γ represents the cost per round of adding differential privacy to the learning algorithm. It is the multiplier of the loss with respect to the unperturbed case, where b = 0 .
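A minimal sketch of this sampling step, assuming a uniformly random direction on the unit sphere and a half-normal length with scale ξ/√ε (our own implementation; the repository code may differ):

```python
import numpy as np

def sample_perturbation(n: int, xi: float, eps: float, rng=None) -> np.ndarray:
    """Sample b = l * s with s uniform on the unit sphere S^{n-1} and l half-normal."""
    rng = np.random.default_rng() if rng is None else rng
    s = rng.standard_normal(n)
    s /= np.linalg.norm(s)                        # uniform direction on the sphere
    # |N(0, xi^2/eps)| has density 2*sqrt(eps/(2*pi*xi^2)) * exp(-eps*l^2/(2*xi^2)) for l >= 0.
    l = abs(rng.normal(0.0, xi / np.sqrt(eps)))
    return l * s
```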
The probability that two databases X₁, X₂ of full rank at a distance 1 of each other yield the same output vector v_out = v_in − X₁β₁* = v_in − X₂β₂* is, according to (9), given by
$$\frac{P[A_l(X_1) = v_{\mathrm{out}}]}{P[A_l(X_2) = v_{\mathrm{out}}]} = \frac{P[0 = X_1^T(v_{\mathrm{out}} - b_1)]}{P[0 = X_2^T(v_{\mathrm{out}} - b_2)]} = \frac{P[b_1 \in \ker(X_1^T) + v_{\mathrm{out}}]}{P[b_2 \in \ker(X_2^T) + v_{\mathrm{out}}]} = \frac{P[b_1 \in \ker(X_1^T) + v_{1,\perp 1}]}{P[b_2 \in \ker(X_2^T) + v_{2,\perp 2}]} = \frac{\exp\bigl[-\frac{\varepsilon}{2\xi^2}\lVert v_{1,\perp 1}\rVert_2^2\bigr]}{\exp\bigl[-\frac{\varepsilon}{2\xi^2}\lVert v_{2,\perp 2}\rVert_2^2\bigr]} \le e^{\varepsilon}.$$
Here, we have decomposed v_out = v_{1,ker 1} + v_{1,⊥1} = v_{2,ker 2} + v_{2,⊥2} into parts inside the kernel and perpendicular to it. Note that the decomposition for X₁ is different from that for X₂. For the probabilities, it suffices that
$$\exp\Bigl[-\alpha \bigl\lVert v_{\perp} + \sum_j \lambda_j w_j \bigr\rVert^2\Bigr] = \exp\bigl[-\alpha \lVert v_{\perp} \rVert^2\bigr] \cdot \prod_j \exp\bigl[-\alpha \lambda_j^2\bigr],$$
where {w_j} is an orthonormal basis for the kernel. Note that the parts inside the kernel can only stem from v_in. Since both matrices are of full rank, their kernels have the same dimensions and selecting a vector out of them is equally likely. For the perpendicular parts, a standard argument can be used. Using (13), the final inequality follows from
$$\lVert v_{i,\perp i} \rVert_2^2 \le \lVert v_{\mathrm{out}} \rVert_2^2 = \lVert v_{\mathrm{in}} - X_i \beta_i^* \rVert_2^2 \le \xi^2.$$
For the revealing phase, a very similar argument works. Instead of the missing labels, it is now the weights that are communicated. The privacy loss for revealing a single β_(t)* is computed by
$$\frac{P[A_r(X_1) = \beta_{(t)}^*]}{P[A_r(X_2) = \beta_{(t)}^*]} = \frac{P[0 = X_1^T(v_{\mathrm{in},(t)} - X_1\beta_{(t)}^* - b_{1,(t)})]}{P[0 = X_2^T(v_{\mathrm{in},(t)} - X_2\beta_{(t)}^* - b_{2,(t)})]} = \frac{P[b_{1,(t)} \in \ker(X_1^T) + (v_{\mathrm{in},(t)} - X_1\beta_{(t)}^*)_{\perp 1}]}{P[b_{2,(t)} \in \ker(X_2^T) + (v_{\mathrm{in},(t)} - X_2\beta_{(t)}^*)_{\perp 2}]} = \frac{\exp\bigl[-\frac{\varepsilon}{2\xi_{(t)}^2}\lVert (v_{\mathrm{in},(t)} - X_1\beta_{(t)}^*)_{\perp 1}\rVert_2^2\bigr]}{\exp\bigl[-\frac{\varepsilon}{2\xi_{(t)}^2}\lVert (v_{\mathrm{in},(t)} - X_2\beta_{(t)}^*)_{\perp 2}\rVert_2^2\bigr]} \le e^{\varepsilon}.$$
From simple composition (Lemma 1), it follows that revealing the weights $\sum_{t=1}^{T} \beta_{(t)}^*$ consumes at most a privacy budget of Tε.
Demanding that the observations generate a full-rank matrix is a minor restriction. If this were not the case, a certain attribute could be predicted perfectly by the other attributes; hence, it could be removed from the database to obtain a full-rank matrix again. Furthermore, the proof does not strictly require full-rank matrices; the two matrices only need to be of equal rank.
The complete 2-party algorithm DP-BCD is shown in Algorithm 2. A generalization to more parties is straightforward.
Algorithm 2 Differentially private 2-party block coordinate descent algorithm
1: ε > 0, T ∈ ℕ and γ > 1
2: Alice and Bob initiate β_a ← 0 and β_b ← 0, respectively
3: Alice initiates v_b ← y
4: for t ∈ {1, …, T} do
5:     player Alice do
6:         ξ_a ← γ ‖v_b − X_a β_a‖_2
7:         b_a ∼ p_{ξ_a, ε/(2T)}
8:         β̃_a ← (X_a^T X_a)^(−1) X_a^T (v_b − b_a)
9:         β_a ← β_a + β̃_a
10:        v_a ← v_b − X_a β̃_a
11:        if ‖v_a‖_2 ≤ ξ_a then
12:            send v_a to Bob
13:        else
14:            abort
15:        end if
16:    end player
17:    player Bob do
18:        ξ_b ← γ ‖v_a − X_b β_b‖_2
19:        b_b ∼ p_{ξ_b, ε/(2T)}
20:        β̃_b ← (X_b^T X_b)^(−1) X_b^T (v_a − b_b)
21:        β_b ← β_b + β̃_b
22:        v_b ← v_a − X_b β̃_b
23:        if ‖v_b‖_2 ≤ ξ_b then
24:            send v_b to Alice
25:        else
26:            abort
27:        end if
28:    end player
29: end for
30: Alice sends β_a to Bob.
31: Bob sends β_b to Alice.
32: Alice and Bob publish (β_a, β_b).
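For concreteness, a single local round of Algorithm 2 can be sketched as follows. This is our own simplification: it reuses the sample_perturbation helper sketched earlier in this section, raises an exception instead of a protocol-level abort, and is not the repository implementation.

```python
import numpy as np

def dp_local_update(X, beta, v_in, gamma, eps_round, rng):
    """One party's round in Algorithm 2: perturbed local fit, abort check, new remainder."""
    xi = gamma * np.linalg.norm(v_in - X @ beta)            # loss scaling for this round
    b = sample_perturbation(len(v_in), xi, eps_round, rng)  # secret perturbation vector
    inc = np.linalg.solve(X.T @ X, X.T @ (v_in - b))        # increment from the perturbed objective
    v_out = v_in - X @ inc
    if np.linalg.norm(v_out) > xi:
        raise RuntimeError("abort: remainder exceeds the agreed loss scaling")
    return beta + inc, v_out

# Per round, each of the two parties would call this with eps_round = eps / (2 * T).
```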

Utility Bound

During the protocol run, the participants must check in every iteration whether the loss increase is less than a factor γ, as demanded in (13). If this is not the case, the protocol will be aborted by the participants, because a model with sufficient utility cannot be trained. Hence, at every single iteration the sum of squared errors, which is the unperturbed loss, is bounded by
$$\lVert v - X\beta^* \rVert_2^2 \le \gamma^2 \,\lVert v - X\beta \rVert_2^2.$$
This information can be used in another way: it is directly related to the utility loss and provides an upper bound for it. In a protocol run with k = 2 parties and T iterations, the sum of squared errors is at most a factor γ^(2kT) larger than in the unperturbed case. If we denote with f* the differentially private predictions and with f those without DP, then we observe that the utility measure satisfies
$$R_*^2 = 1 - \frac{\lVert y - f^* \rVert_2^2}{\mathrm{Var}\, y} \ge 1 - \gamma^{2kT} \frac{\lVert y - f \rVert_2^2}{\mathrm{Var}\, y} = 1 - \gamma^{2kT} (1 - R^2).$$
This shows that we obtain a utility guarantee along with the privacy guarantee. The additional utility loss is bounded by parameters that can be set before the start of the protocol.
This proves the following theorem.
Theorem 1.
The linear regression of y, held by Alice, against the data (X_a, X_b) can be approximated by Algorithm 2, provided that X_a and X_b are of full column rank, i.e., rk(X_a) = m_a and rk(X_b) = m_b, and contain N data points, where N > m_a and N > m_b. For T ∈ ℕ, ε > 0 and γ > 1, it is an ε-differentially private algorithm. Furthermore, the utility is bounded from below by
$$R_*^2 \ge 1 - \gamma^{4T} (1 - R^2),$$
where R² is the utility of the block coordinate descent algorithm without differential privacy (Algorithm 1).
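To illustrate how quickly this bound widens, consider the baseline parameters used in the experiments below (γ = 1.2, T = 5, R² = 0.3); the arithmetic is ours:
$$R_*^2 \ge 1 - \gamma^{4T}(1 - R^2) = 1 - 1.2^{20} \cdot 0.7 \approx 1 - 38.3 \cdot 0.7 \approx -25.8,$$
so the guaranteed R² is already far below the utilities that are actually observed.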

3.2. Experiments on Synthetic Data

In order to quantify the performance of DP-BCD, simulations with synthetic data are performed. We use standard normally distributed data and normally distributed β parameters (μ = 2, σ = 1.5). In the baseline scenario, there are nine predictors with a correlation of 0.3, N = 1000, R² = 0.3, ε = 1, and γ = 1.2, with two parties. Because preliminary analyses have indicated that five iterations is a favourable cut-off in the trade-off between privacy and noise accumulation, this is the number of iterations used.
For comparison with this baseline scenario, each of the following factors is varied separately: the sample size N ∈ {100, 250, 1000, 5000, 10,000}, the correlation between predictors ∈ {0.1, 0.3, 0.5}, R² ∈ {0.1, 0.3, 0.8}, ε ∈ {0.1, 0.3, 0.5, 0.8, 1.0, 1.5, 2.5, 10}, and γ ∈ {1.15, 1.25, 1.5, 1.8, 2, 2.5, 3}. The γ values are chosen big enough to avoid aborting the protocol run with high probability. For low values of γ, the algorithm may abort (see Algorithm 2) because γ is too low. This could lead to an unbalanced comparison between scenarios where γ is sufficiently high and those where the algorithm could not carry out all iterations in each repetition. Each of the variations is repeated 500 times, with the exception of the sample size experiment, which is repeated 100 times per variation. For every repetition, a different data set (X and y) is generated. In experiments where the privacy parameters ε and γ are varied, different β parameters are generated for every repetition.
To evaluate the utility, two results are considered: the R² and the β estimates. These outcomes are also generated in the centralized setting and using BCD without differential privacy. Because the results for these two approaches are practically identical, we only compare with the centralized results. For several scenarios, we compute the average absolute proportional distance (AAPD) for these β estimates. For r repetitions of a scenario with m predictors, the corresponding AAPD is defined as
$$\mathrm{AAPD} = \frac{\sum_{i=1}^{r} \lVert \beta^*_{(i)} - \beta \rVert_1}{r\, m}.$$
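A small sketch of this metric (our own; it implements the displayed formula literally, with beta_dp_runs assumed to hold one row of estimates per repetition):

```python
import numpy as np

def aapd(beta_dp_runs: np.ndarray, beta_ref: np.ndarray) -> float:
    """Average absolute proportional distance over r repetitions of m-dimensional estimates."""
    r, m = beta_dp_runs.shape
    return float(np.sum(np.abs(beta_dp_runs - beta_ref)) / (r * m))
```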

3.2.1. Impact of Privacy Parameters

The impact of γ and ε on the β and R² estimates is non-linear. We find that γ has a stronger impact on R² than ε. From Figure 1b, it can be observed that the bound for R² decreases significantly with γ. However, for the synthetic data, the expected decrease is not nearly as steep as its bound. For example, for γ = 3, the average R² is approximately −2.5, whereas it is bounded from below by −2.44 × 10⁹. Although the results are in line with Theorem 1, the bound can be almost meaningless for large values of γ.
The β estimates grow closer to the BCD results as ε increases, which is in line with the expectation. Table 1 shows that for ε = 1, the β estimates deviate 47% from the centralized β parameters on average. For γ = 1.15 (the lowest tested value), the β estimates deviate 295% from the centralized setting, but note that this is for ε = 1 , see Table 2. For higher values of ε , the estimates are closer to the centralized β parameters, though still differing by up to 47%. As a reference, the average and median deviation after five iterations for BCD without DP are practically zero.

3.2.2. Impact of R 2

As R 2 increases in the data-generating model, more predictive power is preserved with DP-BCD as well, see Figure 2. The precision and bias with which the β parameters can be estimated are also significantly impacted by R 2 in the data generating model, see Table 3.

3.2.3. Impact of Correlation

The impact of the correlation on the utility of the learned model can be seen in Figure 3. As expected, the average β error increases with the correlation between predictors. This can be observed in the wider sampling distribution in Table 4. This is to be expected for an implementation of DP, since more noise must be added to hide the outliers in the data. For very high correlations, the average β parameters differ as well, which means that the estimates are biased. The R 2 , however, remains unaffected by this parameter, though it is lower than with the BCD algorithm.
As studied by [4], strongly correlated data require more iterations for accurate parameter estimation. In fact, for highly correlated data with over 25 variables, hundreds of iterations can be required for convergence of the weights. In a differential privacy setting, this may consume vast privacy budgets or yield poor results due to noise accumulation.

3.2.4. Impact of Sample Size

Sample size is well known to have a large impact on the performance of differentially private models; see Figure 4. As can be observed from Table 5, the β error steadily decreases with the sample size. Furthermore, the R² distribution grows closer to the centralized results.

3.3. Evaluation with Real-World Data

We run experiments with two real-world data sets: a forest fires data set by [25], which was also used by [4], and a Garment Industry employee productivity data set by [26]. For both data sets, we computed the average coefficients, using γ = 1.2, ε = 1, 10 and T = 5 iterations, and repeated the experiment 1000 times. We plot the 2.5th and 97.5th percentiles and compare these to the parameter estimates for the centralized analysis. In addition, R² is computed in every repetition and plotted for the ε values of 0.2, 1, 2, 5, and 10. For both data sets, we use two parties.

3.3.1. Evaluation with Forest Fires Data

The forest fires data set contains 517 records with 12 predictors containing meteorological and other information to predict the burned area of forest fires. A total of 27 predictors were used in the regression analysis, with the variables pertaining to the month and day transformed to dummy variables.
Figure 5 shows a plot similar to Figure 5 of [4], using the same data and parties. We also plot the parameter estimates for the centralized analysis (which was demonstrated in [4] to be almost identical to BCD with 450 iterations).
For a relatively small privacy budget of ε = 1 , the average coefficients are similar to those from the centralized setting. For ε = 10 , the distributions are narrower, which is in line with the synthetic data results. The closeness of the sampling distributions to the centralized setting is likely affected by the low correlations between the predictors (with an absolute average and median of 0.08 and 0.05, respectively).
The R² values are quite low, due to the fact that the centralized R² is only 0.07. Because the R² values for DP-BCD are generally lower than for BCD, all median R² values are negative for the forest fires data. The y-axis in Figure 6 is cut off at −0.5, because negative R² values are not informative; note that for ε = 1.0 and ε = 2.0 the median R² values are −4.07 and −0.94, respectively. Thus, for a centralized model that already has low predictive power, adding differential privacy generally results in a complete loss of predictive power.

3.3.2. Evaluation with Garment Employee Productivity Data

We have also tested the algorithm with data from the Garment Industry by [26]. This data set contains 1197 employee records, with 15 predictors for employee productivity (on a continuous 0–1 scale). We removed the variables wip (to avoid missing values) and date (for a simpler regression problem) and used dummy variables for department and date. This data set also has quite low correlations, with a median correlation of 0.03. Figure 7 depicts the distribution of variables between the parties.
On average, the β estimates are close to the centralized analysis, although individual runs do differ (see Figure 7). The effect of ε is similar to that for the forest fires analysis: the distribution is narrower for ε = 10.
With respect to R², Figure 8 shows that the relation between ε and R² is similar to that observed for the synthetic data and the forest fires data. However, compared to the forest fires data set, more predictive power is preserved, which is related to the centralized R² of 0.24. This can be observed from the median R² values, which are both higher and closer to the centralized results.

4. Discussion

In this paper, we have constructed and tested the DP-BCD algorithm. We have demonstrated that it comes with a utility bound that limits the loss of R² as a function of the privacy parameters and the R² of the BCD solution. This is a new concept in DP learning. However, in its current form its practical relevance is very small, since the bound is too wide; the lower bound deteriorates quickly with an increasing number of parties and iterations.
The simulations in Section 3.2 and Section 3.3 demonstrate that the weights obtained with DP-BCD are similar to those of BCD, also for correlated data. Nonetheless, the predictive power is lower, especially for problems with a low R²; still, the median R² values observed are similar to those of BCD, albeit with larger deviations and some outliers. We find that the predictive power is considerably lower for small values of ε, high values of γ or small sample sizes, provided that γ is chosen big enough not to abort.
Both γ and ε have a strong impact on the predictive power. Therefore, the γ value should be set as low as possible, as there is no benefit to having a high γ . With respect to ε , the algorithm retained predictive power even for single-digit privacy budgets. Though not incorporated in the simulation, the number of parties is also expected to impact R 2 , as it makes the BCD procedure more challenging and has a significant impact on the utility bound.
Unbiased estimation of β parameters is a more challenging task than retaining predictive power. With the current procedure, this is not feasible with the amount of noise required. Particularly for highly correlated variables, the number of iterations may exceed the point where the increased precision as a result of iterations is overshadowed by the accumulated noise. For large sample sizes and large values of ε , it is possible to obtain β parameters similar to the BCD procedure. This was also visualized in the forest fire analysis, where ε = 20 led to parameter estimates closer, though not identical, to the centralized and BCD setting.
A fixed number of iterations has been used in the experiments. In this way, a clearer presentation of the performance of DP-BCD can be given. However, a convergence criterion, as described in Section 2.7, makes it possible to explicitly decide each round whether the improved utility is worth the consumed privacy budget. In this way, algorithms with better performance in terms of privacy budget and utility can be constructed.
The problem of federated linear regression on vertically partitioned data is also studied in [15]. Based on the techniques used, it is our estimate that their solution can provide a higher utility on average, since it only requires a single round of noise addition. On the other hand, the use of secure multiplication techniques will probably lead to a longer learning phase, so we expect our solution to be faster. Since we had no access to their code, we can only compare the solutions qualitatively.
The approach chosen in this article may work as well for logistic regression. The BCD algorithm can also be used for logistic regression [4] on vertically partitioned data. It has been demonstrated that objective perturbation works well for logistic regression [23]. However, it is not clear whether it is possible to provide a utility guarantee of a similar nature for logistic regression.

5. Conclusions

In this article, we have presented a differentially private extension of the block coordinate descent algorithm for a single label owner, called DP-BCD. We demonstrate that in scenarios where privacy concerns or regulations limit collaborative opportunities, DP-BCD can be used to enable multi-party collaboration with strict privacy bounds. The algorithm can be used for linear regression analysis of vertically partitioned data. Our construction applies objective perturbation in combination with a small universe of possible data sets following from local sensitivity. In this way, we are able to generate models with predictive power comparable to BCD using single-digit privacy budgets. Furthermore, the set-up allows for a theoretical utility bound that gives a lower bound for the R² of the differentially private version in terms of that of the original algorithm.
The acceptable performance loss of DP-BCD compared to BCD is parametrized by a new parameter γ . It allows parties to agree on both a privacy and a utility goal. A direct consequence of this is that DP-BCD comes with theoretical utility guarantees.
Experiments indicate that DP-BCD performs particularly well in settings where the data has a high R 2 , meaning that the data contains a lot of explanatory power. Furthermore, the low number of iterations used benefits data sets with little correlation. For the real-world data sets, we find that the obtained weights are similar on average, although the R 2 is lower.

Author Contributions

J.d.J.: conceptualization, methodology, formal analysis, writing—original draft preparation. B.K.: conceptualization, software, validation, writing—review and editing. S.K.: investigation, writing—original draft preparation, visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://archive.ics.uci.edu/ml/datasets/Forest+Fires] and [https://archive.ics.uci.edu/ml/datasets/Productivity+Prediction+of+Garment+Employees].

Acknowledgments

The authors would like to thank Ásta Magnúsdóttir and Savvina Daniil for their contributions to the project.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AAPD      Average absolute proportional distance
BCD       Block coordinate descent
DP        Differential privacy
DP-BCD    Differentially private block coordinate descent
LSDP      Locally sensitive differential privacy

References

  1. Veugen, T. Secure Multi-party Computation and Its Applications. In Proceedings of the Innovations for Community Services, Delft, The Netherlands, 13–15 June 2022; Springer International Publishing: Berlin/Heidelberg, Germany, 2022; pp. 3–5. [Google Scholar]
  2. Veugen, T.; Kamphorst, B.; van de L’Isle, N.; van Egmond, M.B. Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent. In Proceedings of the Cyber Security Cryptography and Machine Learning, Be’er Sheva, Israel, 8–9 July 2021; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 38–51. [Google Scholar]
  3. Sangers, A.; van Heesch, M.; Attema, T.; Veugen, T.; Wiggerman, M.; Veldsink, J.; Bloemen, O.; Worm, D. Secure Multiparty PageRank Algorithm for Collaborative Fraud Detection. In Proceedings of the Financial Cryptography and Data Security, Bay, St. Kitts and Nevis, 18–22 February 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 605–623. [Google Scholar]
  4. Van Kesteren, E.J.; Sun, C.; Oberski, D.L.; Dumontier, M.; Ippel, L. Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent. arXiv 2019, arXiv:cs.LG/1911.03183. [Google Scholar]
  5. Zhu, L.; Liu, Z.; Han, S. Deep Leakage from Gradients. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019. [Google Scholar]
  6. Geiping, J.; Bauermeister, H.; Dröge, H.; Moeller, M. Inverting Gradients—How Easy is It to Break Privacy in Federated Learning? In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020. [Google Scholar]
  7. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. J. Priv. Confid. 2017, 7, 17–51. [Google Scholar] [CrossRef] [Green Version]
  8. Kifer, D.; Smith, A.; Thakurta, A.; Mannor, S.; Srebro, N.; Williamson, R. Private Convex Empirical Risk Minimization and High-dimensional Regression. In Proceedings of the 25th Annual Conference on Learning Theory (COLT ’12), Edinburgh, UK, 25–27 June 2012; Volume 23. [Google Scholar]
  9. Wang, Y. Revisiting differentially private linear regression: Optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, Monterey, CA, USA, 6–10 August 2018; Globerson, A., Silva, R., Eds.; AUAI Press: Montreal, QC, Canada, 2018; pp. 93–103. [Google Scholar]
  10. Alabi, D.; McMillan, A.; Sarathy, J.; Smith, A.; Vadhan, S. Differentially Private Simple Linear Regression. Proc. Priv. Enhancing Technol. 2022, 2022, 184–204. [Google Scholar] [CrossRef]
  11. Liu, Y.; Wang, N.; Wang, Z.; Wang, X.; Gao, Y.; Ji, X.; Wei, Z.; Qiao, J. Differentially Private Linear Regression Analysis via Truncating Technique. In Proceedings of the Web Information Systems and Applications: 18th International Conference, WISA 2021, Kaifeng, China, 24–26 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 249–260. [Google Scholar] [CrossRef]
  12. Sun, L.; Qian, J.; Chen, X. LDP-FL: Practical Private Aggregation in Federated Learning with Local Differential Privacy. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, Montreal, QC, Canada, 19–27 August 2021; Zhou, Z.H., Ed.; International Joint Conferences on Artificial Intelligence Organization: Montreal, QC, Canada, 2021; pp. 1571–1578. [Google Scholar] [CrossRef]
  13. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.S.; Poor, H.V. Federated Learning with Differential Privacy: Algorithms and Performance Analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469. [Google Scholar] [CrossRef] [Green Version]
  14. Li, X.; Hu, Y.; Liu, W.; Feng, H.; Peng, L.; Hong, Y.; Ren, K.; Qin, Z. OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization. arXiv 2022, arXiv:abs/2210.01318. [Google Scholar]
  15. Xu, D.; Yuan, S.; Wu, X. Achieving Differential Privacy in Vertically Partitioned Multiparty Learning. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 5474–5483. [Google Scholar]
  16. Wang, D.; Chen, C.; Xu, J. Differentially Private Empirical Risk Minimization with Non-convex Loss Functions. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; Volume 97, pp. 6526–6535. [Google Scholar]
  17. Wang, D.; Ye, M.; Xu, J. Differentially Private Empirical Risk Minimization Revisited: Faster and More General. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  18. Yu, D.; Zhang, H.; Chen, W.; Yin, J.; Liu, T.Y. Gradient Perturbation is Underrated for Differentially Private Convex Optimization. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021. [Google Scholar]
  19. Soria-Comas, J.; Domingo-Ferrer, J.; Sánchez, D.; Megías, D. Individual Differential Privacy: A Utility-Preserving Formulation of Differential Privacy Guarantees. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
  20. Farias, V.A.E.; Brito, F.T.; Flynn, C.; Machado, J.C.; Majumdar, S.; Srivastava, D. Local Dampening: Differential Privacy for Non-Numeric Queries via Local Sensitivity. Proc. VLDB Endow. 2020, 14, 521–533. [Google Scholar] [CrossRef]
  21. Nissim, K.; Raskhodnikova, S.; Smith, A. Smooth Sensitivity and Sampling in Private Data Analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, San Diego, CA, USA, 11–13 June 2007; Association for Computing Machinery: New York, NY, USA, 2007; pp. 75–84. [Google Scholar]
  22. Chaudhuri, K.; Monteleoni, C.; Sarwate, A.D. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res. 2011, 12, 1069–1109. [Google Scholar] [PubMed]
  23. Chaudhuri, K.; Monteleoni, C. Privacy-preserving logistic regression. In Proceedings of the NIPS, Vancouver, BC, Canada, 8–11 December 2008. [Google Scholar]
  24. Neel, S.; Roth, A.; Vietri, G.; Wu, Z.S. Oracle Efficient Private Non-Convex Optimization. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020. [Google Scholar]
  25. Cortez, P.; Morais, A.d.J.R. A Data Mining Approach to Predict Forest Fires Using Meteorological Data; Associação Portuguesa para a Inteligência Artificial (APPIA): Porto, Portugal, 2007. [Google Scholar]
  26. Imran, A.A.; Amin, M.N.; Rifat, M.R.I.; Mehreen, S. Deep Neural Network Approach for Predicting the Productivity of Garment Employees. In Proceedings of the 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, 23–26 April 2019; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar] [CrossRef]
  27. Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. In Foundations and Trends in Theoretical Computer Science; Now Publishers: Norwell, MA, USA, 2014. [Google Scholar]
  28. McSherry, F. Privacy Integrated Queries. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), Providence, RI, USA, 29 June–2 July 2009; Association for Computing Machinery, Inc.: New York, NY, USA, 2009. [Google Scholar]
  29. Dwork, C.; Rothblum, G.N.; Vadhan, S. Boosting and Differential Privacy. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 23–26 October 2010; pp. 51–60. [Google Scholar]
Figure 1. R² of DP-BCD in artificial test data. (a) R² after DP-BCD as a function of ε; (b) R² after DP-BCD as a function of γ.
Figure 2. R² after DP-BCD as a function of R² in the data-generating model.
Figure 3. R² after DP-BCD as a function of the correlation between predictors in the data-generating model.
Figure 4. R² after DP-BCD as a function of N.
Figure 5. Centralized parameter estimates (black) for forest fire analysis, with average coefficients and 95% confidence intervals for ε = 1 (orange) and ε = 10 (green), for γ = 1.2, 1000 repetitions. Parties are separated with background shading.
Figure 6. R² over ε for the forest fires data set (5 iterations, 100 repetitions, and γ = 1.2).
Figure 7. Centralized parameter estimates (black) for regression analysis with the Garment employee productivity data set, with average coefficients and 95% confidence intervals for ε = 1 (orange) and ε = 10 (green), for γ = 1.2, 1000 repetitions. Parties are separated with background shading.
Figure 8. R² over ε for the Garment employee productivity data set (5 iterations, 1000 repetitions, and γ = 1.2).
Table 1. Mean and median proportional absolute error of β estimates compared to centralized setting, over ε after five iterations for γ = 1.2.

ε       Mean    Median
0.2     1.50    10.09
1.0     0.67    4.37
2.0     0.47    3.08
3.0     0.38    2.51
5.0     0.30    1.94
10.0    0.21    1.37
20.0    0.15    0.97
Table 2. Mean and median proportional absolute error of β estimates compared to centralized setting, over γ after five iterations for ε = 1.

γ       Mean    Median
1.15    2.95    0.45
1.25    3.21    0.49
1.50    3.86    0.59
1.80    4.64    0.71
2.00    5.17    0.79
2.50    6.51    0.98
3.00    7.87    1.18
Table 3. Mean and median proportional absolute error over R².

R²      Mean    Median
0.1     4.72    0.85
0.3     3.08    0.47
0.5     1.88    0.33
0.8     1.02    0.21
Table 4. Mean and median proportional absolute error over the correlation between predictors.

Correlation     Mean    Median
0.1             0.56    0.27
0.3             1.06    0.45
0.5             3.11    0.69
Table 5. Mean and median proportional absolute error over N.

N        Mean    Median
100      7.68    1.46
250      3.82    0.95
1000     2.30    0.47
5000     0.85    0.22
10,000   0.59    0.15
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
