Kalman Filter-Based Differential Privacy Federated Learning Method

Yang, Xiaohui; Dong, Zijian

doi:10.3390/app12157787


by Xiaohui Yang and Zijian Dong *

School of Cyberspace Security and Computer, Hebei University, Baoding 071000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(15), 7787; https://doi.org/10.3390/app12157787

Submission received: 30 June 2022 / Revised: 25 July 2022 / Accepted: 30 July 2022 / Published: 2 August 2022

(This article belongs to the Special Issue Recent Analysis and Applications of Algorithms, Programs and Data Based on Artificial Intelligence)


Abstract:

The data privacy leakage problem of federated learning has attracted widespread attention. Differential privacy can protect the data privacy of each node in federated learning, but adding noise to the model parameters reduces the accuracy and convergence efficiency of the model. To solve this problem, a Kalman Filter-based Differential Privacy Federated Learning method (KDP-FL) is proposed, which uses Kalman filtering to reduce the impact of the added noise on the model. The effectiveness of the proposed method is verified under both Non-IID and IID data distributions. The experiments show that the proposed method improves accuracy by 0.3–4.5% compared to differential privacy federated learning.

Keywords:

federated learning; privacy protection; differential privacy; Kalman filter

1. Introduction

With the development of industry and the improvement in computing power, artificial intelligence has become increasingly common in many areas of people’s lives, including medical care, logistics, and face and voice recognition. The wide application of machine learning and deep learning benefits from computing power and massive data. Traditional machine learning and deep learning models rely on data centers and require large amounts of high-quality data. As the demand for machine learning and deep learning grows, so do the requirements for data quantity and quality. In addition, companies and organizations need to pay to collect massive amounts of data. Aside from the cost, protecting users’ privacy is also an important factor limiting the data that can be obtained. To address these issues, Google proposed federated learning [1,2,3], which has attracted attention from scholars and organizations. Federated learning can train models using decentralized data, which significantly increases the amount and quality of data available for training. Moreover, the participating nodes are not required to provide their own local data; they only need to deliver the training results, which protects the users’ data privacy. However, federated learning does not make the user’s privacy completely invulnerable: private information may still be inferred from the shared model parameters [4,5].

Two schemes have been proposed to improve privacy protection in federated learning, one based on homomorphic encryption [6,7,8,9] and the other based on differential privacy [10,11,12,13]. However, these methods have difficulties in practical application due to their limitations. Differential privacy relies on adding noise to the original data or training results to protect privacy, and it may reduce the model’s accuracy. Too much noise can make the model less accurate than expected, while too little noise fails to protect user privacy, making it challenging to balance privacy protection and accuracy [8]. Homomorphic encryption techniques rely on complex encryption and decryption computations, which may cost a large amount of time. In addition, differences in the computing power of devices and in the time they spend on these complex calculations can make the aggregation unsynchronized [14].

To solve the above problems of federated learning, Kalman filter-based differential privacy federated learning (KDP-FL) is proposed. The proposed scheme consists of two parts, using differential privacy to protect the gradient privacy in federated learning and using the Kalman filter to reduce the noise added to preserve privacy in order to improve the accuracy of the global model. The main contributions of this paper are as follows.

(1): A multiparty cooperative model training scheme that protects data privacy is proposed.
(2): A differential privacy federated learning algorithm is proposed to protect user data privacy by a differential privacy mechanism.
(3): The noise introduced by the differential privacy mechanism is reduced by Kalman filtering to improve the model’s accuracy.

2. Related Work

Federated learning has been gradually applied to protect the data privacy of all parties in scenarios where multiple parties cooperate in training and need to exchange data, such as traffic monitoring [15], edge computing [14], and IoT [16], to ensure that the original data is not leaked. However, privacy analysis attacks mean that federated learning also faces the risk of privacy leakage. It is essential to solve the problems of federated learning, especially gradient privacy protection, where data privacy is a particular concern. There are currently two privacy-preserving methods in federated learning: (1) differential privacy and (2) homomorphic encryption.

Abadi et al. [10] proposed a differential privacy deep learning method that combined machine learning or deep learning methods with differential privacy and proposed a new computational theory to obtain lower privacy consumption with smaller constraints, providing a basis for introducing differential privacy in federated learning. Hu et al. [11] used a block coordinate descent to optimize the objective function and differential privacy, which could protect gradient privacy but performed poorly in non-convex models.

Zhao et al. [12] used differential privacy federated learning in vehicular networks to train intelligent traffic management systems while protecting vehicle location privacy but did not evaluate the accuracy under the influence of noise.

Homomorphic encryption is a cryptographic method based on the computational complexity of hard mathematical problems. The model parameters computed by each training node are homomorphically encrypted, then aggregated and sent to the specified node, and the model parameters obtained after decryption are the same as those obtained by direct aggregation without encryption; this property is used to protect gradient privacy. Li et al. [13] used a chain aggregation method to protect gradient privacy, which improved the accuracy, but the chain aggregation approach introduced a large communication overhead. Fang et al. [8] combined the Paillier algorithm with federated learning to protect gradient privacy, but the Paillier algorithm requires complex exponentiation operations, which introduced a large time overhead. Ma et al. [9] used multi-key homomorphic encryption (MK-HE) to solve the collusion attack leakage problem brought by a single key, which improved privacy protection for federated learning. Zhang et al. [7] improved the encoding method during gradient transmission, reducing the computational overhead due to encryption but decreasing the accuracy.

To balance accuracy and differential privacy, this paper uses a differential privacy approach to protect gradient privacy while using Kalman filtering to reduce the impact of the noise on the accuracy.

3. Proposed Method

This section introduces the Kalman filtering-based differential privacy federated learning method, which contains a Differential Privacy mini-Batch Gradient Descent Algorithm and the Kalman reduction parameters aggregation method.

3.1. Method Overview

There are two types of nodes in federated learning, the Request Node (RN) and the Training Node (TN). The RN is the node that needs others to assist in training the model; it delivers task requests to other nodes and initializes or updates the global model. The TNs are the nodes that assist the RN; each trains the model using local data and provides the training result, also called the gradient, to the RN. C TNs participate in each round of training. The workflow of KDP-FL is shown in Figure 1. Federated learning contains T rounds of global training. Global training includes task definition, model update, and local training. The RN defines the task and initializes the global model. Then, it loops over the following steps to complete the global training:

(1): The RN updates the global model parameters and sends them to the TNs.
(2): Each TN trains the model locally and obtains the local training result for the tth round of training after E iterations. This step is also called local training.
(3): Each TN adds Gaussian noise to its training result and sends it to the RN.
(4): The RN applies Kalman filtering to all the training results to reduce the noise.
(5): The RN aggregates all the training results to obtain the global model parameters for the next round.

3.2. Method Details

In this section, we introduce our method in detail. Some significant symbols used in this paper are defined in Table 1.

3.2.1. Differential Privacy-Based Stochastic Mini-Batch Gradient Descent (DP-SGD)

TNs train the model locally instead of providing local raw data to the RN, which protects data privacy, and then send the training result to the RN. Algorithm 1 describes the local training. Each TN uses $w_{0}$ as the initial local model parameters, uses its local data as the dataset, and trains the model using the DP-SGD algorithm. In the eth iteration of DP-SGD, the TN computes per-sample gradients over a mini-batch to optimize the model parameters. During optimization, the intermediate results are not sent to the RN; noise is added to the training result only once, after the Eth iteration of local training. Local training is performed using the following iteration

$$w = w - λ \frac{1}{B} \sum_{b = 1}^{B} g_{b} \qquad (1)$$

After E rounds of local training, the training result is sent to the RN. To protect the gradient privacy, the result is perturbed using a differential privacy mechanism before sending. The differential privacy mechanism in this paper is the Gaussian mechanism, defined in Equation (2), where $N (0, S_{f}^{2} \cdot σ^{2})$ denotes a normal distribution with mean zero and variance $S_{f}^{2} \cdot σ^{2}$.

$$\mathcal{M} (D) = f (D) + N (0, S_{f}^{2} \cdot σ^{2}) \qquad (2)$$

A Gaussian mechanism for a function f with sensitivity $S_{f}$ satisfies $(ε, δ)$-differential privacy if $δ \geq \frac{4}{5} \exp (- {(σ ε)}^{2} / 2)$ and $ε < 1$ are satisfied [14]. The noised local training result of the TN is then calculated as:

$$g^{'} = \frac{1}{λ} [w - w_{0} + N (0, σ^{2} M^{2} I)] \qquad (3)$$

To address the gradient explosion that may occur during the training process, $L_{2}$-norm gradient clipping is used. In addition, gradient clipping limits the gradient values to a certain range, which is beneficial to the Gaussian mechanism. The clipping threshold is set to M. The gradient g is left unchanged when its $L_{2}$ norm satisfies ${∥g∥}_{2} \leq M$; if ${∥g∥}_{2} > M$, the gradient is clipped as described in Algorithm 1. Gradient clipping thus keeps the gradient values from exceeding the set threshold and prevents gradient explosion from affecting the model. After clipping, the $L_{2}$ norm of the gradient is bounded by ${∥g∥}_{2} \leq M$. Limiting this upper bound bounds the sensitivity and therefore controls the variance of the noise distribution.
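The clipping rule can be illustrated with a minimal Python sketch. It assumes a gradient stored as a flat list of floats; the function name `clip_l2` is hypothetical and not taken from the authors' code.

```python
import math

def clip_l2(grad, threshold):
    """Scale grad so that its L2 norm is at most threshold (step 6 of Algorithm 1)."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = max(1.0, norm / threshold)   # equals 1 when the norm is already within M
    return [x / scale for x in grad]

small = clip_l2([0.3, 0.4], threshold=1.0)   # norm 0.5: left unchanged
large = clip_l2([3.0, 4.0], threshold=1.0)   # norm 5.0: rescaled to norm 1.0
```

Because the divisor is `max(1, norm / M)`, gradients inside the threshold pass through untouched and larger ones keep their direction but are rescaled onto the ball of radius M.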

The use of differential privacy in SGD requires the calculation of privacy consumption, and there is a risk of privacy leakage when the privacy cost is higher than the privacy budget. Compared to the strong composition theorem [17], the moments accountant proposed by Abadi et al. [10] yields a smaller $σ$ for the same privacy budget. According to the moments accountant, there exist constants $c_{1}$ and $c_{2}$ such that, for any $ε < c_{1} q^{2} T$, the DP-SGD algorithm is $(ε, δ)$-differentially private for any $δ > 0$ if $σ$ satisfies Equation (5), where q is the sample probability defined in Equation (4). The variance of the added noise therefore depends on the number of training rounds and the privacy budget.

$$q = \frac{C}{N} \frac{B E}{| D |} \qquad (4)$$

$$σ \geq c_{2} \frac{q \sqrt{T \log (1 / δ)}}{ε} \qquad (5)$$
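As a rough worked example of Equations (4) and (5), the sketch below plugs in C = 20, B = 32, and E = 5 from Table 3 and δ = 10^-5 from Section 4.2. Several values are assumptions made only for illustration: N is read as the total number of nodes (100), |D| is taken as 600 samples per node, T = 100 rounds is hypothetical, and the constant c2 is set to 1 because the text does not give its value.

```python
import math

def sample_probability(c, n_nodes, batch_size, local_iters, dataset_size):
    """Equation (4): q = (C / N) * (B * E / |D|)."""
    return (c / n_nodes) * (batch_size * local_iters / dataset_size)

def min_noise_level(q, rounds, delta, epsilon, c2=1.0):
    """Lower bound on sigma from Equation (5); c2 = 1.0 is a placeholder constant."""
    return c2 * q * math.sqrt(rounds * math.log(1.0 / delta)) / epsilon

# C, B, E follow Table 3; N = 100 and |D| = 600 are assumptions for this sketch.
q = sample_probability(c=20, n_nodes=100, batch_size=32, local_iters=5, dataset_size=600)
sigma_eps1 = min_noise_level(q, rounds=100, delta=1e-5, epsilon=1.0)  # T = 100 is hypothetical
sigma_eps5 = min_noise_level(q, rounds=100, delta=1e-5, epsilon=5.0)
```

Under these assumptions the bound scales as 1/ε, so the noise level required for ε = 1 is five times the level required for ε = 5.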

The DP-SGD algorithm is shown in Algorithm 1.

Algorithm 1 DP-SGD

Input: $D_{c}$; $w_{0}$; M
Output: $g^{'}$

1: $w \leftarrow w_{0}$
2: for $e = 1$ to $E$ do
3:   Take a random batch with size B
4:   for $b = 1$ to $B$ do
5:     $g_{b} \leftarrow \frac{\partial l (w, x_{b})}{\partial w}$
6:     $g_{b} \leftarrow g_{b} / \max (1, \frac{{∥g_{b}∥}_{2}}{M})$
7:   end for
8:   $w \leftarrow w - λ \frac{1}{B} \sum_{b = 1}^{B} g_{b}$
9: end for
10: $g^{'} \leftarrow \frac{1}{λ} [w - w_{0} + N (0, σ^{2} M^{2} I)]$
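A minimal Python sketch of Algorithm 1 is given below. It is not the authors' implementation: it assumes a toy one-dimensional linear model with squared loss so that the per-sample gradient can be written by hand, and the function name `dp_sgd` and its parameter names are hypothetical.

```python
import math
import random

def dp_sgd(data, w0, clip_m, lr, sigma, batch_size, local_iters, rng):
    """Sketch of Algorithm 1 for a toy model y ~ w[0] * x + w[1] with squared loss."""
    w = list(w0)
    for _ in range(local_iters):
        batch = rng.sample(data, batch_size)           # random batch of size B
        grad_sum = [0.0, 0.0]
        for x, y in batch:
            err = w[0] * x + w[1] - y
            g = [2.0 * err * x, 2.0 * err]             # per-sample gradient
            norm = math.sqrt(g[0] ** 2 + g[1] ** 2)
            scale = max(1.0, norm / clip_m)            # L2 clipping to threshold M
            grad_sum = [s + gi / scale for s, gi in zip(grad_sum, g)]
        w = [wi - lr * s / batch_size for wi, s in zip(w, grad_sum)]
    # Gaussian noise with standard deviation sigma * M is added once, as in Equation (3).
    return [(wi - w0i + rng.gauss(0.0, sigma * clip_m)) / lr for wi, w0i in zip(w, w0)]

rng = random.Random(0)
toy_data = [(x / 10.0, 2.0 * x / 10.0 + 1.0) for x in range(50)]   # hypothetical data
noisy_update = dp_sgd(toy_data, [0.0, 0.0], clip_m=1.0, lr=0.05, sigma=0.5,
                      batch_size=32, local_iters=5, rng=rng)
```

Since every averaged step has norm at most λM, the noise-free update has norm at most E·M, which is the sensitivity bound that the clipping threshold is meant to provide.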

3.2.2. Kalman Noise Reduction Parameter Aggregation

The TN sends the training result to the RN after adding noise using Equation (3). The noised model parameters of node TN$_{c}$ are denoted ${g^{'}}_{c}$; the RN then receives the set of model parameters $G = \{{g^{'}}_{1}, {g^{'}}_{2}, \dots, {g^{'}}_{C}\}$.

The model parameters obtained by the RN contain noise, which will decrease the accuracy of the model. In order to reduce the impact of differential privacy on the model and improve the accuracy, it is necessary to revise the model parameters sent by the TN. In this paper, we use Kalman filtering to revise all model parameters obtained by the RN to improve the accuracy of the model under the situation where differential privacy noise has been added to the global model:

$$g = \mathrm{Kalman} (g^{'}), \quad g^{'} \in G \qquad (6)$$

The details will be described in Section 3.2.3. The global model parameters of the tth round are obtained after aggregating all the model parameters revised by the Kalman filter; then, the global model is updated.

$$w_{t + 1} = w_{t} + \frac{λ}{C} \sum_{c = 1}^{C} g_{t, c} \qquad (7)$$

The KDP-FL algorithm is shown in Algorithm 2.

Algorithm 2 KDP-FedAvg

Input: $D_{c}$; $w_{0}$
Output: w

1: for $t = 1$ to $T$ do
2:   for $c = 1$ to $C$ in parallel do
3:     TN$_{c}$ do: ${g^{'}}_{t, c} \leftarrow \text{DP-SGD} (D_{c}, w, M)$
4:     RN do: $g_{t, c} \leftarrow \text{Kalman} ({g^{'}}_{t, c})$
5:   end for
6:   $w_{t + 1} \leftarrow w_{t} + \frac{λ}{C} \sum_{c = 1}^{C} g_{t, c}$
7: end for
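The aggregation step in line 6 of Algorithm 2 can be sketched as follows. The sketch assumes each update is a flat list of floats defined as in Equation (3), i.e., the local model change divided by λ; the `denoise` argument is a hypothetical placeholder for the Kalman step and defaults to the identity.

```python
def aggregate(w_t, updates, lr, denoise=lambda g: g):
    """One RN step of Algorithm 2: denoise each update, then average into w_t."""
    c = len(updates)
    cleaned = [denoise(g) for g in updates]
    return [
        w + lr / c * sum(g[i] for g in cleaned)
        for i, w in enumerate(w_t)
    ]

lr = 0.05
w_t = [1.0, -2.0]
local_models = [[1.2, -1.9], [0.8, -2.3]]          # hypothetical TN results, no noise
updates = [[(wc - w) / lr for wc, w in zip(m, w_t)] for m in local_models]
w_next = aggregate(w_t, updates, lr)
```

With noise-free updates, this step returns the plain average of the local models, which is the FedAvg result that the noised and filtered version tries to approximate.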

3.2.3. Details of Kalman Noise Reduction

The Kalman filter can reduce normally distributed noise in a linear system [18]. The noise added to the training result by differential privacy is Gaussian, and the state update equation is linear, so the Kalman filter can be used to denoise the training result and decrease the impact of differential privacy on the accuracy of the model. The noise correction has two steps: the first is the model parameter update, and the second is the correction of that update.

There are C TNs in the tth round of federated learning; the model parameters of TN$_{c}$ are $w_{t, c}$, the global model parameters are $w_{t}$, and the parameter vector composed of the global model parameters and the model parameters of all TNs is $W_{t} = [\begin{matrix} w_{t} & w_{t, 1} & w_{t, 2} & \dots & w_{t, C} \end{matrix}]$. The gradient values of TN$_{c}$ are $g_{t, c}$, and the gradient vector composed of the noised gradient values of all TNs is ${G^{'}}_{t} = [\begin{matrix} 0 & {g^{'}}_{t, 1} & {g^{'}}_{t, 2} & \dots & {g^{'}}_{t, C} \end{matrix}]$. The linear coefficient matrix U of the state prediction equation is a unit matrix, and V is defined as follows:

$$V = [\begin{matrix} 1 & 0 & 0 & \dots & 0 \\ 0 & \frac{λ}{C} & 0 & \dots & 0 \\ 0 & 0 & \frac{λ}{C} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \frac{λ}{C} \end{matrix}] \qquad (8)$$

Then, the parameter update equation for model training can be transformed into the state prediction equation:

$$W_{t + 1}^{T} = U W_{t}^{T} + V {G^{'}}_{t}^{T} \qquad (9)$$

In fact, the parameter update equation for model training can also be written as:

$$\{\begin{matrix} w_{t + 1} & = w_{t} + \frac{λ}{C} \sum_{c = 1}^{C} {g^{'}}_{t, c} \\ w_{t + 1, 1} & = w_{t, 1} + λ g_{t, 1} + N (0, σ^{2} M^{2} I) \\ \dots \\ w_{t + 1, C} & = w_{t, C} + λ g_{t, C} + N (0, σ^{2} M^{2} I) \end{matrix} \qquad (10)$$

The noise added by the TN in differential privacy uses the Gaussian mechanism satisfying Gaussian distribution.

To achieve dimensional matching with U and V, the calculation described in Algorithm 3 is needed. The elements of $W_{t}^{T}$ and ${G^{'}}_{t}^{T}$ are all matrices. First, we reshape each element of $W_{t}^{T}$ and ${G^{'}}_{t}^{T}$: let $N = m \times n$; each matrix $w_{t, c}$ or ${g^{'}}_{t, c}$ of shape $m \times n$ is reshaped to $N \times 1$. Let $ϖ_{i}$ be the ith parameter in $w_{t}$, $ϖ_{c, i}$ be the ith parameter in $w_{t, c}$, and $ϱ_{c, i}$ be the ith element of ${g^{'}}_{t, c}$. Then, we can calculate Equation (9) using Algorithm 3.

Algorithm 3 Matrix broadcast calculation

Input: ${G^{'}}_{t}^{T}$; $W_{t}^{T}$; U; V
Output: $W_{t + 1}$

1: for $i = 1$ to $N$ do
2:   $ω_{i} \leftarrow [\begin{matrix} ϖ_{i} & ϖ_{1, i} & ϖ_{2, i} & \dots & ϖ_{C, i} \end{matrix}]$
3:   $ρ_{i} \leftarrow [\begin{matrix} 0 & ϱ_{1, i} & ϱ_{2, i} & \dots & ϱ_{C, i} \end{matrix}]$
4:   $ω_{i}^{T} \leftarrow U ω_{i}^{T} + V ρ_{i}^{T}$
5: end for
6: $w_{t + 1} \leftarrow [\begin{matrix} ϖ_{1} & ϖ_{2} & \dots & ϖ_{N} \end{matrix}]$
7: for $c = 1$ to $C$ do
8:   $w_{t + 1, c} \leftarrow [\begin{matrix} ϖ_{c, 1} & ϖ_{c, 2} & \dots & ϖ_{c, N} \end{matrix}]$
9: end for
10: $W_{t + 1} \leftarrow [\begin{matrix} w_{t + 1} & w_{t + 1, 1} & w_{t + 1, 2} & \dots & w_{t + 1, C} \end{matrix}]$
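The per-index loop of Algorithm 3 can be sketched in plain Python as below. The sketch assumes that every model and gradient has already been flattened to a list of N floats, that U is the unit matrix, and that V is the diagonal matrix of Equation (8); the function name `broadcast_update` is hypothetical.

```python
def broadcast_update(w_global, w_clients, g_clients, lr):
    """Apply W <- U W + V G' one parameter index at a time (Algorithm 3)."""
    c = len(w_clients)
    n = len(w_global)
    # U is the unit matrix; V = diag(1, lr/C, ..., lr/C) as in Equation (8).
    v_diag = [1.0] + [lr / c] * c
    new_global = [0.0] * n
    new_clients = [[0.0] * n for _ in range(c)]
    for i in range(n):
        omega = [w_global[i]] + [w[i] for w in w_clients]   # i-th entry of every model
        rho = [0.0] + [g[i] for g in g_clients]             # i-th entry of every noised gradient
        omega = [o + v * r for o, v, r in zip(omega, v_diag, rho)]
        new_global[i] = omega[0]
        for k in range(c):
            new_clients[k][i] = omega[k + 1]
    return new_global, new_clients

new_global, new_clients = broadcast_update(
    w_global=[1.0, 2.0],
    w_clients=[[1.0, 2.0], [1.0, 2.0]],
    g_clients=[[2.0, 0.0], [0.0, 4.0]],   # hypothetical noised gradients
    lr=0.5,
)
```

Processing one index at a time keeps each matrix product at size (C + 1) regardless of how many parameters the network has, at the cost of N small products per round.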

The gradient vector ${G^{'}}_{t}^{T}$ contains noise added by the TNs, and the distribution of the noise is known. The Gaussian noise added by differential privacy is treated as an external disturbance, i.e., uncertainty in the system.

In Kalman filtering, $z_{t}$ denotes the actually obtained data, calculated using the following equation:

$$z_{t} = H W_{t} + R, \qquad (11)$$

where H is the data unit transformation matrix. In this paper, the data sent by the TNs are the same as the data obtained by the RN; so, H is the $(C + 1)$-dimensional unit matrix. R is the error column vector. In order to avoid the occurrence of a singular matrix in subsequent calculations, $R = 0.1 Q$ is taken here. The Kalman gain matrix is calculated from the noise covariance matrix $P_{t}$ as

$$K_{t + 1} = H P_{t} H^{T} {(H P_{t} H^{T} + R)}^{- 1} \qquad (12)$$

Since H is the $(C + 1)$-dimensional unit matrix, the formula for the Kalman gain reduces to

$$K_{t + 1} = P_{t} {(P_{t} + R)}^{- 1} . \qquad (13)$$

The parameter update equation using the Kalman gain correction is

$$W_{t + 1} = W_{t} + K_{t} (z_{t} - H W_{t}) . \qquad (14)$$

The noise covariance matrix is then updated:

$$P_{t + 1} = (I - K_{t} H) P_{t} + Q . \qquad (15)$$

$$Q \sim N (0, σ^{2} M^{2} I) \qquad (16)$$
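Equations (13) to (15) can be sketched for a single coordinate as follows. This is a simplification made for illustration, not the authors' implementation: it assumes H = 1, treats the covariances as scalars, and reads Q as the per-coordinate noise variance σ²M². The function name `kalman_correct` and the numeric values are hypothetical.

```python
def kalman_correct(estimate, p, observation, q, r_scale=0.1):
    """Scalar form of Equations (13)-(15) with H = 1 and R = r_scale * Q."""
    r = r_scale * q
    gain = p / (p + r)                                        # Equation (13)
    corrected = estimate + gain * (observation - estimate)    # Equation (14)
    p_next = (1.0 - gain) * p + q                             # Equation (15)
    return corrected, p_next, gain

sigma, clip_m = 0.2, 1.0           # hypothetical noise level and clipping threshold
q = (sigma * clip_m) ** 2          # assumed per-coordinate noise variance
corrected, p_next, gain = kalman_correct(estimate=0.0, p=1.0, observation=1.0, q=q)
```

The gain always lies between 0 and 1, so the corrected value is a weighted mix of the predicted estimate and the received observation, with the weight set by the ratio of P to P + R.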

4. Experiment and Analysis

4.1. Experiment Setting

The public datasets MNIST [19], FMNIST [20], and CIFAR10 [21] were selected for training and testing. Each dataset was partitioned in two ways to simulate a federated learning environment: (1) independently and identically distributed (IID) for each TN [1]: the training data were shuffled and then partitioned across 100 clients, each receiving an equal number of examples; (2) non-independently and identically distributed (Non-IID) for each TN [22]: the data were distributed among 100 TNs such that each TN contained samples of only two classes, and the number of samples per TN followed a power law, as detailed in Table 2.
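The IID partition can be sketched in a few lines of Python. The sketch works on sample indices only and is an illustration under assumptions: the function name `iid_partition` and the fixed seed are hypothetical, and the non-IID power-law split is not reproduced here.

```python
import random

def iid_partition(num_samples, num_clients, seed=0):
    """Shuffle sample indices and split them into equally sized client shards."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    shard = num_samples // num_clients
    return [indices[k * shard:(k + 1) * shard] for k in range(num_clients)]

# 60,000 training samples and 100 clients, as for MNIST in Table 2.
shards = iid_partition(num_samples=60_000, num_clients=100)
```

Each of the 100 shards then holds 600 distinct indices, which matches the per-device mean reported in Table 2.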

A CNN was used as the training model for MNIST and FMNIST; the network consisted of two convolutional layers followed by two fully connected layers. The convolutional layers used $5 \times 5$ convolutions with stride 1, each followed by a ReLU and a $2 \times 2$ max pool, with 32 channels in the first layer and 64 channels in the second layer. The output of the first convolutional layer was $32 \times 14 \times 14$ after pooling, and the output of the second was $64 \times 7 \times 7$ after pooling. For CIFAR10, ResNet-50 [23] was used as the training model.
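The stated feature-map sizes can be checked with the standard output-size formula for convolution and pooling layers. The sketch below assumes 28 × 28 input images and a padding of 2 on each convolution; the padding is not stated in the text and is inferred here because it is what makes the reported 14 × 14 and 7 × 7 sizes work out.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                    # MNIST / FMNIST images are 28 x 28
size = conv_out(size, kernel=5, padding=2)   # first 5 x 5 convolution (padding assumed)
pool1 = conv_out(size, kernel=2, stride=2)   # 2 x 2 max pool
size = conv_out(pool1, kernel=5, padding=2)  # second 5 x 5 convolution (padding assumed)
pool2 = conv_out(size, kernel=2, stride=2)   # 2 x 2 max pool
flat_features = 64 * pool2 * pool2           # input width of the first fully connected layer
```

Under this padding assumption, the first fully connected layer receives 64 × 7 × 7 = 3136 features.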

The experimental environment was an Intel(R) Xeon(R) W-2295 CPU @ 3.00 GHz, 64 GB RAM, and an NVIDIA GeForce RTX 3080 GPU, and the experimental platform was PyTorch. Following the related works [10,11,12], the parameters of this paper were set as shown in Table 3.

4.2. Privacy Budget Analysis

The moments accountant can provide a tighter bound on privacy loss than the composition theorem. The privacy loss can be calculated from the noise level $σ$. The privacy loss with increasing global epochs is shown in Figure 2. To test the effectiveness of the proposed method under different privacy budgets, we selected $ε = 1$ and $ε = 5$. A larger privacy budget ($ε = 5$) results in a more accurate model, and a smaller privacy budget ($ε = 1$) achieves better privacy protection. We then let $δ = 10^{- 5}$ and computed the noise level $σ$ using Equation (5).

4.3. Experimental Results

The effectiveness of the method is measured by the model accuracy, and the comparison results are shown in Figure 3. The DP-FedAvg algorithm protected privacy by adding noise to the gradient matrix, and the strength of privacy protection depended on the size of the privacy budget. The smaller the privacy budget, the better the privacy protection. However, as the privacy budget decreased, the noise required increased, which may decrease the accuracy of the model. Comparing the accuracy of DP-FedAvg on the non-IID data in (a) and (c), a higher requirement for privacy protection needed a smaller $ε$, which led to lower accuracy of the model: the accuracy of DP-FedAvg at $ε = 1$ was about 3% lower than that at $ε = 5$. The same pattern held with IID data, where the accuracy of DP-FedAvg at $ε = 1$ was also lower than at $ε = 5$.

In addition, the experimental results on two further datasets, FMNIST and CIFAR10, in Table 4 show that KDP-FL also achieved good results there. Hence, the method is suitable for other datasets. Because both the CNN and ResNet-50 were tested, the experimental results also show that KDP-FL is suitable for different neural network structures and has good compatibility.

In this paper, the parameter update equation was transformed into the state update equation. Furthermore, the global model parameter update, which was completely determined by the TN’s training result, was transformed into the global model parameter update, which was jointly determined by the estimated value and the TN calculation result. The Kalman gain coefficient was calculated by the noise covariance matrix, and the weight of the estimated value and the predicted value in the global model update were calculated by the Kalman gain coefficient to realize the optimal global model update.

The Kalman filter noise reduction method, which reduced the noise at aggregation, decreased the effect of the added differential privacy noise on the model, achieved an accuracy closer to that of the FedAvg algorithm without added noise, and performed well with both non-IID and IID data. With non-IID data, the accuracy at $ε = 1$ improved by up to 4.5% compared to DP-FedAvg, and the accuracy at $ε = 5$ improved by up to 2.5%. The above experiments show that the scheme proposed in this paper can protect the user’s privacy with little loss of accuracy.

5. Conclusions

In this paper, a Kalman filter-based differential privacy federated learning method, KDP-FL, was proposed. Federated learning was used to achieve data sharing while preserving privacy. In addition, differential privacy was added to federated learning, which can resist gradient differential attacks and further protect user privacy. Furthermore, the influence of the noise added by the differential mechanism on the accuracy of the model was reduced by the Kalman filter. The experiment results and analysis showed that the proposed method was able to train models with little loss of accuracy while protecting user privacy. KDP-FL is suitable for training privacy-preserving models under multiple nodes and can help in data cooperation among multiple nodes.

When the neural network is simple, the additional computational overhead brought by Algorithm 3 is negligible. However, when the neural network is complex with a large number of parameters, optimizing all parameters requires calculating the Kalman gain for each parameter, which will bring about a non-negligible additional computational overhead.

In addition, the proposed method used Kalman filtering, which increases the computational overhead and is not applicable to devices and nodes with weak computational power, which is a direction for future improvement.

Author Contributions

Conceptualization, X.Y. and Z.D.; data collection and analysis, Z.D.; validation, X.Y.; writing—original draft preparation, Z.D.; writing—review and editing, X.Y. and Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Ministry of Science and Technology of China, National Key R&D Program “Cyberspace Security” Key Project, 2017YFB0802305 and the Natural Science Foundation of Hebei Province, F2021201052.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The source code can be found here: https://git.acwing.com/dzj/kdpfl.git (accessed on 29 July 2022), and the datasets can be found here: https://pan.baidu.com/s/1PVFJ7vnezGe9jEp8Pgh_fg?pwd=n2w5 (accessed on 29 July 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. Artif. Intell. Stat. 2017, 54, 1273–1282.
2. Konečný, J.; McMahan, H.B.; Ramage, D.; Richtárik, P. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. arXiv 2016, arXiv:1610.02527.
3. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. arXiv 2016, arXiv:1610.05492.
4. Melis, L.; Song, C.; Cristofaro, E.D.; Shmatikov, V. Exploiting Unintended Feature Leakage in Collaborative Learning. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–22 May 2019; pp. 691–706.
5. Zhao, B.; Mopuri, K.R.; Bilen, H. iDLG: Improved Deep Leakage from Gradients. arXiv 2020, arXiv:2001.02610.
6. Ou, W.; Zeng, J.; Guo, Z.; Yan, W.; Liu, D.; Fuentes, S. A homomorphic-encryption-based vertical federated learning scheme for rick management. Comput. Sci. Inf. Syst. 2020, 17, 819–834.
7. Zhang, C.; Li, S.; Xia, J.; Wang, W.; Yan, F.; Liu, Y. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning. In Proceedings of the USENIX Annual Technical Conference, Boston, MA, USA, 15–17 July 2020; pp. 493–506.
8. Fang, H.; Qian, Q. Privacy Preserving Machine Learning with Homomorphic Encryption and Federated Learning. Future Internet 2021, 13, 94.
9. Ma, J.; Naas, S.; Sigg, S.; Lyu, X. Privacy-preserving Federated Learning based on Multi-key Homomorphic Encryption. arXiv 2021, arXiv:2104.06824.
10. Abadi, M.; Chu, A.; Goodfellow, I.J.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318.
11. Hu, R.; Guo, Y.; Li, H.; Pei, Q.; Gong, Y. Personalized Federated Learning With Differential Privacy. IEEE Internet Things J. 2020, 7, 9530–9539.
12. Zhao, Y.; Zhao, J.; Yang, M.; Wang, T.; Wang, N.; Lyu, L.; Niyato, D.; Lam, K. Local Differential Privacy-Based Federated Learning for Internet of Things. IEEE Internet Things J. 2021, 8, 8836–8853.
13. Li, Y.; Zhou, Y.; Jolfaei, A.; Yu, D.; Xu, G.; Zheng, X. Privacy-Preserving Federated Learning Framework Based on Chained Secure Multiparty Computing. IEEE Internet Things J. 2021, 8, 6178–6186.
14. Liu, J.; Xu, H.; Xu, Y.; Ma, Z.; Wang, Z.; Qian, C.; Huang, H. Communication-efficient asynchronous federated learning in resource-constrained edge computing. Comput. Netw. 2021, 199, 108429.
15. Abubaker, Z.; Javaid, N.; Almogren, A.; Akbar, M.; Zuair, M.; Ben-Othman, J. Blockchained service provisioning and malicious node detection via federated learning in scalable Internet of Sensor Things networks. Comput. Netw. 2022, 204, 108691.
16. Mothukuri, V.; Khare, P.; Parizi, R.M.; Pouriyeh, S.; Dehghantanha, A.; Srivastava, G. Federated-Learning-Based Anomaly Detection for IoT Security Attacks. IEEE Internet Things J. 2022, 9, 2545–2554.
17. Dwork, C.; Rothblum, G.N.; Vadhan, S.P. Boosting and Differential Privacy. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 23–26 October 2010; pp. 51–60.
18. Alazard, D. Introduction to Kalman Filtering. Meas. Control-Lond. Inst. Meas. Control 1986, 19, 84–110.
19. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
20. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747.
21. Ranzato, M.; Krizhevsky, A.; Hinton, G.E. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010; Volume 9, pp. 621–628.
22. Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the Convergence of FedAvg on Non-IID Data. arXiv 2019, arXiv:1907.02189.
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.

Figure 1. The workflow of KDPFL.

Figure 2. Privacy loss with increasing global epoch.

Figure 3. Experiment results: (a) non-IID, $ε = 1$; (b) IID, $ε = 1$; (c) non-IID, $ε = 5$; (d) IID, $ε = 5$.

Table 1. Symbol definitions.

Symbol	Definition
$w_{0}$	Initial global model parameters
T	Total number of global training rounds
E	The number of iterations performed in local training between two global training rounds
$λ$	Learning rate
B	Batch size
g	Local training result of TN
$g^{'}$	Local training result of TN after adding noise
w	Model parameters of TN in local training
$w_{t}$	Global model parameters in the tth round of global training.
$w_{t, c}$	Model parameters of TN $_{c}$ in the tth round of global training
$g_{t, c}$	Local training result of TN $_{c}$ in the tth round of global training
${g^{'}}_{t, c}$	Local training result of TN $_{c}$ in the tth round of global training after adding noise
$\mathcal{M}$	Differential privacy mechanism
D	Dataset
M	Clipping threshold
q	Sample probability
I	Unit matrix
C	The number of TN

Table 2. Dataset details.

Dataset	Labels	Devices	Training Samples	Samples/Device (mean)	Samples/Device (std)
MNIST	10	100	60,000	600	328
FMNIST	10	100	60,000	600	328
CIFAR10	10	100	50,000	500	293

Table 3. Parameter definitions.

Definition	Parameter
Total number of nodes	$C_{all} = 100$
Number of TNs involved in training	$C = 20$
Local batch size	$B = 32$
Learning rate	$λ = 0.05$
Local training round	$E = 5$
Privacy budget	$ε = 1$ or $ε = 5$

Table 4. Accuracy in different models and datasets.

Method	FMNIST (acc/%)				CIFAR10 (acc/%)
	Non IID		IID		Non IID		IID
	$ε = 1$	$ε = 5$	$ε = 1$	$ε = 5$	$ε = 1$	$ε = 5$	$ε = 1$	$ε = 5$
FedAvg	86.62		89.56		83.21		86.34
DP-FedAvg	83.17	84.94	86.59	88.02	77.26	79.44	81.23	83.55
KDPFL	86.12	86.23	88.93	89.01	82.76	82.88	85.93	86.03

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
