# Towards Efficient Federated Learning: Layer-Wise Pruning-Quantization Scheme and Coding Design


## Abstract


## 1. Introduction

#### 1.1. Background and Motivation

#### 1.2. Related Work

#### 1.3. Contribution and Organization

- We propose FedLP-Q (the code for this work is available at https://github.com/Zhuzzq/FedLP/tree/main/FedLP-Q), a joint pruning-quantization FL framework built on the layer-wise approach. FedLP-Q provides a simple but efficient paradigm for reducing the communication load and computation complexity of FL systems;
- We develop two distinct FedLP-Q schemes for the homogeneous and heterogeneous cases, and present the corresponding pruning strategies, quantization rules, and coding designs in detail. Theoretical analysis further demonstrates the strengths of the FedLP-Q coding scheme;
- We carry out several experiments to evaluate the performance of FedLP-Q. The results suggest that the layer-wise pruning-quantization mechanism significantly improves system efficiency with controllable performance loss.

## 2. Preliminaries and FedLP-Q Framework

#### 2.1. Basic FL

#### 2.2. An Overview of the FedLP-Q Framework

## 3. FedLP-Q: Quantization-Coding Scheme for Layer-Wise Pruning

#### 3.1. Pruning Phase

#### 3.1.1. Homogeneity Case

#### 3.1.2. Heterogeneity Case

#### 3.2. Quantization and Coding Phase

#### 3.2.1. Quantization Scheme

#### 3.2.2. Coding Design

**Algorithm 1** Encoder for layer $l$

**Input:** $\left\{\parallel {\mathit{g}}_{k}^{l}{\parallel}_{2},\ b,\ \mathrm{seg}({\mathit{g}}_{k}^{l};b),\ \mathrm{sgn}\left({\mathit{g}}_{k}^{l}\right)\right\}$, ${M}_{l}$;

1: ${\mathcal{C}}_{l}\leftarrow \mathtt{Concat}\left(\mathtt{Float32}\left(\parallel {\mathit{g}}_{k}^{l}{\parallel}_{2}\right),\mathtt{Elias\_Encoder}\left(b\right)\right)$; ▹ encode the header

2: **for** $i\leftarrow 1$ **to** ${M}_{l}$ **do**

3: ${\mathcal{C}}_{l}\leftarrow \mathtt{Concat}\left({\mathcal{C}}_{l},\mathtt{Elias\_Encoder}\left(\mathrm{seg}({\mathit{g}}_{k}^{(l,i)};b)\right)\right)$; ▹ encode the interval index

4: ${\mathcal{C}}_{l}\leftarrow \mathtt{Concat}\left({\mathcal{C}}_{l},\mathtt{Binary}\left(\mathrm{sgn}({\mathit{g}}_{k}^{(l,i)})\right)\right)$; ▹ encode the gradient sign

5: **end for**

**Output:** encoded message ${\mathcal{C}}_{l}$.
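The encoder above can be sketched in Python. This is a minimal illustration, not the authors' released implementation: it assumes Elias gamma codes for the integer fields and shifts each interval index by one before coding (gamma codes cover only positive integers); all function names are illustrative.

```python
import struct

def elias_gamma_encode(n: int) -> str:
    """Elias gamma code: floor(log2 n) zeros, then n in binary (n >= 1)."""
    if n < 1:
        raise ValueError("Elias gamma codes only positive integers")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def float32_bits(x: float) -> str:
    """IEEE-754 single-precision representation of x as a 32-char bit string."""
    return format(struct.unpack(">I", struct.pack(">f", x))[0], "032b")

def encode_layer(norm: float, b: int, seg: list, sgn: list) -> str:
    """Algorithm 1 sketch: header (Float32 norm, Elias-coded b), then one
    Elias-coded interval index and one sign bit per gradient entry."""
    code = float32_bits(norm) + elias_gamma_encode(b)
    for s, g in zip(seg, sgn):
        code += elias_gamma_encode(s + 1)  # +1 shift: indices may be zero
        code += "1" if g >= 0 else "0"     # 1-bit gradient sign
    return code
```

For example, `encode_layer(1.0, 2, [0], [1])` yields a 37-bit message: 32 header bits for the norm, `010` for b = 2, `1` for the shifted interval index, and a final sign bit.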

**Algorithm 2** Decoder for layer $l$

**Input:** received message ${\mathcal{C}}_{l}$; **Initialization:** ${\mathrm{seg}}^{l}\leftarrow [\ ]$, ${\mathrm{sgn}}^{l}\leftarrow [\ ]$, $i\leftarrow 0$;

1: ${G}_{l}\leftarrow \mathtt{Float32\_Decoder}\left({\mathcal{C}}_{l}\right)$; ▹ decode the gradient norm

2: ${\mathcal{C}}_{l}\gg 32$;

3: $(b,{l}_{E})\leftarrow \mathtt{Elias\_Decoder}\left({\mathcal{C}}_{l}\right)$; ▹ decode the quantization bit

4: ${\mathcal{C}}_{l}\gg {l}_{E}$;

5: **while** ${\mathcal{C}}_{l}$ is non-empty **do**

6: $({\mathrm{seg}}_{i}^{l},{l}_{E})\leftarrow \mathtt{Elias\_Decoder}\left({\mathcal{C}}_{l}\right)$; ▹ decode the interval index

7: ${\mathcal{C}}_{l}\gg {l}_{E}$;

8: ${\mathrm{sgn}}_{i}^{l}\leftarrow \mathtt{Binary\_Decoder}\left({\mathcal{C}}_{l}\right)$; ▹ decode the gradient sign

9: $i\leftarrow i+1$; ${\mathcal{C}}_{l}\gg 1$;

10: **end while**

**Output:** decoded elements $\left\{{G}_{l},\ b,\ {\mathrm{seg}}^{l},\ {\mathrm{sgn}}^{l}\right\}$.
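Algorithm 2 can likewise be sketched in Python. The sketch assumes the message is a bit string whose integer fields use Elias gamma codes and whose interval indices were shifted by one before coding (gamma codes only cover positive integers); helper names are illustrative, not the paper's API.

```python
import struct

def elias_gamma_decode(bits: str, pos: int) -> tuple:
    """Decode one Elias gamma codeword starting at pos; return (value, new_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end

def decode_layer(bits: str) -> tuple:
    """Algorithm 2 sketch: recover the norm, quantization bit b, interval
    indices, and signs from a bit-string message."""
    norm = struct.unpack(">f", struct.pack(">I", int(bits[:32], 2)))[0]
    pos = 32                                    # skip the Float32 header
    b, pos = elias_gamma_decode(bits, pos)      # decode the quantization bit
    seg, sgn = [], []
    while pos < len(bits):                      # loop until the message is empty
        s, pos = elias_gamma_decode(bits, pos)  # decode the interval index
        seg.append(s - 1)                       # undo the +1 shift
        sgn.append(1 if bits[pos] == "1" else -1)
        pos += 1                                # consume the sign bit
    return norm, b, seg, sgn
```

Feeding it the 37-bit message `"00111111100000000000000000000000" + "010" + "11"` recovers norm 1.0, b = 2, and a single interval index 0 with positive sign.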

#### 3.2.3. Algorithm Formulation

**Algorithm 3** FedLP-Q

**Initialization:** local models, pruning configurations, etc.

1: **for** $t\leftarrow 1$ **to** $\mathrm{max\_epoch}$ **do**

2: **for** each participating client $k$ **in parallel do** ▹ client side

3: Update ${\mathit{\theta}}_{k,t}\leftarrow \mathtt{Local\_Train}({\mathit{\theta}}_{k,t-1};{\mathcal{D}}_{k})$;

4: Calculate the accumulated gradients ${\mathit{g}}_{k,t}$;

5: Conduct layer-wise pruning: ${\tilde{\mathit{g}}}_{k,t}$;

6: Quantize the local gradients: $Q\left({\tilde{\mathit{g}}}_{k,t}\right)$;

7: Run Algorithm 1 to encode the quantized gradients;

8: Upload the local message to the parameter server;

9: **end for**

10: Run Algorithm 2 to decode the received messages; ▹ server side

11: Aggregate each layer ${\overline{\mathit{\theta}}}_{t}^{l}$ by (3); ▹ server side

12: Download the global model: ${\mathit{\theta}}_{k,t}\leftarrow {\overline{\mathit{\theta}}}_{t}$; ▹ client side

13: **end for**

**Output:** global model ${\overline{\mathit{\theta}}}_{t}$.
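The client-side core of Algorithm 3 (layer-wise pruning followed by quantization) can be sketched as follows. This is a simplified illustration under stated assumptions: the homogeneous scheme is modeled as keeping each layer for upload independently with probability p, and the quantizer is QSGD-style stochastic quantization with b intervals; the exact probability rule, the function names, and the reconstruction helper are assumptions, not the paper's precise formulation.

```python
import numpy as np

def prune_layers(grads: dict, p: float, rng: np.random.Generator) -> dict:
    """Layer-wise pruning (homogeneous case, assumed rule): each layer is kept
    for upload independently with probability p; dropped layers are omitted."""
    return {name: g for name, g in grads.items() if rng.random() < p}

def quantize_layer(g: np.ndarray, b: int, rng: np.random.Generator):
    """QSGD-style stochastic quantization: map each entry of g to one of b
    intervals of [0, ||g||], keeping its sign; unbiased in expectation."""
    norm = float(np.linalg.norm(g))
    sgn = np.where(g.ravel() >= 0, 1, -1)
    if norm == 0.0:
        return norm, np.zeros(g.size, dtype=int), sgn
    scaled = np.abs(g.ravel()) / norm * b        # position in [0, b]
    lower = np.floor(scaled)
    up = rng.random(g.size) < (scaled - lower)   # round up w.p. = remainder
    seg = (lower + up).astype(int)               # interval index in {0,...,b}
    return norm, seg, sgn

def dequantize_layer(norm, seg, sgn, b, shape) -> np.ndarray:
    """Server-side reconstruction of a quantized layer gradient."""
    return (sgn * seg * norm / b).reshape(shape)
```

On the server side, each layer is then aggregated only over the clients that actually uploaded it, which is what lets the layer-wise scheme tolerate heterogeneous pruning across clients. Each reconstructed entry deviates from the original by at most ||g||/b.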

#### 3.3. Theoretical Analysis

#### 3.3.1. Impacts on Convergence

**Proposition 1.**

**Proof.**

**Remark 1.**

#### 3.3.2. Coding Performance

**Proposition 2.** By employing layer-wise pruning-quantization, the average number of bits required to communicate ${\mathit{g}}_{k}^{l}$ for client $k$ is upper-bounded by:

**Proof.**

**Remark 2.**

**Definition 1.**

## 4. Experimental Results

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Full Form |
|---|---|
| FL | Federated Learning |
| iid | Independent and Identically Distributed |
| non-iid | Non-Independent and Non-Identically Distributed |
| FedAvg | Federated Averaging |
| QSGD | Quantized Stochastic Gradient Descent |
| FedLP | Federated Learning with Layer-wise Pruning |
| FedLP-Q | Federated Learning with Layer-wise Pruning and Quantization |
| Conv | Convolutional |
| FC | Fully Connected |
| FLOPs | Floating Point Operations |


**Figure 1.** Two typical FedLP-Q schemes for different local model settings. The inactive layers (in gray) are removed from the sub-models for uploading. (**a**) Homogeneity scheme. (**b**) Heterogeneity scheme.

**Figure 3.** Test on CIFAR-10 with FedLP-Q_Homo(0.5) and FedLP-Q_Hetero(u). (**a**) Under iid data. (**b**) Under non-iid data.

**Figure 5.** Comparison of coding schemes for model uploading (b = 10). (**a**) Encoded bits vs. parameters. (**b**) Code length of each layer.

**Figure 6.** The quantized gradient distribution of selected clients. (**a**) Under iid data, 2nd global epoch, client 7. (**b**) Under non-iid data, 71st global epoch, client 81.

**Figure 7.** The entropy rate $\widehat{\mathcal{H}}\left(Q\left({\mathit{g}}_{k}^{l}\right);b\right)$ and the equivalent per-gradient code length ${p}_{k}^{l}\frac{\widehat{\mathcal{C}}\left({\mathit{g}}_{k}^{l};b\right)}{{M}_{l}}$ of each layer.

| Notation | Description |
|---|---|
| $N$ | The number of distributed clients. |
| ${E}_{l}$ | The number of local training epochs. |
| $\left\{{\mathcal{D}}_{1},\cdots ,{\mathcal{D}}_{N}\right\}$ | Distributed local datasets. |
| ${P}_{t}$ | The set of participating clients in the $t$-th global epoch. |
| $K$ | The number of participating clients. |
| $\overline{\mathit{\theta}}$ | The global model. |
| $\left\{{\mathit{\theta}}_{1},\cdots ,{\mathit{\theta}}_{N}\right\}$ | Local models. |
| $\left\{{\omega}_{1},\cdots ,{\omega}_{N}\right\}$ | Aggregation weights for each client. |
| ${\mathit{g}}_{k,t}$ | The accumulated gradient of client $k$ in the $t$-th global epoch. |

Test accuracy is reported as iid/non-iid (%), without quantization ("No Quant") and with b = 10 and b = 8 quantization.

| Schemes | No Quant | b = 10 | b = 8 | #Param ↘ | MFLOPs ↘ |
|---|---|---|---|---|---|
| FedAvg | 78.8/75.3 | 76.2/74.9 | 71.7/68.5 | / | / |
| FedLP-Q(0.2) | 73.4/68.2 | 71.5/66.8 | 56.6/50.4 | 40.0% | / |
| FedLP-Q(0.5) | 75.5/73.5 | 75.4/72.5 | 66.7/62.0 | 25.0% | / |
| FedLP-Q(0.8) | 78.9/77.3 | 78.3/76.2 | 71.3/66.9 | 10.0% | / |
| FedLP-Q(1) | 64.4/60.3 | 64.1/59.2 | 53.6/49.4 | 84.6% | 51.2% |
| FedLP-Q(3) | 66.8/62.0 | 66.1/61.7 | 58.4/53.9 | 79.6% | 32.3% |
| FedLP-Q(u) | 70.1/66.5 | 68.3/65.4 | 60.7/55.7 | 71.1% | 34.5% |
| FedLP-Q(5) | 74.6/73.1 | 74.4/72.1 | 68.6/64.5 | 35.6% | 17.2% |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhu, Z.; Shi, Y.; Xin, G.; Peng, C.; Fan, P.; Letaief, K.B.
Towards Efficient Federated Learning: Layer-Wise Pruning-Quantization Scheme and Coding Design. *Entropy* **2023**, *25*, 1205.
https://doi.org/10.3390/e25081205
