# Using Pruning-Based YOLOv3 Deep Learning Algorithm for Accurate Detection of Sheep Face

## Simple Summary

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Research Objects

#### 2.2. Experimental Setup

#### 2.3. Data Collection

#### 2.4. Dataset Creation and Preprocessing

## 3. Sheep Face Recognition Based on YOLOv3-P

#### 3.1. Overview of the Network Framework

#### 3.2. Sheep Face Detection Based on YOLOv3

#### 3.3. Compress the Model by Pruning

#### 3.3.1. Channel Pruning

Consider the i-th convolutional layer of the network. The dimension of its input feature map is (${\mathrm{h}}_{\mathrm{i}},{\mathrm{w}}_{\mathrm{i}},{\mathrm{n}}_{\mathrm{i}}$), where ${\mathrm{n}}_{\mathrm{i}}$ is the number of input channels and ${\mathrm{h}}_{\mathrm{i}},{\mathrm{w}}_{\mathrm{i}}$ denote the height and width of the input feature map, respectively. The convolutional layer transforms the input feature map ${\mathrm{x}}_{\mathrm{i}}\in {\mathrm{R}}^{{\mathrm{n}}_{\mathrm{i}}\times {\mathrm{h}}_{\mathrm{i}}\times {\mathrm{w}}_{\mathrm{i}}}$ into the output feature map ${\mathrm{x}}_{\mathrm{i}+1}\in {\mathrm{R}}^{{\mathrm{n}}_{\mathrm{i}+1}\times {\mathrm{h}}_{\mathrm{i}+1}\times {\mathrm{w}}_{\mathrm{i}+1}}$ by applying ${\mathrm{n}}_{\mathrm{i}+1}$ 3D filters ${\mathrm{F}}_{\mathrm{i},\mathrm{j}}\in {\mathrm{R}}^{{\mathrm{n}}_{\mathrm{i}}\times \mathrm{k}\times \mathrm{k}}$ to the ${\mathrm{n}}_{\mathrm{i}}$ input channels; the output then serves as the input feature map of the next convolutional layer. Each filter consists of ${\mathrm{n}}_{\mathrm{i}}$ 2D convolution kernels $\mathrm{K}\in {\mathrm{R}}^{\mathrm{k}\times \mathrm{k}}$, and all filters together form the kernel matrix ${\mathrm{F}}_{\mathrm{i}}\in {\mathrm{R}}^{{\mathrm{n}}_{\mathrm{i}}\times {\mathrm{n}}_{\mathrm{i}+1}\times \mathrm{k}\times \mathrm{k}}$. As shown in Figure 6, the computational cost of the current convolutional layer is ${\mathrm{n}}_{\mathrm{i}+1}{\mathrm{n}}_{\mathrm{i}}{\mathrm{k}}^{2}{\mathrm{h}}_{\mathrm{i}+1}{\mathrm{w}}_{\mathrm{i}+1}$ operations. When a filter ${\mathrm{F}}_{\mathrm{i},\mathrm{j}}$ is pruned, its corresponding feature map ${\mathrm{x}}_{\mathrm{i}+1,\mathrm{j}}$ is removed, which saves ${\mathrm{n}}_{\mathrm{i}}{\mathrm{k}}^{2}{\mathrm{h}}_{\mathrm{i}+1}{\mathrm{w}}_{\mathrm{i}+1}$ operations. At the same time, the kernels in the next layer's filters that act on ${\mathrm{x}}_{\mathrm{i}+1,\mathrm{j}}$ are also removed, which saves an additional ${\mathrm{n}}_{\mathrm{i}+2}{\mathrm{k}}^{2}{\mathrm{h}}_{\mathrm{i}+2}{\mathrm{w}}_{\mathrm{i}+2}$ operations. That is, pruning m filters in layer i reduces the computational cost of both layer i and layer i+1, which achieves the purpose of compressing the deep convolutional network.
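
The cost accounting above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' implementation: the function names `conv_cost` and `pruning_savings` and all layer sizes are hypothetical and chosen only for illustration. It assumes the next layer has ${\mathrm{n}}_{\mathrm{i}+2}$ filters, so that removing one input channel removes one k × k kernel from each of them.

```python
def conv_cost(n_in, n_out, k, h_out, w_out):
    """Operation count of a k x k convolution: n_out * n_in * k^2 * h_out * w_out."""
    return n_out * n_in * k * k * h_out * w_out


def pruning_savings(n_i, n_i2, k, h_i1, w_i1, h_i2, w_i2, m=1):
    """Operations saved in layer i and in layer i+1 when m filters of layer i are pruned."""
    # Each pruned filter of layer i removes n_i kernels applied over the output map.
    saved_here = m * n_i * k * k * h_i1 * w_i1
    # Each removed feature map drops one kernel from every one of the n_i2 next-layer filters.
    saved_next = m * n_i2 * k * k * h_i2 * w_i2
    return saved_here, saved_next


# Hypothetical layer sizes (assumptions, not taken from the paper).
n_i, n_i1, n_i2, k = 64, 128, 256, 3
h, w = 52, 52          # same spatial size for both layers, for simplicity
m = 32                 # number of filters pruned from layer i

before_i = conv_cost(n_i, n_i1, k, h, w)
before_next = conv_cost(n_i1, n_i2, k, h, w)
saved_i, saved_next = pruning_savings(n_i, n_i2, k, h, w, h, w, m=m)

print(saved_i / before_i)        # fraction of layer i cost removed
print(saved_next / before_next)  # fraction of layer i+1 cost removed
```

With these illustrative sizes, both ratios reduce to m / n_{i+1} = 32 / 128, so one quarter of the cost of each of the two layers is removed.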

#### 3.3.2. Layer Pruning

The ${\mathrm{L}}_{1}$ norms of the γ parameters of each BN layer are compared, and the ${\mathrm{L}}_{1}$ norms of γ for the BN layer of the second convolution in each of the 22 shortcut structures are ranked. The score of the second BN layer of the i-th shortcut structure can be expressed as: ${\mathrm{L}}_{\mathrm{i}}=\sum _{\mathrm{s}=1}^{{\mathrm{N}}_{\mathrm{c}}}\left|{\mathsf{\gamma}}_{\mathrm{s}}\right|$, where ${\mathrm{N}}_{\mathrm{c}}$ denotes the number of channels. ${\mathrm{L}}_{\mathrm{i}}$ is the ${\mathrm{L}}_{1}$ norm of the γ parameters of the second BN layer of the i-th shortcut, and it indicates the importance of that shortcut structure. Next, the n smallest values of ${\mathrm{L}}_{\mathrm{i}}$ are taken (n is the number of shortcut structures to be removed) and the corresponding shortcut structures are cut out. Because each shortcut structure includes two convolutional layers (a 1 × 1 convolutional layer and a 3 × 3 convolutional layer), the corresponding 2 × n convolutional layers are removed. Figure 7 depicts the layers before and after pruning.
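
The ranking step described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names `shortcut_scores` and `select_shortcuts_to_prune` are hypothetical, and the γ values are made up (four shortcuts with three channels each, rather than the 22 shortcut structures of the network).

```python
def shortcut_scores(gammas_per_shortcut):
    """L_i = sum over channels s of |gamma_s| for the second BN layer of each shortcut."""
    return [sum(abs(g) for g in gammas) for gammas in gammas_per_shortcut]


def select_shortcuts_to_prune(gammas_per_shortcut, n):
    """Return the indices of the n shortcuts with the smallest L1 score."""
    scores = shortcut_scores(gammas_per_shortcut)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:n])


# Hypothetical gamma values of the second BN layer in four shortcut structures.
example_gammas = [
    [0.5, -0.25, 0.25],     # L = 1.0
    [0.0, 0.125, -0.125],   # L = 0.25
    [1.0, 1.0, -1.0],       # L = 3.0
    [0.25, 0.0, -0.25],     # L = 0.5
]

pruned = select_shortcuts_to_prune(example_gammas, n=2)
# Each shortcut holds a 1x1 and a 3x3 convolution, so two layers go per shortcut.
removed_conv_layers = 2 * len(pruned)
print(pruned, removed_conv_layers)
```

In a real network the γ values would be read from the trained BN layers; how they are extracted depends on the framework and is left out here.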

#### 3.3.3. Combination of Layer Pruning and Channel Pruning

#### 3.4. Experimental Evaluation Index

## 4. Experimental Results

#### 4.1. Result Analysis

#### 4.1.1. Experiment and Analysis

#### 4.1.2. Comparison of Different Networks

#### 4.1.3. Comparative Analysis of Different Pruning Strategies

#### 4.2. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

**Figure 4.** YOLOv3 structure. CBL is the smallest component of the YOLOv3 network architecture and consists of Conv (convolution) + BN + Leaky ReLU; Res unit is the residual component; in ResX, X stands for a number (Res1, Res2, …, Res8, etc.), and each ResX block consists of a CBL and X residual components.

**Figure 5.** Clustering results of anchor boxes. The red pentagrams are the anchor box scales after clustering, while the green pentagrams are the anchor box scales before clustering. After clustering, the anchor boxes for the collected sheep face dataset are set to (41, 74), (56, 104), (71, 119), (79, 146), (99, 172), (107, 63), (119, 220), (156, 280), (206, 120).

**Figure 6.** Channel pruning principle. ${\mathrm{x}}_{\mathrm{i}}$ denotes the input feature map of the i-th convolutional layer of the network; ${\mathrm{h}}_{\mathrm{i}},{\mathrm{w}}_{\mathrm{i}}$ denote the height and width of the input feature map, respectively; ${\mathrm{n}}_{\mathrm{i}}$ is the number of input channels of this convolutional layer.

| Model | mAP | Precision | Recall | F1-Score | Parameters |
|---|---|---|---|---|---|
| Faster R-CNN | 90.20% | 80.63% | 90% | 84.00% | 108 MB |
| SSD | 98.73% | 96.85% | 96.25% | 96.35% | 100 MB |
| YOLOv3 | 95.30% | 82.90% | 95.70% | 88.70% | 235 MB |
| YOLOv4 | 91.15% | 88.70% | 88.00% | 87.50% | 244 MB |

| Model | mAP | Precision | Recall | F1-Score | Parameters | Speed |
|---|---|---|---|---|---|---|
| Prune_channel | 96.80% | 89.50% | 97.00% | 92.80% | 69.9 MB | 8.9 ms |
| Prune_layer | 95.70% | 89.50% | 95.70% | 91.90% | 132 MB | 9.2 ms |
| Prune_channel_layer | 97.20% | 89.90% | 97.50% | 93.30% | 61.5 MB | 8.7 ms |
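
To put the sizes of the pruned models in perspective, the relative reduction can be computed directly from the table values. The sketch below assumes that the 235 MB YOLOv3 entry in the network comparison table is the unpruned baseline for all three pruning strategies; the helper name `size_reduction` is hypothetical.

```python
# Sizes in MB, copied from the two tables; the baseline choice is an assumption.
baseline_mb = 235.0
pruned_mb = {
    "Prune_channel": 69.9,
    "Prune_layer": 132.0,
    "Prune_channel_layer": 61.5,
}


def size_reduction(baseline, pruned):
    """Fraction of the baseline size removed by pruning."""
    return 1.0 - pruned / baseline


reductions = {name: size_reduction(baseline_mb, size) for name, size in pruned_mb.items()}
for name, frac in reductions.items():
    print(f"{name}: {frac:.1%} smaller than the baseline")
```

Under that assumption, the reductions are roughly 70.3% for channel pruning, 43.8% for layer pruning, and 73.8% for the combined strategy.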

