# CPS-Det: An Anchor-Free Based Rotation Detector for Ship Detection


## Abstract


## 1. Introduction

- A reliable labeling method is proposed and combined with the anchor-free prediction scheme.
- An improved loss for angle prediction is proposed, which makes angle estimation more accurate.
- The centerness calculation is optimized so that the weight assigned to each feature point is more reasonable, and angle information is introduced so that the centerness is consistent with the predicted category and position.
- A Cascaded Positive sample Screening (CPS) scheme is proposed, which greatly improves the accuracy of the anchor-free detector.
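As a rough illustration of the centerness item above, an ellipse-based weighting might look like the following sketch. The function, its linear decay, and the default `k` are our assumptions for illustration only; the paper's exact equipotential-line definition (including the angle term) is given in Section 2.3.2.

```python
import math

def elliptic_centerness(dx, dy, w, h, k=4.0):
    """Illustrative ellipse-based centerness (not the paper's exact formula).

    dx, dy: offsets of a feature point from the box centre, in box coordinates.
    w, h:   width and height of the (rotated) bounding box.
    k:      contraction factor; a larger k shrinks the ellipse, keeping
            fewer, more central positive samples.
    """
    # Normalised elliptic radius: 0 at the centre, 1 on the boundary
    # of the contracted ellipse.
    r = math.hypot(2.0 * k * dx / w, 2.0 * k * dy / h)
    # Outside the ellipse the point is treated as a negative sample
    # (weight 0); inside, the weight decays towards the boundary.
    return max(0.0, 1.0 - r)
```

Points near the box centre get weight close to 1, while points on or beyond the equipotential line are screened out, which is the behaviour the CPS scheme relies on.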

## 2. Materials and Methods

#### 2.1. Network Structure

#### 2.1.1. Feature Extraction

#### 2.1.2. Feature Fusion

#### 2.1.3. Prediction

#### 2.1.4. Postprocessing

#### 2.2. Location Regression

#### 2.3. Positive Sample Screening

#### 2.3.1. Scale Limit of BBox

#### 2.3.2. Defined Centerness

#### 2.3.3. Feedback to Localization and Classification

#### 2.4. Loss Function

- Classification Loss:

  $$L_{cls}=\frac{1}{N_{pos}}\sum_{x,y}L_{fl}\left(p_{(x,y)},\ p_{(x,y)}^{*}\right)$$

  where $N_{pos}$ denotes the number of positive samples, $L_{fl}$ is Focal Loss [23], $p_{(x,y)}$ is the classification score at each feature point of each feature map, and $p_{(x,y)}^{*}$ is the corresponding ground-truth label projected onto the feature map.

- BBox Regression Loss:

  $$L_{reg}=\frac{1}{\sum_{x,y}c_{(x,y)}^{*}}\sum_{x,y}\left[L_{IOU}\left(t_{(x,y)},\ t_{(x,y)}^{*}\right)\times c_{(x,y)}^{*}\right]$$

  where $L_{IOU}$ is IoU Loss [24], $c_{(x,y)}^{*}$ is the centerness of each feature point computed from the elliptic equipotential line, $t_{(x,y)}$ is the predicted bounding box, and $t_{(x,y)}^{*}$ is the ground-truth location of the box.

- Angle Loss:

  $$L_{ang}=\frac{1}{\sum_{x,y}c_{(x,y)}^{*}}\sum_{x,y}\left\{\left[L_{sml^{*}}\left(r1_{(x,y)},\ r1_{(x,y)}^{*}\right)+L_{sml^{*}}\left(r2_{(x,y)},\ r2_{(x,y)}^{*}\right)\right]\times c_{(x,y)}^{*}\right\}$$

  where $r1_{(x,y)}$ and $r2_{(x,y)}$ are the two predicted ratios (Ratio1, Ratio2) used to calculate the direction angle of the target, and $r1_{(x,y)}^{*}$ and $r2_{(x,y)}^{*}$ are the ratios calculated from the true angle of the target. $L_{sml^{*}}$ is defined as

  $$L_{sml^{*}}(r,r^{*})=\begin{cases}\dfrac{(r-r^{*})^{2}}{r^{*}+1}, & |r-r^{*}|<1\\ |r-r^{*}|-0.5, & \text{otherwise}\end{cases}$$

  In CPS-Det, the predicted angle of the target is calculated by

  $$\theta=\arctan\left(\frac{r1\times W}{r2\times H}\right)$$

  where $W$ and $H$ are the width and height of the target's bounding box. Therefore, as the angle approaches $0^{\circ}$ or $90^{\circ}$, $r1$ or $r2$ approaches 0, and even a small error in $r$ has a large impact on the computed angle. To mitigate this, we modify the quadratic branch of the SmoothL1 Loss from

  $$L_{sml}(r,r^{*})=\frac{(r-r^{*})^{2}}{2},\quad |r-r^{*}|<1$$

  to

  $$L_{sml^{*}}(r,r^{*})=\frac{(r-r^{*})^{2}}{r^{*}+1},\quad |r-r^{*}|<1$$

  Because of the regulating term $(r^{*}+1)$, the Angle Loss carries greater weight when the target direction is horizontal or vertical, which enables the network to optimize better near the value boundary.

- Centerness Loss: $L_{cel}(c,c^{*})$ is the cross-entropy loss, where $c$ is the predicted centerness and $c^{*}$ is the true centerness.
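As a concrete reading of the angle formulas above, the modified SmoothL1 term and the angle recovery can be sketched in a few lines of Python. The function names are ours, and `atan2` is used in place of a plain arctan of the quotient so the sketch does not divide by zero at exactly 90°; for positive inputs the two forms agree.

```python
import math

def smooth_l1_star(r, r_star):
    """Modified SmoothL1 (L_sml*): the (r* + 1) denominator enlarges the
    loss when the true ratio r* is near 0, i.e. when the target is close
    to horizontal or vertical."""
    d = abs(r - r_star)
    if d < 1.0:
        return d ** 2 / (r_star + 1.0)
    return d - 0.5

def predicted_angle(r1, r2, w, h):
    """Direction angle in degrees, theta = arctan((r1 * W) / (r2 * H))."""
    return math.degrees(math.atan2(r1 * w, r2 * h))
```

For $r^{*}=0$ the quadratic branch equals $(r-r^{*})^{2}$, twice the plain SmoothL1 value $(r-r^{*})^{2}/2$, which is exactly the extra boundary weight described above.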

#### 2.5. Summary of Algorithm Design

## 3. Experimental Results and Discussions

#### 3.1. Dataset and Evaluation Metrics

#### 3.2. Experimental Environment

#### 3.3. Ablation Study

1. First, we examined the effect of limiting the regression scale of the bounding box. After the restriction on BBox scale was removed, the number of positive samples on each feature layer increased significantly, and the loss declined slowly during training. With a pre-trained ResNeXt50 backbone and the same number of epochs, the loss failed to reach its minimum, and the validation result was also poor. The experimental results are shown in Figure 5. In this experiment, 617 images were used for training, each augmented tenfold. Over 50 epochs with batch size 4, a total of 77,000 iterations were executed. The loss is computed as in Equation (4), with every $\lambda$ set to 1. Without limiting the range of the BBox, the detector achieved an AP of only 0.601, and tests showed that it failed to capture the ships' characteristics. This experiment demonstrates that the large increase in positive samples has a negative effect once the scale restriction on the BBox is removed.
2. Next, we tested the influence of elliptic equipotential line screening and of NASFCOS-FPN on the results. The recall-precision curves are shown in Figure 6, where Exp1–Exp4 correspond to the rows of Table 2. Integrating these curves yields the AP of each method. As shown in Table 2, elliptic equipotential line screening improves the AP of the baseline by 1.92%. This is consistent with the first experiment: reducing inferior positive sample points helps the network acquire target features. We also tested NASFCOS-FPN, which we introduce to optimize feature fusion so that the retained positive sample points obtain more reliable feature information; it improves the AP of the baseline by 2.42%. Finally, combining the two approaches raises the AP to 0.891, and this is the complete structure of CPS-Det. Its loss curve is shown in Figure 7. Training based on the baseline ran for only 40 epochs, a total of about 60,000 iterations.
3. The following experiment compares the improved scheme for computing the Angle Loss against plain SmoothL1 Loss. According to Equation (11), we use $(r^{*}+1)$ to control the weight of the Angle Loss when $|r-r^{*}|<1$. The experiments show better detection of both horizontal and vertical ships than when SmoothL1 Loss is used directly. Detection results are shown in Figure 8: Figure 8a uses SmoothL1 Loss, while Figure 8b uses our Angle Loss. In the horizontal and vertical cases, the accuracy of angle detection improves, and missed detections caused by angle error are also reduced.
4. We conducted the series of experiments in Table 3 to verify the influence of the ellipse parameter k. As k increases, the ellipse contracts: the number of positive samples decreases, and the weight of every positive sample point other than the center also decreases. AP initially grows with k, but at a diminishing rate. Once the valid positive samples have been fully screened out, further increasing k only aggravates the imbalance between positive and negative samples, which finally causes the AP to decline.
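The AP values in Tables 2 and 3 come from integrating recall-precision curves such as those in Figure 6. A standard way to perform that integration is all-point interpolation; the paper does not name its interpolation scheme, so the following is one common choice, not necessarily the authors' exact procedure.

```python
import numpy as np

def average_precision(recall, precision):
    """AP as the area under a precision-recall curve,
    using all-point interpolation: precision is made monotonically
    non-increasing before integrating over recall."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Precision envelope: p[i] = max(p[i], p[i+1], ...).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall actually changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

A detector whose curve stays at precision 1.0 up to recall 1.0 scores AP = 1.0; sagging precision at high recall lowers the integral accordingly.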

#### 3.4. Result on HRSC2016

#### 3.5. Discussion

- Under the scale limit on the BBox, although detection accuracy improves, the intermediate feature maps lose the chance to predict both small and large targets. Large targets, whose aspect ratio is close to the setting of the ellipse hyperparameter k, are still detected well; for small targets, however, the detector's performance is reduced.
- Although our improved Angle Loss optimizes the detection of horizontal and vertical targets, it does not fundamentally solve the problem, because the angle is not predicted directly but calculated through trigonometric functions. Near the boundary values, the error affects the result more than weighting can compensate for, so there remains a very small interval in which angles are not predicted correctly; what our scheme does is compress this interval to improve overall prediction accuracy. We also tried assigning still more weight to loss terms near the boundary value, but this prevented training from finding a stable direction of gradient descent.
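The boundary sensitivity discussed above is easy to check numerically. The box size and ratio values below are illustrative assumptions, not figures from the paper.

```python
import math

def angle_deg(r1, r2, w, h):
    """theta = arctan((r1 * W) / (r2 * H)), in degrees (atan2 form)."""
    return math.degrees(math.atan2(r1 * w, r2 * h))

# Apply the same absolute error (+0.05 on r2) on a hypothetical 100 x 100 box.
mid_shift = abs(angle_deg(0.5, 0.50, 100, 100) - angle_deg(0.5, 0.55, 100, 100))
edge_shift = abs(angle_deg(0.5, 0.01, 100, 100) - angle_deg(0.5, 0.06, 100, 100))
# mid_shift is a couple of degrees; edge_shift, taken near the 90-degree
# boundary where r2 is close to 0, comes out noticeably larger -- this is
# the residual interval that the (r* + 1) weighting can only compress.
```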

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector; Springer: Cham, Switzerland, 2016.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
- Zhou, X.; Zhuo, J.; Krähenbühl, P. Bottom-Up Object Detection by Grouping Extreme and Center Points. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Shi, J. FoveaBox: Beyond Anchor-based Object Detector. IEEE Trans. Image Process. 2020, 29, 7389–7398.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zürich, Switzerland, 6–12 September 2014.
- Wang, J.; Lu, C.; Jiang, W. Simultaneous Ship Detection and Orientation Estimation in SAR Images Based on Attention Module and Angle Regression. Sensors 2018, 18, 2851.
- Chen, C.; He, C.; Hu, C.; Pei, H.; Jiao, L. MSARN: A Deep Neural Network Based on an Adaptive Recalibration Mechanism for Multiscale and Arbitrary-oriented SAR Ship Detection. IEEE Access 2019, 7, 159262–159283.
- Jiang, Y.; Zhu, X.; Wang, X.; Yang, S.; Li, W.; Wang, H.; Fu, P.; Luo, Z. R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection. arXiv 2017, arXiv:1706.09579.
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Xian, S.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. arXiv 2018, arXiv:1811.07126.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
- Zhu, H.; Chen, X.; Dai, W.; Fu, K.; Ye, Q.; Jiao, J. Orientation Robust Object Detection in Aerial Images Using Deep Convolutional Neural Network. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Wang, N.; Gao, Y.; Chen, H.; Wang, P.; Tian, Z.; Shen, C.; Zhang, Y. NAS-FCOS: Fast Neural Architecture Search for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 11943–11951.
- Zoph, B.; Le, Q.V. Neural Architecture Search with Reinforcement Learning. arXiv 2016, arXiv:1611.01578.
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Zhang, X.; Wang, G.; Zhu, P.; Zhang, T.; Li, C.; Jiao, L. GRS-Det: An Anchor-Free Rotation Ship Detector Based on Gaussian-Mask in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2020.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
- Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A High Resolution Optical Satellite Image Dataset for Ship Recognition and Some New Baselines. In Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM), Porto, Portugal, 24–26 February 2017; Volume 2, pp. 324–331.
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Bai, X. Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2019.
- Yang, X.; Liu, Q.; Yan, J.; Li, A.; Zhang, Z.; Yu, G. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. arXiv 2019, arXiv:1908.05612.
- Lin, Y.; Feng, P.; Guan, J. IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection. arXiv 2019, arXiv:1912.00969.

**Figure 6.** AP curves of the ablation study; "Exp" denotes the experiment number in Table 2.

| Stage | Output | ResNeXt50 |
|---|---|---|
| conv1 | 112 × 112 | 7 × 7, 64, stride 2 |
| conv2 | 56 × 56 | 3 × 3 max pool, stride 2; $\left[\begin{array}{ccc}1\times 1,& 128& \\ 3\times 3,& 128& C=32\\ 1\times 1,& 256\end{array}\right]\times 3$ |
| conv3 | 28 × 28 | $\left[\begin{array}{ccc}1\times 1,& 256& \\ 3\times 3,& 256& C=32\\ 1\times 1,& 512\end{array}\right]\times 4$ |
| conv4 | 14 × 14 | $\left[\begin{array}{ccc}1\times 1,& 512& \\ 3\times 3,& 512& C=32\\ 1\times 1,& 1024\end{array}\right]\times 6$ |
| conv5 | 7 × 7 | $\left[\begin{array}{ccc}1\times 1,& 1024& \\ 3\times 3,& 1024& C=32\\ 1\times 1,& 2048\end{array}\right]\times 3$ |
| | 1 × 1 | global average pool; 1000-d fc, softmax |
| params | | 25.0 $\times\ {10}^{6}$ |

| Model | NASFCOS-FPN | Equipotential Line | AP | No. |
|---|---|---|---|---|
| Baseline | No | No | 0.8578 | Exp1 |
| | No | Yes | 0.8770 | Exp2 |
| | Yes | No | 0.8820 | Exp3 |
| | Yes | Yes | 0.8912 | Exp4 |

| k | 1 | 2 | 4 | 6 | 8 |
|---|---|---|---|---|---|
| AP | 0.8820 | 0.8877 | 0.8912 | 0.8892 | 0.8764 |

| Model | Backbone | Anchor-Free | AP |
|---|---|---|---|
| IENet | ResNet101 | Yes | 0.7501 |
| FCOS (BBox) | ResNeXt50 | Yes | 0.8014 |
| Gliding Vertex | ResNet101 | No | 0.8820 |
| ${R}^{3}$Det | ResNet101 | No | 0.8926 |
| GRS-Det | ResNet50 | Yes | 0.8890 |
| CPS-Det | ResNeXt50 | Yes | 0.8912 |

| Model | AP | GPU | TFLOPS | Time | Speed |
|---|---|---|---|---|---|
| ${R}^{3}$Det | 0.8926 | GTX1080Ti | 10.8 | 0.0833 s | 1.112 |
| GRS-Det (ResNet50) | 0.8890 | GTX1080Ti | 10.8 | 0.0595 s | 1.556 |
| CPS-Det | 0.8912 | RTX2080 | 9.83 | 0.0633 s | 1.615 |
| CPS-Det | 0.8912 | RTX2080Ti | 13.13 | 0.0461 s | 1.652 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yang, Y.; Pan, Z.; Hu, Y.; Ding, C.
CPS-Det: An Anchor-Free Based Rotation Detector for Ship Detection. *Remote Sens.* **2021**, *13*, 2208.
https://doi.org/10.3390/rs13112208
