# G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection

## Abstract

## 1. Introduction

- PointSet uses several individual points to represent the overall arbitrary-oriented object. The independent optimization between the points makes the trained detector very sensitive to isolated points, particularly for objects with large aspect ratios, because a slight deviation causes a sharp drop in the intersection-over-union (IoU) value. As shown in Figure 1a, although most of the points are predicted correctly, an outlier makes the final prediction fail. Therefore, the joint optimization loss (e.g., IoU loss [17,18,19]) based on the point set is more popular than the independent optimization loss (e.g., ${L}_{n}$ loss).
- As a special case of PointSet, QBB is defined as the four corners of a quadrilateral bounding box. In addition to the inherent problems of PointSet described above, QBB also suffers from the representation ambiguity problem [15]. Quadrilateral detection often sorts the points first (as shown in Figure 1b, represented by the green box) to facilitate point matching between the ground-truth and prediction bounding boxes to calculate the final loss. Although the red prediction box in Figure 1b does not satisfy the sorting rule and obtains a large loss value accordingly using the ${L}_{n}$ loss, this prediction is correct according to the IoU-based evaluation metric.
- OBB is the most popular choice for oriented object representation because of its simplicity and intuitiveness. However, the boundary discontinuity and square-like problem are obstacles to high-precision locating, as detailed in [5,20,21,22]. Figure 1c illustrates the boundary problem of OBB representation, considering the OpenCV acute angle definition ($\theta \in [-\pi /2,0)$) as an example [14]. The height (h) and width (w) of the box swap at the angle boundary, resulting in a sudden change in the loss value, which is coupled with the periodicity of the angle and makes regression difficult.

- To uniformly solve the different problems introduced by different representations (OBB, QBB, and PointSet), Gaussian representation (G-Rep) is proposed to construct the Gaussian distribution using the MLE algorithm.
- To achieve an effective and robust measurement for the Gaussian distribution, three statistical distances, the Kullback–Leibler divergence (KLD) [25], the Bhattacharyya distance (BD) [26], and the Wasserstein distance (WD) [27], are explored and corresponding regression loss functions are designed and analyzed.
- To realize the consistency in measurement between sample selection and loss regression, fixed and dynamic label assignment strategies are constructed based on a Gaussian metric to further boost performance.
- Extensive experiments were conducted on several publicly available datasets, e.g., DOTA, HRSC2016, UCAS-AOD, and ICDAR2015, and the results demonstrated the excellent performance of the proposed techniques for arbitrary-oriented object detection.

## 2. Related Work

#### 2.1. Oriented Object Representations

#### 2.2. Regression Loss in Arbitrary-Oriented Object Detection

#### 2.3. Label Assignment Strategies

## 3. Proposed Method

#### 3.1. Object Representation Based on Gaussian Distribution

**PointSet.** RepPoints [29] is an anchor-free method and serves as one of the baselines in this paper. It consists of a backbone network, an initial detection head, and a refined detection head. The object is represented as a set of adaptive sample points (i.e., PointSet), and the regression framework adopts deformable convolution [47] for point learning. The PointSet R is defined as:

**QBB.** The baseline with QBB representation is built on the anchor-based method Cas-RetinaNet proposed in [15], which contains a backbone network and two detection heads. QBB is defined as the four corner points of the object ($Q={\left\{\left({x}_{i}^{q},{y}_{i}^{q}\right)\right\}}_{i=1}^{4}$). Note that, in the original QBB baseline, the four corner points must be sorted in advance so that they match the corners of the ground-truth representation one-to-one for regression. Additionally, from the definitions of the three representations, QBB can be regarded as a special case of PointSet, and OBB can be regarded as a special case of QBB. Therefore, constructing the Gaussian distribution for PointSet covers the most general case, which is also the major focus of this paper.

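**Transformation between PointSet/QBB and G-Rep.** Considering $({x}_{i},{y}_{i})$ as a two-dimensional (2-D) variable ${x}_{i}$, its probability density under the Gaussian distribution $\mathcal{N}(\mu ,\mathsf{\Sigma})$ is defined as

$$\mathcal{N}({x}_{i}\mid \mu ,\mathsf{\Sigma})=\frac{1}{2\pi {\left|\mathsf{\Sigma}\right|}^{1/2}}\exp \left(-\frac{1}{2}{({x}_{i}-\mu )}^{\top }{\mathsf{\Sigma}}^{-1}({x}_{i}-\mu )\right)$$

where $\mu$ is the 2-D mean vector and $\mathsf{\Sigma}$ is the $2\times 2$ covariance matrix.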
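The MLE step can be sketched as follows, assuming the standard closed-form estimates for a Gaussian: the sample mean for $\mu$ and the $1/N$ sample covariance for $\mathsf{\Sigma}$. The function name `pointset_to_gaussian` and the plain-tuple data layout are illustrative choices, not the authors' implementation.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Mean = Tuple[float, float]
Cov = Tuple[Tuple[float, float], Tuple[float, float]]


def pointset_to_gaussian(points: Sequence[Point]) -> Tuple[Mean, Cov]:
    """Closed-form MLE of a 2-D Gaussian from a point set (hypothetical helper).

    mu    = (1/N) * sum_i x_i
    Sigma = (1/N) * sum_i (x_i - mu)(x_i - mu)^T
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")

    # Sample mean of the x and y coordinates.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    # Biased (1/N) covariance entries, which is what MLE yields.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n

    return (mx, my), ((sxx, sxy), (sxy, syy))


# Four corners of an axis-aligned 4 x 2 box centred at (1, 1), i.e. a QBB.
corners = [(-1.0, 0.0), (3.0, 0.0), (3.0, 2.0), (-1.0, 2.0)]
mu, sigma = pointset_to_gaussian(corners)
```

Because every point contributes to the same mean and covariance, a single displaced point shifts the estimate only in proportion to $1/N$, which is consistent with the reduced sensitivity to isolated points discussed in the Introduction. Note that a covariance estimated from very few or collinear points can be singular, so a practical implementation would likely need a small regularization term.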

**Transformation between OBB and G-Rep.** In previous studies, such as [5], a 2-D Gaussian distribution for OBB is constructed by a matrix transformation. There are two Gaussian transformation methods adopted in this paper: MLE and matrix transformation. The former can be used for the conversion of all representations (OBB/QBB/PointSet), but is inefficient and inaccurate. The latter is more precise, but only supports OBB. Therefore, when transforming ground truth, matrix transformation is chosen to avoid unnecessary bias.
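A minimal sketch of the matrix transformation, assuming the form commonly used in the Gaussian-based detection literature such as [5]: $\mu = (x, y)$ and $\mathsf{\Sigma} = R\,\mathrm{diag}(w^2/4,\, h^2/4)\,R^{\top}$, where $R$ is the rotation matrix for angle $\theta$. The function name and argument order are hypothetical.

```python
import math
from typing import Tuple

Mean = Tuple[float, float]
Cov = Tuple[Tuple[float, float], Tuple[float, float]]


def obb_to_gaussian(cx: float, cy: float, w: float, h: float,
                    theta: float) -> Tuple[Mean, Cov]:
    """Convert an OBB (centre, width, height, angle in radians) to (mu, Sigma).

    Assumes Sigma = R diag(w^2/4, h^2/4) R^T with
    R = [[cos t, -sin t], [sin t, cos t]]; expanded by hand for the 2 x 2 case.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = w * w / 4.0, h * h / 4.0  # variances along the box axes

    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), ((sxx, sxy), (sxy, syy))


# The same rectangle written two ways: (w, h, theta) and (h, w, theta + pi/2).
_, cov_a = obb_to_gaussian(0.0, 0.0, 4.0, 2.0, 0.3)
_, cov_b = obb_to_gaussian(0.0, 0.0, 2.0, 4.0, 0.3 + math.pi / 2)
```

Under this assumed form, the two parameterizations above describe the same rectangle and produce the same covariance up to floating-point error, so swapping width and height at the angle boundary does not change the Gaussian.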

#### 3.2. Gaussian Distance Metrics

**Kullback–Leibler Divergence (KLD) [25].** The KLD between two Gaussian distributions is defined as

**Bhattacharyya Distance (BD) [26].** The BD between two Gaussian distributions is defined as

**Wasserstein Distance (WD) [27].** The WD between two Gaussian distributions is defined as
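The three distances have well-known closed forms for Gaussians, and the sketch below evaluates them for the 2-D case using only the standard library. It is an illustration under stated assumptions rather than the authors' code: the KLD direction (prediction relative to target) is assumed, the WD is returned in its squared form, and all helper names are hypothetical. For $2\times 2$ covariance matrices, the trace of the matrix square root in the WD reduces to $\sqrt{\mathrm{Tr}(\mathsf{\Sigma}_1\mathsf{\Sigma}_2)+2\sqrt{|\mathsf{\Sigma}_1||\mathsf{\Sigma}_2|}}$, which avoids an explicit matrix square root.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]
Mat = Tuple[Tuple[float, float], Tuple[float, float]]


def _det(m: Mat) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def _inv(m: Mat) -> Mat:
    d = _det(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def _trace_prod(a: Mat, b: Mat) -> float:
    # Tr(a @ b) for 2 x 2 matrices.
    return (a[0][0] * b[0][0] + a[0][1] * b[1][0]
            + a[1][0] * b[0][1] + a[1][1] * b[1][1])


def _quad(v: Vec, m: Mat) -> float:
    # v^T m v
    return (v[0] * (m[0][0] * v[0] + m[0][1] * v[1])
            + v[1] * (m[1][0] * v[0] + m[1][1] * v[1]))


def kld(mu_p: Vec, sig_p: Mat, mu_t: Vec, sig_t: Mat) -> float:
    """KL(N_p || N_t) for 2-D Gaussians (direction is an assumption)."""
    d = (mu_p[0] - mu_t[0], mu_p[1] - mu_t[1])
    inv_t = _inv(sig_t)
    return 0.5 * (_quad(d, inv_t) + _trace_prod(inv_t, sig_p)
                  + math.log(_det(sig_t) / _det(sig_p))) - 1.0


def bd(mu_1: Vec, sig_1: Mat, mu_2: Vec, sig_2: Mat) -> float:
    """Bhattacharyya distance between two 2-D Gaussians."""
    d = (mu_1[0] - mu_2[0], mu_1[1] - mu_2[1])
    avg: Mat = (
        ((sig_1[0][0] + sig_2[0][0]) / 2, (sig_1[0][1] + sig_2[0][1]) / 2),
        ((sig_1[1][0] + sig_2[1][0]) / 2, (sig_1[1][1] + sig_2[1][1]) / 2),
    )
    return (0.125 * _quad(d, _inv(avg))
            + 0.5 * math.log(_det(avg) / math.sqrt(_det(sig_1) * _det(sig_2))))


def wd_squared(mu_1: Vec, sig_1: Mat, mu_2: Vec, sig_2: Mat) -> float:
    """Squared 2-Wasserstein distance between two 2-D Gaussians."""
    d2 = (mu_1[0] - mu_2[0]) ** 2 + (mu_1[1] - mu_2[1]) ** 2
    traces = sig_1[0][0] + sig_1[1][1] + sig_2[0][0] + sig_2[1][1]
    cross = math.sqrt(_trace_prod(sig_1, sig_2)
                      + 2.0 * math.sqrt(_det(sig_1) * _det(sig_2)))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(d2 + traces - 2.0 * cross, 0.0)
```

All three functions return zero for identical distributions and are unbounded above, which is why a normalization step is needed before they can be used as a bounded similarity or loss.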

#### 3.3. Regression Loss Based on Gaussian Metric

#### 3.4. Label Assignment Based on Gaussian Metric

**Fixed G-Rep Label Assignment.** The range of the IoU value is $\left[0,1\right]$ according to the definition of IoU, and the threshold values are selected empirically in the range $\left[0.3,0.7\right]$. However, this strategy is clearly not applicable to the Gaussian distribution distance calculated by the three metrics described in Section 3.2, whose value ranges are not closed intervals. Along with the concept of G-Rep regression loss design, normalized functions for each distance evaluation metric are adopted. The general form of the normalized metric for KLD, BD, and WD used in the label assignment process is defined as
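As a concrete instance of such normalization, the sketch below uses the KLD functions listed in Table 1 with the highest reported mAP: $\mathcal{S}=1/(2+{D}_{K})$ for label assignment and $\mathcal{L}=1-1/(2+\sqrt{{D}_{K}})$ for regression. The fixed thresholds `pos_thr` and `neg_thr` are hypothetical placeholder values chosen inside the $(0,0.5]$ range of $\mathcal{S}$, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def kld_similarity(d_k: float) -> float:
    """Normalized KLD for label assignment, S = 1 / (2 + D_K), in (0, 0.5]."""
    return 1.0 / (2.0 + d_k)


def kld_loss(d_k: float) -> float:
    """Normalized KLD regression loss, L = 1 - 1 / (2 + sqrt(D_K)), in [0.5, 1)."""
    return 1.0 - 1.0 / (2.0 + math.sqrt(d_k))


def assign_fixed(similarities: Sequence[float],
                 pos_thr: float = 0.4,   # hypothetical value
                 neg_thr: float = 0.3    # hypothetical value
                 ) -> List[int]:
    """Fixed-threshold assignment: 1 = positive, 0 = negative, -1 = ignored."""
    labels = []
    for s in similarities:
        if s >= pos_thr:
            labels.append(1)
        elif s < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels
```

Since the similarity is a decreasing function of the distance, a larger value means a closer match, so the thresholds play the same role as IoU thresholds but must be chosen within the range of the normalized metric.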

**Dynamic G-Rep Label Assignment.** Dynamic G-Rep label assignment strategies are devised based on the three distance metrics in Section 3.2 to avoid the difficulty of selecting the optimal hyper-parameters. Inspired by ATSS [36], the threshold for selecting positive and negative samples is calculated dynamically according to the statistical characteristics of all the normalized distances (calculated in Equation (8)). For the i-th ground truth, the dynamic threshold $\mathcal{T}$ is calculated as
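A minimal sketch of an ATSS-style dynamic threshold, assuming it is the mean plus the standard deviation of the normalized similarities between one ground truth and its candidate samples, as in ATSS [36]. The exact statistic used by the authors is not reproduced here, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import List, Sequence, Tuple


def dynamic_assign(similarities: Sequence[float]) -> Tuple[float, List[int]]:
    """Return (threshold, indices of positive candidates) for one ground truth.

    Assumed rule: threshold = mean(S) + std(S); candidates whose normalized
    similarity reaches the threshold are treated as positives.
    """
    if not similarities:
        raise ValueError("at least one candidate is required")

    mean = statistics.mean(similarities)
    std = statistics.pstdev(similarities)  # population std (assumption)
    threshold = mean + std

    positives = [i for i, s in enumerate(similarities) if s >= threshold]
    return threshold, positives


# Normalized similarities of five candidate samples to one ground truth.
threshold, positives = dynamic_assign([0.10, 0.20, 0.30, 0.40, 0.50])
```

The threshold adapts to each ground truth: when all candidates match poorly, the mean drops and the bar is lowered, which removes the need to hand-tune positive and negative thresholds.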

## 4. Experiments

#### 4.1. Datasets and Implementation Details

#### 4.2. Normalized Function Design

#### 4.3. Ablation Study

**Analysis of regression loss based on G-Rep.** Even when only the GIoU loss was replaced by ${\mathcal{L}}_{\mathrm{KLD}}$, G-Rep outperformed PointSet (**64.63%** vs. **63.97%**). The dynamic label assignment strategies avoid the influence of unsuitable hyper-parameters, which allows a fair comparison of the GIoU and Gaussian regression losses. The superiority of G-Rep was also clear when the dynamic label assignment strategies were used: ${\mathcal{L}}_{\mathrm{KLD}}$ still surpassed GIoU under the same dynamic label assignment strategy, ATSS, on DOTA (**70.45%** vs. **68.88%**).

**Analysis of label assignment based on G-Rep.** The label assignment strategy is another important factor for high detection performance. For the ${\mathcal{L}}_{\mathrm{KLD}}$ loss, Table 2 shows the detection results of the different label assignment strategies. Using KLD rather than IoU as the label assignment metric resulted in better performance, which demonstrates the effectiveness of aligning the label assignment and regression loss metrics. The optimal fixed negative and positive thresholds are difficult to select, whereas dynamic label assignment strategies avoid this issue. PATSS denotes the combination of the ATSS [36] and PAA [37] strategies. The mAP further reached **70.45%** and **72.08%** under the more robust dynamic selection strategies ATSS and PATSS, respectively. Without additional features, the combination of the dynamic label assignment strategy and regression loss increased the mAP by **8.11%** compared with the baseline method.

**Analysis of the advantages for an object with a large aspect ratio.** Outliers often cause more serious location errors for objects with large aspect ratios than for square objects. Table 3 shows that G-Rep was more effective than PointSet for objects with a large aspect ratio: the mAP increased by **6.18%** for the five typical categories with narrow objects on DOTA, because G-Rep was not sensitive to isolated points.

**Comparison of different Gaussian distance metrics.** Table 4 compares the performances when different evaluation metrics, KLD, WD and BD, were used in fixed and dynamic label assignment strategies and regression loss. The performances based on fixed label assignment strategies varied greatly as a result of the hand-crafted hyper-parameters. Therefore, experiments based on dynamic label assignment strategies were constructed to objectively compare the performances of the metrics. The experimental results demonstrate that the overall performance of the G-Rep loss functions surpassed that of the GIoU loss. There were tolerable performance differences between BD and the other two losses, and a slight difference (within 0.5%) between KLD and WD. To further explore whether KLD and WD are more suitable as the regression loss than BD, the label assignment metrics were unified as KLD (rows 5, 8 and 9) for the ablation study of the loss functions. In fact, all three G-Rep losses outperformed the baseline (RepPoints) [29], with only slight differences between them in detection performance.

**Table 4.** Comparison of the three Gaussian distances as metrics for label assignment and regression loss on HRSC2016.

Rep. | $\mathcal{S}$ | $\mathcal{L}$ | mAP (%) |
---|---|---|---|
PointSet | IoU (ATSS) | GIoU | 78.07 |
G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (Max) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 73.44 |
G-Rep | ${\mathcal{S}}_{\mathrm{BD}}$ (Max) | ${\mathcal{L}}_{\mathrm{BD}}$ | 46.71 |
G-Rep | ${\mathcal{S}}_{\mathrm{WD}}$ (Max) | ${\mathcal{L}}_{\mathrm{WD}}$ | 84.39 |
G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 88.06 |
G-Rep | ${\mathcal{S}}_{\mathrm{BD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{BD}}$ | 85.32 |
G-Rep | ${\mathcal{S}}_{\mathrm{WD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{WD}}$ | 88.56 |
G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{BD}}$ | 88.90 |
G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{WD}}$ | 88.80 |
G-Rep | ${\mathcal{S}}_{\mathrm{BD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 85.32 |
G-Rep | ${\mathcal{S}}_{\mathrm{BD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{WD}}$ | 85.28 |

**Ablation study on various datasets.** Table 5 shows the experimental results of G-Rep using two baselines on various datasets. The QBB baseline adopted the anchor-based method Cas-RetinaNet [15] (i.e., the cascaded RetinaNet [35]). G-Rep resulted in varying degrees of improvement on the anchor-based baseline with QBB and the anchor-free baseline with PointSet on various datasets.

- **Elongated objects.** On the datasets containing a large number of elongated objects (e.g., HRSC2016, ICDAR2015), the improvement in G-Rep applied to PointSet was more pronounced than that applied to QBB, as shown in Table 3, mainly because the greater the number of points, the more accurate the Gaussian distribution obtained, and, thus, the more accurate the representation of the elongated object.
- **Size of dataset.** The performance on the small datasets (e.g., UCAS-AOD) tended to be saturated, so the improvement was relatively small.
- **High baseline.** Models with a high-performance baseline were hard to improve significantly (e.g., HRSC2016-QBB, DOTA-PointSet).

#### 4.4. Time Cost Analysis

#### 4.5. Comparison with Other Methods

#### 4.6. Visualization Analysis

#### 4.7. More Discussion

## 5. Conclusions

- G-Rep uses Gaussian representation to alleviate the challenges posed by other common representations. The experimental results in Table 5 show that G-Rep resulted in a substantial increase of up to 9.99% of mAP on the HRSC2016 dataset when applied to PointSet.
- G-Rep uses the normalized Gaussian distance for the regression loss function and a label assignment strategy instead of the IoU-based metric, which resulted in significant increases in mAP, up to 3.20% on the DOTA dataset and 11.07% on the HRSC2016 dataset, as shown in Table 2.
- G-Rep utilizes a Gaussian distribution to guide the regression of points in PointSet and QBB, which makes the detection results less sensitive to outliers and more accurate for elongated objects. As shown in Table 3, G-Rep resulted in a 6.18% improvement in mAP for elongated objects on the DOTA dataset.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Azimi, S.M.; Vig, E.; Bahmanyar, R.; Körner, M.; Reinartz, P. Towards multi-class object detection in unconstrained remote sensing imagery. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 4–6 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 150–165.
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2849–2858.
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. Scrdet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241.
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. AAAI Conf. Artif. Intell. **2021**, 35, 3163–3171.
- Yang, X.; Yan, J.; Qi, M.; Wang, W.; Xiaopeng, Z.; Qi, T. Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021.
- Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote. Sens. **2021**, 60, 1–11.
- Paolo, F.; Lin, T.T.T.; Gupta, R.; Goodman, B.; Patel, N.; Kuster, D.; Kroodsma, D.; Dunnmon, J. xView3-SAR: Detecting Dark Fishing Activity Using Synthetic Aperture Imagery. arXiv **2022**, arXiv:2206.00897.
- Ye, J.; Chen, Z.; Liu, J.; Du, B. TextFuseNet: Scene Text Detection with Richer Fused Features. IJCAI **2020**, 20, 516–522.
- Zhou, C.; Li, D.; Wang, P.; Sun, J.; Huang, Y.; Li, W. ACR-Net: Attention Integrated and Cross-Spatial Feature Fused Rotation Network for Tubular Solder Joint Detection. IEEE Trans. Instrum. Meas. **2021**, 70, 1–12.
- Zolfi, A.; Amit, G.; Baras, A.; Koda, S.; Morikawa, I.; Elovici, Y.; Shabtai, A. YolOOD: Utilizing Object Detection Concepts for Out-of-Distribution Detection. arXiv **2022**, arXiv:2212.02081.
- Liu, H.; Jiao, L.; Wang, R.; Xie, C.; Du, J.; Chen, H.; Li, R. WSRD-Net: A Convolutional Neural Network-Based Arbitrary-Oriented Wheat Stripe Rust Detection Method. Front. Plant Sci. **2022**, 13, 876069.
- Shi, X.; Shan, S.; Kan, M.; Wu, S.; Chen, X. Real-time rotation-invariant face detection with progressive calibration networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2295–2303.
- Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. Piou loss: Towards accurate oriented object detection in complex environments. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 195–211.
- Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. **2018**, 10, 132.
- Ming, Q.; Miao, L.; Zhou, Z.; Yang, X.; Dong, Y. Optimization for Arbitrary-Oriented Object Detection via Representation Invariance Loss. IEEE Geosci. Remote. Sens. Lett. **2022**, 19, 1–5.
- Guo, Z.; Liu, C.; Zhang, X.; Jiao, J.; Ji, X.; Ye, Q. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8792–8801.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. Unitbox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. AAAI Conf. Artif. Intell. **2020**, 35, 12993–13000.
- Yang, X.; Yan, J. Arbitrary-Oriented Object Detection with Circular Smooth Label. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 677–694.
- Yang, X.; Hou, L.; Zhou, Y.; Wang, W.; Yan, J. Dense label encoding for boundary discontinuity free rotation detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15819–15829.
- Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning high-precision bounding box for rotated object detection via kullback-leibler divergence. Adv. Neural Inf. Process. Syst. **2021**, 34, 18381–18394.
- Qian, W.; Yang, X.; Peng, S.; Yan, J.; Guo, Y. Learning Modulated Loss for Rotated Object Detection. AAAI Conf. Artif. Intell. **2021**, 35, 2458–2466.
- Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. Proc. R. Stat. Soc. **1977**, 39, 1–22.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. **1951**, 22, 79–86.
- Bhattacharyya, A. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. **1943**, 35, 99–109.
- Villani, C. Optimal Transport: Old and New; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; Volume 338.
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. **2020**, 43, 1452–1459.
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. Reppoints: Point set representation for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9657–9666.
- Zhou, L.; Wei, H.; Li, H.; Zhang, Y.; Sun, X.; Zhao, W. Arbitrary-oriented object detection in remote sensing images based on polar coordinates. IEEE Access **2020**, 8, 223373–223384.
- Zhao, P.; Qu, Z.; Bu, Y.; Tan, W.; Guan, Q. Polardet: A fast, more precise detector for rotated target in aerial images. Int. J. Remote Sens. **2021**, 42, 5821–5851.
- Wei, H.; Zhang, Y.; Chang, Z.; Li, H.; Wang, H.; Sun, X. Oriented objects as pairs of middle lines. ISPRS J. Photogramm. Remote Sens. **2020**, 169, 268–279.
- Llerena, J.M.; Zeni, L.F.; Kristen, L.N.; Jung, C. Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection. arXiv **2021**, arXiv:2106.06072.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 7–12 December 2015; pp. 91–99.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9756–9765.
- Kim, K.; Lee, H.S. Probabilistic Anchor Assignment with IoU Prediction for Object Detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 355–371.
- Ming, Q.; Zhou, Z.; Miao, L.; Zhang, H.; Li, L. Dynamic Anchor Learning for Arbitrary-Oriented Object Detection. AAAI Conf. Artif. Intell. **2021**, 35, 2355–2363.
- Zhang, X.; Wan, F.; Liu, C.; Ji, X.; Ye, Q. Learning to Match Anchors for Visual Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. **2021**, 44, 3096–3109.
- Wang, J.; Gong, Z.; Liu, X.; Guo, H.; Yu, D.; Ding, L. Object Detection Based on Adaptive Feature-Aware Method in Optical Remote Sensing Images. Remote Sens. **2022**, 14, 3616.
- Wang, J.; Cui, Z.; Zang, Z.; Meng, X.; Cao, Z. Absorption Pruning of Deep Neural Network for Object Detection in Remote Sensing Imagery. Remote Sens. **2022**, 14, 6245.
- Zhang, T.; Zhuang, Y.; Wang, G.; Dong, S.; Chen, H.; Li, L. Multiscale Semantic Fusion-Guided Fractal Convolutional Object Detection Network for Optical Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. **2022**, 60, 1–20.
- Ma, W.; Li, N.; Zhu, H.; Jiao, L.; Tang, X.; Guo, Y.; Hou, B. Feature Split–Merge–Enhancement Network for Remote Sensing Object Detection. IEEE Trans. Geosci. Remote Sens. **2022**, 60, 1–17.
- Yu, D.; Ji, S. A New Spatial-Oriented Object Detection Framework for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. **2022**, 60, 1–16.
- Li, X.; Deng, J.; Fang, Y. Few-Shot Object Detection on Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. **2022**, 60, 1–14.
- Richards, F.S. A method of maximum-likelihood estimation. J. R. Stat. Soc. Ser. B Methodol. **1961**, 23, 469–475.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
- Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A high resolution optical satellite image dataset for ship recognition and some new baselines. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, 24–26 February 2017; Volume 2, pp. 324–331.
- Zhu, H.; Chen, X.; Dai, W.; Fu, K.; Ye, Q.; Jiao, J. Orientation robust object detection in aerial images using deep convolutional neural network. In Proceedings of the 2015 IEEE International Conference on Image Processing, Quebec City, QC, Canada, 27–30 September 2015; pp. 3735–3739.
- Karatzas, D.; Gomez-Bigorda, L.; Nicolaou, A.; Ghosh, S.; Bagdanov, A.; Iwamura, M.; Matas, J.; Neumann, L.; Chandrasekhar, V.R.; Lu, S.; et al. ICDAR 2015 competition on robust reading. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition, Tunis, Tunisia, 23–26 August 2015; pp. 1156–1160.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv **2019**, arXiv:1906.07155.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv **2018**, arXiv:1804.02767.
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 483–499.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
- Li, W.; Wei, W.; Zhang, L. GSDet: Object Detection in Aerial Images Based on Scale Reasoning. IEEE Trans. Image Process. **2021**, 30, 4599–4609.
- Li, Y.; Huang, Q.; Pei, X.; Jiao, L.; Shang, R. Radet: Refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images. Remote Sens. **2020**, 12, 389.
- Zhang, G.; Lu, S.; Zhang, W. Cad-net: A context-aware detection network for objects in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. **2019**, 57, 10015–10024.
- Wang, Y.; Zhang, Y.; Zhang, Y.; Zhao, L.; Sun, X.; Guo, Z. SARD: Towards scale-aware rotated object detection in aerial imagery. IEEE Access **2019**, 7, 173855–173865.
- Li, C.; Xu, C.; Cui, Z.; Wang, D.; Zhang, T.; Yang, J. Feature-attentioned object detection in remote sensing imagery. In Proceedings of the 2019 IEEE International Conference on Image Processing, Taipei, Taiwan, 22–25 September 2019; pp. 3886–3890.
- Yang, F.; Li, W.; Hu, H.; Li, W.; Wang, P. Multi-Scale Feature Integrated Attention-Based Rotation Network for Object Detection in VHR Aerial Images. Sensors **2020**, 20, 1686.
- Wang, J.; Yang, W.; Li, H.C.; Zhang, H.; Xia, G.S. Learning center probability map for detecting objects in aerial images. IEEE Trans. Geosci. Remote Sens. **2020**, 59, 4307–4323.
- Song, Q.; Yang, F.; Yang, L.; Liu, C.; Hu, M.; Xia, L. Learning point-guided localization for detection in remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. **2020**, 14, 1084–1094.
- Yang, X.; Yan, J.; Yang, X.; Tang, J.; Liao, W.; He, T. SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. arXiv **2020**, arXiv:2004.13316.
- Dai, P.; Yao, S.; Li, Z.; Zhang, S.; Cao, X. ACE: Anchor-Free Corner Evolution for Real-Time Arbitrarily-Oriented Object Detection. IEEE Trans. Image Process. **2022**, 31, 4076–4089.
- Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep layer aggregation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2403–2412.
- Yi, J.; Wu, P.; Liu, B.; Huang, Q.; Qu, H.; Metaxas, D. Oriented object detection in aerial images with box boundary-aware vectors. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 2150–2159.
- Pan, X.; Ren, Y.; Sheng, K.; Dong, W.; Yuan, H.; Guo, X.; Ma, C.; Xu, C. Dynamic refinement network for oriented and densely packed object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11207–11216.
- Hou, L.; Lu, K.; Xue, J. Refined One-Stage Oriented Object Detection Method for Remote Sensing Images. IEEE Trans. Image Process. **2022**, 31, 1545–1558.
- Huang, Z.; Li, W.; Xia, X.G.; Tao, R. A General Gaussian Heatmap Label Assignment for Arbitrary-Oriented Object Detection. IEEE Trans. Image Process. **2022**, 31, 1895–1910.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. **2018**, 20, 3111–3122.

**Figure 1.** Illustrations of different problems for different representations: (**a**) Dissociation of PointSet; (**b**) Representation ambiguity of QBB; (**c**) Boundary discontinuity of OBB.

**Figure 2.** Overview of the main contributions of this paper. Gaussian distributions of QBB and PointSet are constructed, and label assignment strategies and regression losses are designed in an alignment manner based on statistical distances.

**Figure 4.** Comparison of the visualization results of PointSet and G-Rep on the HRSC2016 dataset. (**a**) PointSet. (**b**) G-Rep.

**Figure 5.** Comparison of the visualization results of PointSet and G-Rep on the DOTA dataset. (**a**) PointSet. (**b**) G-Rep.

**Figure 6.** Comparison of the visualization results of PointSet and G-Rep on the UCAS-AOD dataset. (**a**) PointSet. (**b**) G-Rep.

**Table 1.** Experiment results of normalized function design for label assignment ($\mathcal{S}$) and regression loss ($\mathcal{L}$) on HRSC2016.

Metric | Func. of $\mathcal{S}$ | Range of $\mathcal{S}$ | Func. of $\mathcal{L}$ | Range of $\mathcal{L}$ | mAP (%) |
---|---|---|---|---|---|
KLD | $\frac{1}{2+{D}_{K}}$ | $(0,0.5]$ | $1-\frac{1}{0+\exp\left(\sqrt{{D}_{K}}\right)}$ | $[0,1)$ | 87.32 |
KLD | $\frac{1}{2+{D}_{K}}$ | $(0,0.5]$ | $1-\frac{1}{0+\exp\left({D}_{K}^{2}\right)}$ | $[0,1)$ | 50.73 |
KLD | $\frac{1}{2+{D}_{K}}$ | $(0,0.5]$ | $1-\frac{1}{2+\sqrt{{D}_{K}}}$ | $[0.5,1)$ | 88.06 |
KLD | $\frac{1}{1+{\left(\sqrt{{D}_{K}}\right)}^{3}}$ | $(0,1]$ | $1-\frac{1}{2+\sqrt{{D}_{K}}}$ | $[0.5,1)$ | 87.96 |
BD | $\frac{1}{1+{D}_{B}^{2}}$ | $(0,1]$ | $1-\frac{1}{1+{D}_{B}^{2}}$ | $[0,1)$ | 81.02 |
BD | $\frac{1}{1+{D}_{B}^{2}}$ | $(0,1]$ | $1-\frac{1}{1+\sqrt{{D}_{B}}}$ | $[0,1)$ | 69.32 |
BD | $\frac{1}{1+{D}_{B}^{2}}$ | $(0,1]$ | $1-\frac{1}{1+{D}_{B}}$ | $[0,1)$ | 85.32 |
BD | $\frac{1}{1+{D}_{B}}$ | $(0,1]$ | $1-\frac{1}{1+{D}_{B}}$ | $[0,1)$ | 85.12 |
WD | $\frac{1}{2+{D}_{W}}$ | $(0,0.5]$ | $1-\frac{1}{2+\sqrt{{D}_{W}}}$ | $[0.5,1)$ | 87.04 |
WD | $\frac{1}{2+{D}_{W}}$ | $(0,0.5]$ | $1-\frac{1}{0+\exp\left(\sqrt{{D}_{W}}\right)}$ | $[0,1)$ | 88.24 |
WD | $\frac{1}{2+{D}_{W}}$ | $(0,0.5]$ | $1-\frac{1}{1+\log(1+{D}_{W})}$ | $[0,1)$ | 88.56 |
WD | $\frac{1}{2+\sqrt{{D}_{W}}}$ | $(0,0.5]$ | $1-\frac{1}{1+\log(1+{D}_{W})}$ | $[0,1)$ | 87.54 |
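The normalized functions in Table 1 map an unbounded, non-negative statistical distance $D$ into a bounded score or loss. The following is a minimal sketch of three of the listed forms, assuming $D \ge 0$ is already computed and that $\log$ denotes the natural logarithm; the function names are hypothetical and chosen only for illustration.

```python
import math

def assignment_score(d: float) -> float:
    """S = 1 / (2 + D): maps D >= 0 into (0, 0.5]; larger means better aligned."""
    return 1.0 / (2.0 + d)

def regression_loss_sqrt(d: float) -> float:
    """L = 1 - 1 / (2 + sqrt(D)): maps D >= 0 into [0.5, 1)."""
    return 1.0 - 1.0 / (2.0 + math.sqrt(d))

def regression_loss_log(d: float) -> float:
    """L = 1 - 1 / (1 + log(1 + D)): maps D >= 0 into [0, 1).

    Assumption: natural logarithm.
    """
    return 1.0 - 1.0 / (1.0 + math.log1p(d))

# Evaluate the three functions for a perfect match and two larger distances.
for d in (0.0, 1.0, 100.0):
    print(d, assignment_score(d), regression_loss_sqrt(d), regression_loss_log(d))
```

The square-root and logarithm compress large distances, so the loss grows slowly for poor predictions; the table reports a much lower mAP for the KLD row that instead uses the steeply growing $\exp({D}_{K}^{2})$ term.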

**Table 2.** Ablation study of G-Rep for PointSet on DOTA and HRSC2016. $\mathcal{S}$ and $\mathcal{L}$ represent the label assignment strategy and regression loss function, respectively.

Dataset | Rep. | $\mathcal{S}$ | $\mathcal{L}$ | mAP (%) |
---|---|---|---|---|
DOTA | PointSet | IoU (Max) | GIoU | 63.97 |
DOTA | G-Rep | IoU (Max) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 64.63 (+0.66) |
DOTA | G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (Max) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 65.07 (+1.10) |
DOTA | PointSet | IoU (ATSS) | GIoU | 68.88 |
DOTA | G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 70.45 (+1.57) |
DOTA | G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (PATSS) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 72.08 (+3.20) |
HRSC2016 | PointSet | IoU (ATSS) | GIoU | 78.07 |
HRSC2016 | G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (ATSS) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 88.06 (+9.99) |
HRSC2016 | G-Rep | ${\mathcal{S}}_{\mathrm{KLD}}$ (PATSS) | ${\mathcal{L}}_{\mathrm{KLD}}$ | 89.15 (+11.07) |

**Table 3.** Performance comparison of PointSet and G-Rep for large aspect ratio objects. The number in parentheses next to each category name in the first row is the mean aspect ratio (ratio of the long side to the short side) of all targets in that category.

Rep. | BR (2.93) | SV (1.72) | LV (3.45) | SH (2.40) | HC (2.34) | mAP (%) |
---|---|---|---|---|---|---|
PointSet | 46.87 | 77.10 | 71.65 | 83.71 | 32.93 | 62.45 |
G-Rep | 50.82 (+3.95) | 79.33 (+2.23) | 75.07 (+3.51) | 87.32 (+3.61) | 50.63 (+17.70) | 68.63 (+6.18) |

**Table 5.** Ablation study of G-Rep for QBB representations on various datasets. The regression loss of G-Rep is ${\mathcal{L}}_{\mathrm{KLD}}$. “*” denotes that dynamic ATSS-based strategies are adopted.

Dataset | Rep. | Eval. | Gain ↑ |
---|---|---|---|
DOTA | PointSet * | 68.88 | – |
DOTA | G-Rep * (PointSet) | 70.45 | +1.57 |
DOTA | QBB | 63.05 | – |
DOTA | G-Rep (QBB) | 67.92 | +4.87 |
HRSC2016 | PointSet * | 78.07 | – |
HRSC2016 | G-Rep * (PointSet) | 88.06 | +9.99 |
HRSC2016 | QBB | 87.70 | – |
HRSC2016 | G-Rep (QBB) | 88.01 | +0.31 |
UCAS-AOD | PointSet * | 90.15 | – |
UCAS-AOD | G-Rep * (PointSet) | 90.20 | +0.05 |
UCAS-AOD | QBB | 88.50 | – |
UCAS-AOD | G-Rep (QBB) | 88.82 | +0.32 |
ICDAR2015 | PointSet * | 76.20 | – |
ICDAR2015 | G-Rep * (PointSet) | 81.30 | +5.10 |
ICDAR2015 | QBB | 75.10 | – |
ICDAR2015 | G-Rep (QBB) | 75.83 | +0.73 |

Method | mAP (%) | Params | Speed |
---|---|---|---|
RepPoints | 70.39 | 36.1 M | 24.0 fps |
S${}^{2}$ANet | 74.12 | 37.3 M | 19.9 fps |
G-Rep (ours) | 75.56 | 36.1 M | 19.3 fps |

**Table 7.** Comparison of the mAP values of various detectors on the OBB-based task of DOTA-v1.0. “MS” indicates multi-scale training.

Method | Backbone | MS | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
two-stage: | | | | | | | | | | | | | | | | | | |
ICN [1] | R-101 | ✓ | 81.40 | 74.30 | 47.70 | 70.30 | 64.90 | 67.80 | 70.00 | 90.80 | 79.10 | 78.20 | 53.60 | 62.90 | 67.00 | 64.20 | 50.20 | 68.20 |
GSDet [59] | R-101 | | 81.12 | 76.78 | 40.78 | 75.89 | 64.50 | 58.37 | 74.21 | 89.92 | 79.40 | 78.83 | 64.54 | 63.67 | 66.04 | 58.01 | 52.13 | 68.28 |
RADet [60] | RX-101 | ✓ | 79.45 | 76.99 | 48.05 | 65.83 | 65.45 | 74.40 | 68.86 | 89.70 | 78.14 | 74.97 | 49.92 | 64.63 | 66.14 | 71.58 | 62.16 | 69.06 |
RoI-Transformer [2] | R-101 | ✓ | 88.64 | 78.52 | 43.44 | 75.92 | 68.81 | 73.68 | 83.59 | 90.74 | 77.27 | 81.46 | 58.39 | 53.54 | 62.83 | 58.93 | 47.67 | 69.56 |
CAD-Net [61] | R-101 | | 87.80 | 82.40 | 49.40 | 73.50 | 71.10 | 63.50 | 76.70 | 90.90 | 79.20 | 73.30 | 48.40 | 60.90 | 62.00 | 67.00 | 62.20 | 69.90 |
SCRDet [3] | R-101 | ✓ | 89.98 | 80.65 | 52.09 | 68.36 | 68.36 | 60.32 | 72.41 | 90.85 | 87.94 | 86.86 | 65.02 | 66.68 | 66.25 | 68.24 | 65.21 | 72.61 |
SARD [62] | R-101 | | 89.93 | 84.11 | 54.19 | 72.04 | 68.41 | 61.18 | 66.00 | 90.82 | 87.79 | 86.59 | 65.65 | 64.04 | 66.68 | 68.84 | 68.03 | 72.95 |
FADet [63] | R-101 | ✓ | 90.21 | 79.58 | 45.49 | 76.41 | 73.18 | 68.27 | 79.56 | 90.83 | 83.40 | 84.64 | 53.40 | 65.42 | 74.17 | 69.69 | 64.86 | 73.28 |
MFIAR-Net [64] | R-152 | ✓ | 89.62 | 84.03 | 52.41 | 70.30 | 70.13 | 67.64 | 77.81 | 90.85 | 85.40 | 86.22 | 63.21 | 64.14 | 68.31 | 70.21 | 62.11 | 73.49 |
Gliding Vertex [28] | R-101 | | 89.64 | 85.00 | 52.26 | 77.34 | 73.01 | 73.14 | 86.82 | 90.74 | 79.02 | 86.81 | 59.55 | 70.91 | 72.94 | 70.86 | 57.32 | 75.02 |
CenterMap [65] | R-101 | ✓ | 89.83 | 84.41 | 54.60 | 70.25 | 77.66 | 78.32 | 87.19 | 90.66 | 84.89 | 85.27 | 56.46 | 69.23 | 74.13 | 71.56 | 66.06 | 76.03 |
CSL (FPN-based) [20] | R-152 | ✓ | 90.25 | 85.53 | 54.64 | 75.31 | 70.44 | 73.51 | 77.62 | 90.84 | 86.15 | 86.69 | 69.60 | 68.04 | 73.83 | 71.10 | 68.93 | 76.17 |
RSDet [23] | R-152 | ✓ | 89.93 | 84.45 | 53.77 | 74.35 | 71.52 | 78.31 | 78.12 | 91.14 | 87.35 | 86.93 | 65.64 | 65.17 | 75.35 | 79.74 | 63.31 | 76.34 |
OPLD [66] | R-101 | ✓ | 89.37 | 85.82 | 54.10 | 79.58 | 75.00 | 75.13 | 86.92 | 90.88 | 86.42 | 86.62 | 62.46 | 68.41 | 73.98 | 68.11 | 63.69 | 76.43 |
SCRDet++ [67] | R-101 | ✓ | 90.05 | 84.39 | 55.44 | 73.99 | 77.54 | 71.11 | 86.05 | 90.67 | 87.32 | 87.08 | 69.62 | 68.90 | 73.74 | 71.29 | 65.08 | 76.81 |
one-stage: | | | | | | | | | | | | | | | | | | |
P-RSDet [30] | R-101 | | 89.02 | 73.65 | 47.33 | 72.03 | 70.58 | 73.71 | 72.76 | 90.82 | 80.12 | 81.32 | 59.45 | 57.87 | 60.79 | 65.21 | 52.59 | 69.82 |
${\mathrm{O}}^{2}-\mathrm{Det}$ [32] | H-104 | | 89.31 | 82.14 | 47.33 | 61.21 | 71.32 | 74.03 | 78.62 | 90.76 | 82.23 | 81.36 | 60.93 | 60.17 | 58.21 | 66.98 | 61.03 | 71.04 |
ACE [68] | DAL34 [69] | | 89.50 | 76.30 | 45.10 | 60.00 | 77.80 | 77.10 | 86.50 | 90.80 | 79.50 | 85.70 | 47.00 | 59.40 | 65.70 | 71.70 | 63.90 | 71.70 |
${\mathrm{R}}^{3}\mathrm{Det}$ [4] | R-152 | ✓ | 89.24 | 80.81 | 51.11 | 65.62 | 70.67 | 76.03 | 78.32 | 90.83 | 84.89 | 84.42 | 65.10 | 57.18 | 68.10 | 68.98 | 60.88 | 72.81 |
BBAVectors [70] | R-101 | ✓ | 88.35 | 79.96 | 50.69 | 62.18 | 78.43 | 78.98 | 87.94 | 90.85 | 83.58 | 84.35 | 54.13 | 60.24 | 65.22 | 64.28 | 55.70 | 73.32 |
DRN [71] | H-104 | ✓ | 89.71 | 82.34 | 47.22 | 64.10 | 76.22 | 74.43 | 85.84 | 90.57 | 86.18 | 84.89 | 57.65 | 61.93 | 69.30 | 69.63 | 58.48 | 73.23 |
GWD [5] | R-152 | | 88.88 | 80.47 | 52.94 | 63.85 | 76.95 | 70.28 | 83.56 | 88.54 | 83.51 | 84.94 | 61.24 | 65.13 | 65.45 | 71.69 | 73.90 | 74.09 |
RO${}^{3}$D [72] | R-101 | ✓ | 88.69 | 79.41 | 52.26 | 65.51 | 74.72 | 80.83 | 87.42 | 90.77 | 84.31 | 83.36 | 62.64 | 58.14 | 66.95 | 72.32 | 69.34 | 74.44 |
CFA [16] | R-101 | | 89.26 | 81.72 | 51.81 | 67.17 | 79.99 | 78.25 | 84.46 | 90.77 | 83.40 | 85.54 | 54.86 | 67.75 | 73.04 | 70.24 | 64.96 | 75.05 |
KLD [22] | R-50 | | 88.91 | 83.71 | 50.10 | 68.75 | 78.20 | 76.05 | 84.58 | 89.41 | 86.15 | 85.28 | 63.15 | 60.90 | 75.06 | 71.51 | 67.45 | 75.28 |
${\mathrm{S}}^{2}\mathrm{A}-\mathrm{Net}$ [6] | R-101 | | 88.70 | 81.41 | 54.28 | 59.75 | 78.04 | 80.54 | 88.04 | 90.69 | 84.75 | 86.22 | 65.03 | 65.81 | 76.16 | 73.37 | 58.86 | 76.11 |
PolarDet [31] | R-101 | ✓ | 89.65 | 87.07 | 48.14 | 70.97 | 78.53 | 80.34 | 87.45 | 90.76 | 85.63 | 86.87 | 61.64 | 70.32 | 71.92 | 73.09 | 67.15 | 76.64 |
DAL (${\mathrm{S}}^{2}\mathrm{A}-\mathrm{Net}$) [38] | R-50 | ✓ | 89.69 | 83.11 | 55.03 | 71.00 | 78.30 | 81.90 | 88.46 | 90.89 | 84.97 | 87.46 | 64.41 | 65.65 | 76.86 | 72.09 | 64.35 | 76.95 |
GGHL [73] | D-53 | | 89.74 | 85.63 | 44.50 | 77.48 | 76.72 | 80.45 | 86.16 | 90.83 | 88.18 | 86.25 | 67.07 | 69.40 | 73.38 | 68.45 | 70.14 | 76.95 |
DCL (${\mathrm{R}}^{3}\mathrm{Det}$) [21] | R-152 | ✓ | 89.26 | 83.60 | 53.54 | 72.76 | 79.04 | 82.56 | 87.31 | 90.67 | 86.59 | 86.98 | 67.49 | 66.88 | 73.29 | 70.56 | 69.99 | 77.37 |
RIDet [15] | R-50 | | 89.31 | 80.77 | 54.07 | 76.38 | 79.81 | 81.99 | 89.13 | 90.72 | 83.58 | 87.22 | 64.42 | 67.56 | 78.08 | 79.17 | 62.07 | 77.62 |
QBB (baseline) | R-50 | | 77.52 | 57.38 | 37.20 | 65.97 | 56.29 | 69.99 | 70.04 | 90.31 | 81.14 | 55.34 | 57.98 | 49.88 | 56.01 | 62.32 | 58.37 | 63.05 |
PointSet (baseline) | R-50 | | 87.48 | 82.53 | 45.07 | 65.16 | 78.12 | 58.72 | 75.44 | 90.78 | 82.54 | 85.98 | 60.77 | 67.68 | 60.93 | 70.36 | 44.41 | 70.39 |
G-Rep (QBB) | R-101 | | 88.89 | 74.62 | 43.92 | 70.24 | 67.26 | 67.26 | 79.80 | 90.87 | 84.46 | 78.47 | 54.59 | 62.60 | 66.67 | 67.98 | 52.16 | 70.59 |
G-Rep (PointSet) | R-50 | | 87.76 | 81.29 | 52.64 | 70.53 | 80.34 | 80.56 | 87.47 | 90.74 | 82.91 | 85.01 | 61.48 | 68.51 | 67.53 | 73.02 | 63.54 | 75.56 |
G-Rep (PointSet) | RX-101 | ✓ | 88.98 | 79.21 | 57.57 | 74.35 | 81.30 | 85.23 | 88.30 | 90.69 | 85.38 | 85.25 | 63.65 | 68.82 | 77.87 | 78.76 | 71.74 | 78.47 |
G-Rep (PointSet) | Swin-T | ✓ | 88.15 | 81.64 | 61.30 | 79.50 | 80.94 | 85.68 | 88.37 | 90.90 | 85.47 | 87.77 | 71.01 | 67.42 | 77.19 | 81.23 | 75.83 | 80.16 |

Method | mAP (%) |
---|---|
RoI-Transformer [2] | 86.20 |
RSDet [23] | 86.50 |
Gliding Vertex [28] | 88.20 |
BBAVectors [70] | 88.60 |
${\mathrm{R}}^{3}\mathrm{Det}$ [4] | 89.26 |
DCL [21] | 89.46 |
G-Rep (QBB) | 88.02 |
G-Rep (PointSet) | 89.46 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hou, L.; Lu, K.; Yang, X.; Li, Y.; Xue, J.
G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection. *Remote Sens.* **2023**, *15*, 757.
https://doi.org/10.3390/rs15030757

**AMA Style**

Hou L, Lu K, Yang X, Li Y, Xue J.
G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection. *Remote Sensing*. 2023; 15(3):757.
https://doi.org/10.3390/rs15030757

**Chicago/Turabian Style**

Hou, Liping, Ke Lu, Xue Yang, Yuqiu Li, and Jian Xue.
2023. "G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection" *Remote Sensing* 15, no. 3: 757.
https://doi.org/10.3390/rs15030757