# Ship AIS Trajectory Clustering: An HDBSCAN-Based Approach

## Abstract

## 1. Introduction

## 2. Methodology

#### 2.1. Methodological Overview of the Research

#### 2.2. Ship Trajectory Data Preprocessing

#### 2.3. Hausdorff-Based Similarity Measures

#### 2.4. HDBSCAN

## 3. Model Design

#### 3.1. Definition of Ship Trajectory

#### 3.2. Trajectory Data Preprocessing

#### 3.3. Similarity Measures

#### 3.3.1. Hausdorff Distance

#### 3.3.2. A Similarity Function with Adaptive Scale Parameters

#### 3.4. Ship Trajectory Clustering with HDBSCAN

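**Core distance**: the distance between a sample point and its $K$-th nearest sample point.

**Mutual reachability distance**: the maximum of the core distances of two sample points and the distance between those two points. The mutual reachability distance can be obtained with Equation (4):

$$d_{\mathrm{mreach}}(a, b) = \max\{\mathrm{core}_K(a),\ \mathrm{core}_K(b),\ d(a, b)\} \qquad (4)$$

where $\mathrm{core}_K(\cdot)$ denotes the core distance of a sample point and $d(a, b)$ is the distance between sample points $a$ and $b$.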
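The two definitions above can be illustrated with a minimal sketch in plain Python. It assumes a precomputed, symmetric distance matrix and that a point is not counted as its own neighbour when finding the $K$-th nearest sample; implementations differ on that convention. The function names `core_distances` and `mutual_reachability` are hypothetical and are not taken from the paper or from any library.

```python
from typing import List

Matrix = List[List[float]]


def core_distances(dist: Matrix, k: int) -> List[float]:
    """Distance from each sample to its k-th nearest other sample.

    Assumption: the sample itself is excluded from its neighbour list.
    """
    cores = []
    for i, row in enumerate(dist):
        # Drop the self-distance, then sort the remaining distances.
        others = sorted(d for j, d in enumerate(row) if j != i)
        cores.append(others[k - 1])
    return cores


def mutual_reachability(dist: Matrix, k: int) -> Matrix:
    """Maximum of both core distances and the pairwise distance."""
    cores = core_distances(dist, k)
    n = len(dist)
    return [
        [0.0 if i == j else max(cores[i], cores[j], dist[i][j])
         for j in range(n)]
        for i in range(n)
    ]


# Toy data: four points on a line; the last one is an isolated outlier.
xs = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in xs] for a in xs]

cores = core_distances(dist, k=2)
mreach = mutual_reachability(dist, k=2)
```

In this toy example the outlier at 10 has a large core distance, so every mutual reachability distance involving it stays large, while distances among the three dense points are lifted to at least their core distances. This is the effect that makes sparse points harder to link into dense clusters.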

#### 3.5. Clustering Performance Metrics

#### 3.6. Design of the Algorithm

## 4. Case Study

#### 4.1. Data Processing and Similarity Measurement

#### 4.2. Hierarchical Density-Based Spatial Clustering

#### 4.3. Adaptive Determination of Clustering Numbers

## 5. Discussion

#### 5.1. Comparison with Other Clustering Algorithms

#### 5.2. Analysis of the Clustering Results

#### 5.3. Parameter Selection and Sensitivity Analysis

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

**Figure 8.** Illustration of clusters obtained with the proposed method (rows are labelled with numbers and columns with letters).

**Figure 13.** Results of re-clustering Figure 8(2a) with HDBSCAN.

**Figure 14.** Results of re-clustering Figure 8(2b) with HDBSCAN.

| Item | Configuration |
|---|---|
| Boundary | Latitude: 30.8869° N to 31.2915° N; Longitude: 121.6321° E to 122.7477° E |
| Number of ship trajectories studied | 791 |
| Trajectory data source | Yangtze River Estuary, China, 1 May 2019; provided by Wuhan University of Technology |

| Nr. | 1 | 2 | 3 | … | 708 | 709 | 710 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.4320 | 0.2169 | … | 0.9393 | 0.9427 | 0.9190 |
| 2 | 0.4320 | 1 | 0.9302 | … | 0.2926 | 0.3319 | 0.2930 |
| 3 | 0.2169 | 0.9302 | 1 | … | 0.1194 | 0.1498 | 0.1204 |
| … | … | … | … | … | … | … | … |
| 708 | 0.9393 | 0.2926 | 0.1194 | … | 1 | 0.9926 | 0.9556 |
| 709 | 0.9427 | 0.3319 | 0.1498 | … | 0.9926 | 1 | 0.9234 |
| 710 | 0.9190 | 0.2930 | 0.1204 | … | 0.9556 | 0.9234 | 1 |
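The matrix above is symmetric with a unit diagonal, which is what a clustering step on precomputed pairwise values would expect. The following sketch, using only the top-left 3×3 block of the table, checks those properties and converts similarities to dissimilarities. The conversion `1 - s` is an assumption made for illustration; the text here does not state how similarities are passed to the clustering step. The helper names `is_valid_similarity` and `to_distance` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def is_valid_similarity(sim: Matrix, tol: float = 1e-9) -> bool:
    """Check that the matrix is square, symmetric, in [0, 1], with 1s on the diagonal."""
    n = len(sim)
    for i in range(n):
        if len(sim[i]) != n or abs(sim[i][i] - 1.0) > tol:
            return False
        for j in range(n):
            if not (0.0 <= sim[i][j] <= 1.0):
                return False
            if abs(sim[i][j] - sim[j][i]) > tol:
                return False
    return True


def to_distance(sim: Matrix) -> Matrix:
    """Assumed conversion (not from the paper): distance = 1 - similarity."""
    return [[1.0 - s for s in row] for row in sim]


# Top-left 3x3 block of the similarity matrix shown above.
sim = [
    [1.0, 0.4320, 0.2169],
    [0.4320, 1.0, 0.9302],
    [0.2169, 0.9302, 1.0],
]

valid = is_valid_similarity(sim)
dist = to_distance(sim)
```

Under this assumed conversion, trajectories 2 and 3 (similarity 0.9302) end up much closer together than trajectories 1 and 3 (similarity 0.2169), which matches the intuition that highly similar trajectories should fall into the same cluster.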

| No. | Algorithm | Description |
|---|---|---|
| 1 | K-means | A simple, classic distance-based clustering algorithm |
| 2 | Spectral clustering | An algorithm derived from graph theory that adapts well to different data distributions and requires less computation |
| 3 | DBSCAN | The classic density-based clustering algorithm, which can find clusters of arbitrary shape in noisy spatial databases |
| 4 | HDBSCAN | A newer clustering method that combines density-based clustering with hierarchical clustering |

