A Hybrid-Clustering Model of Ship Trajectories for Maritime Traffic Patterns Analysis in Port Area

Liu, Lei; Zhang, Yong; Hu, Yue; Wang, Yongming; Sun, Jingyi; Dong, Xiaoxiao

doi:10.3390/jmse10030342


by Lei Liu ¹, Yong Zhang ¹,*, Yue Hu ², Yongming Wang ³, Jingyi Sun ¹ and Xiaoxiao Dong ¹

¹ School of Transportation, Southeast University, Nanjing 211189, China
² College of Transportation Engineering, Tongji University, Shanghai 200070, China
³ China Transport Telecommunications & Information Center, Beijing 100011, China
* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2022, 10(3), 342; https://doi.org/10.3390/jmse10030342

Submission received: 25 January 2022 / Revised: 14 February 2022 / Accepted: 22 February 2022 / Published: 1 March 2022

(This article belongs to the Section Ocean Engineering)


Abstract

A hybrid-clustering model is presented for the probabilistic characterization of ship traffic and for anomaly detection. The model is proposed to increase the efficiency of trajectory clustering in the port area and to analyze maritime traffic patterns in port. It identifies dissimilarities between trajectories based on their characteristics, using K-Means and the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Firstly, ship trajectory characteristics are constructed from real ship trajectories, considering both static and dynamic characteristics, to calculate the characteristic dissimilarity between trajectories. In parallel, the spatial dissimilarity is quantified using the Hausdorff distance. Then, the ship trajectories are initially clustered based on their departure and destination characteristics using the K-Means algorithm to obtain various sub-trajectory classes. However, each sub-trajectory class still contains different types of trajectories. Thus, the DBSCAN algorithm is adopted to cluster each class based on the analysis of the different trajectory characteristics. Finally, the proposed model is applied to the characterization of Zhanjiang Port, and the results show that the hybrid-clustering method can effectively cluster ship trajectories and provide a probabilistic characterization of ship traffic and anomaly detection. This lays a theoretical foundation for the supervision and risk control of intelligent ships.

Keywords:

maritime traffic; AIS data; traffic characterization; hybrid-clustering model; port area; K-Means; DBSCAN

1. Introduction

Waterborne transportation accounts for approximately 90% of the world’s trade [1]. In addition, more than 50,000 vessels navigate around the world each day, including shipping on the Arctic Sea Route [2]. Ship trajectories are an extremely valuable spatial-temporal data source that can be used to analyze ship travel behaviors and provide empirical support for ship path planning, grounding risk analysis, anomaly detection, traffic complexity metrics, etc. Above all, surveillance of ship travel behaviors is of great importance for maritime safety and security. In the maritime domain, accidents frequently occur due to high traffic density, human error, and severe weather conditions, causing casualties, property damage, and environmental damage [3]. To enhance navigation safety and reduce the number of maritime accidents, maritime risk awareness and alert systems are gradually being developed and established. The Automatic Identification System (AIS) recognizes and locates ships by exchanging data with nearby ships, AIS base stations, and Vessel Traffic Service (VTS) base stations. Since 2004, the International Maritime Organization (IMO) has stipulated that AIS must be installed on cargo ships with a gross tonnage of 300 and upwards and on all passenger ships engaged on international voyages [4]. Countries and regions have also introduced requirements for the installation of AIS base stations to improve the navigation safety of ships. With the rapid development of shipboard AIS, the widespread application of data analysis technology in recent years has provided technical support for the analysis of AIS data. AIS data is playing an increasingly important role in ship safety and maritime management [5,6,7]. In addition, AIS ship trajectory clustering, data mining, and their applications have become active research topics.

AIS data has been widely used in the maritime domain, which mainly includes collision avoidance [8,9], anomaly detection [10], and traffic pattern recognition [11]. These applications could also be divided into three categories: basic applications (BA), extended applications (EA), and advanced applications (AA) [7]. Basic applications include data mining and navigation risk analysis, such as ship trajectory construction [11] and ship domain research [12,13]. Extended applications mainly refer to ship behavior analysis and environmental impact analysis, such as research on ship behavior patterns [4] and navigation emission monitoring [14]. Advanced applications mainly refer to trade analysis, ship and port performance assessment, and Arctic navigation research, such as using AIS data to analyze global trade [15] and the freight volume of seaport containers [16]. As for shipping behavior analysis based on AIS data, Zhou et al. [17] proposed a new methodology to achieve ship classification, which consists of distinguishing behavior clusters and classifying ships. Shahir et al. [18] proposed an approach to detect dark fishing by profiling and ranking fishing vessels. To learn fishing vessel routine activity patterns, Shahir et al. [19] further combined cluster methods with Hidden Markov Models to differentiate fishing trip types. In order to analyze short-to-medium-term operational risk management strategies by relative trip distance, fleet repositioning flexibility, and trading diversity, AIS data was used to extract ship trajectories to analyze tramp navigation patterns [20]. Given that the majority of historical trajectory data is unlabeled, Duan et al. [21] developed a semi-supervised deep learning approach to integrate unlabeled data knowledge for vessel trajectory classification.

A ship trajectory consists of multiple consecutive dynamic AIS points of the same ship over consecutive periods, derived from AIS data, which consists of static data and dynamic data. In terms of maritime traffic pattern analysis, ship trajectories are helpful for understanding ship behavior. However, with the increasing number of ships, the number of ship trajectories has increased, and thus the requirements for ship trajectory mapping and clustering and for ship anomaly recognition have gradually increased [22]. Ship traffic cluster analysis provides a theoretical basis for the classification of ship trajectories and anomaly recognition and has a positive effect on improving ship dynamic monitoring and maritime management capabilities.

In terms of trajectory clustering for maritime traffic pattern analysis, three main methods are used: K-Means, hierarchical clustering, and DBSCAN (density-based spatial clustering of applications with noise). A method was proposed for identifying fishing activity using a high-resolution map of fishing effort [23], and a quantitative approach was presented for delineating the principal fairways of ships [24]. AIS data-based clustering has become an increasingly popular method for marine traffic pattern recognition in recent years. In clustering research, ship trajectories take three forms: individual track points, sub-trajectories (line segments), and entire original trajectories. Two main types of algorithm have generally been applied in previous research: partition-based (K-Means) and density-based (DBSCAN) algorithms.

The DBSCAN algorithm has better applicability than other algorithms because it can identify trajectories of different shapes and can identify anomalies [25,26]. Lee et al. [27] segmented ship trajectories and clustered the sub-trajectories using the DBSCAN algorithm. However, that paper only analyzes the similarity of the sub-trajectories using static data, such as the distances between the trajectories (including vertical distance, parallel distance, and directional distance), which are more suitable for calculating the distance between straight-line segments. Zhao et al. [25] separated ship trajectories through multilayer DBSCAN clustering based on Lee’s research. The difficulty of applying the DBSCAN algorithm lies in selecting the parameters, which have an essential influence on the effectiveness of the algorithm. There are no fixed rules for parameter selection. Moreover, ships are free to navigate on the water’s surface, so the distribution of ship positions is scattered, which leads to significant differences between trajectories and makes the parameter selection process cumbersome. Li et al. [28] used the dynamic time warping (DTW) algorithm to calculate the distance between ship trajectories and the K-Means algorithm to cluster them, but the number of trajectories was small and the trajectory model was relatively simple, so the validity of the simulation needs further verification. In addition to clustering algorithms, classification methods have also been applied to the study of ship trajectories. Sheng et al. [29] first divided ship trajectories into mooring trajectories, straight trajectories, and turning trajectories, segmented the ship’s trajectory characteristic values, and then used logistic regression to classify the trajectories.

In terms of maritime anomaly detection, DBSCAN is also commonly used to identify abnormal ship trajectories. “Ship abnormal behavior” refers to illegal, suspicious, or unsafe vessel behavior [30]. Specifically, when a ship trajectory differs from the usual behavior pattern or does not belong to any existing, common trajectory type, it is considered an abnormal trajectory. There are three main types of methods for ship trajectory anomaly recognition: unsupervised trajectory clustering methods, statistical methods, and neural network methods [10]. Laxhammar [31] divided an area into grids and then used the Gaussian Mixture Model (GMM) to detect anomalies. Rhodes used neural networks to recognize ship trajectory abnormalities according to the navigation rules, but neural networks require large training sets, and the recognition results were not stable enough. Zhen et al. [10] constructed a combined spatial and course distance between trajectories and used Bayesian and hierarchical clustering methods to identify abnormal ship behaviors. However, the ship behavior patterns were too simple and included only two types of trajectories, one in each direction. Fu et al. [32] proposed that anomaly identification could be carried out according to the spatiotemporal correlation of the data. In addition, Shahir et al. [18] used AIS data to analyze dark periods in ship trajectories to detect illegal fishing. Previous studies leave the following issues open: (1) the characteristics and similarities of ship trajectories have not been fully explored; (2) most studies focus on ship trajectories in open water, where the trajectory characteristics are simple and the number of trajectories is small, and the results of the cluster analysis lack verification; (3) the clustering analysis of ship trajectories has not been carried out across various routes/voyages.

Ship traffic flow is more complex in port areas, and the status of ships is diverse. Analyzing the characteristics of marine traffic patterns in massive data is key to improving ship monitoring in port areas. This paper proposes a hybrid clustering method based on K-Means and DBSCAN that can be used to classify ship trajectories and analyze navigation characteristics using AIS data. Firstly, the ship trajectory characteristics are analyzed considering static and dynamic data from AIS, and the similarity and dissimilarity of ship trajectories are quantified based on the constructed trajectory characteristics and the spatial distance between trajectories. Secondly, the ship trajectories are preliminarily clustered based on their static characteristics (departure and destination) by the K-Means algorithm, and the DBSCAN algorithm is used for sub-trajectory clustering [33,34]. Finally, the speed and course of the ship trajectories are analyzed on various routes, and abnormal behavior is recognized. The resulting hybrid algorithm realizes the clustering and analysis of ship traffic characteristics in complex port areas. The contributions are listed as follows:

(1): A ship trajectory dissimilarity metric and quantitative method are proposed based on different ship trajectory characteristics, including static characteristic dissimilarities, dynamic characteristic dissimilarities, and spatial dissimilarity;
(2): The hybrid clustering model is used to partition ship trajectories, improving the efficiency of ship trajectory recognition;
(3): Based on the results of ship traffic clustering, ship behaviors can be analyzed, anomalies can be recognized, and ships can be classified into various routes/voyages in the port area.

The remainder of this paper is structured as follows. The problem of ship traffic clustering is described in Section 2. The methodology and modeling of the proposed model are presented in Section 3. Section 4 and Section 5 demonstrate the clustering results of the comparison experiment associated with case studies, and the paper is concluded in Section 6.

2. Description of the Problem

2.1. Definition of Ship Trajectory

The research aims to characterize ship travel behaviors and shipping routes in the port area using AIS data. The extraction of ship travel behaviors starts by grouping ship trajectories that follow the same voyage and by clustering the stationary points or the entry and exit points that correspond to the starting and ending locations of the travel behavior. A shipping route is assumed to be a set of straight segments, or legs, connecting waypoints derived from AIS data. The waypoints correspond to positions of significant change in the navigational behavior of the ships, which allows a compact representation of ship travel behaviors. Accordingly, the shipping route is digitized as $T_{r_i}$, as shown in Figure 1, where each $p_j$ is a point in the digitized route containing the MMSI of the ship, timestamp, position, speed, course, etc.; see Formula (1) and Table 1.

$$T_{r_i} = \{p_1, \dots, p_j, \dots, p_n\}, \quad 1 \leq j \leq n \tag{1}$$

2.2. Ship Trajectory Clustering Procedures

The characteristics of a ship trajectory should be constructed in advance to characterize the ship travel behaviors for ship trajectory clustering and abnormal ship behavior detection in the port area. Then, the dissimilarity of ship trajectories can be quantified. Thus, the ship trajectory clustering procedure contains three main parts: ship trajectory characteristics construction, ship trajectory dissimilarity evaluation, and the multi-model ship trajectory clustering method.

(1): Construction of ship trajectory characteristics

It is important to consider ship trajectory characteristics for clustering, such as the distance between trajectories, COG, SOG, and turning angles. Previous work on the construction of ship trajectory characteristics for marine traffic management falls into two groups. In the first, the distance between trajectories is the only characteristic used for clustering ship trajectories, so ship trajectory feature analysis is lacking [10]. In the second, various trajectory characteristics are considered, including straight-sailing and turning characteristics, and ship trajectories are classified into stop segments, line segments, and turn segments [29]. However, the dynamic situation of ships, which should be analyzed for ship traffic clustering, has not been fully considered. The methods above only consider the course of the ship in the clustering analysis of multiple trajectory characteristics, and the results do not meet the needs of traffic clustering analysis. Thus, a multi-dimensional characteristic extraction method for ship trajectories is urgently needed, covering the dynamic data derived from AIS, such as the distance between trajectories, COG, SOG, and motion parameter variation [35], as described in Section 3.

(2): Dissimilarity evaluation of ship trajectories

Based on the construction method of multi-dimensional characteristics, the dissimilarity between trajectories should be quantified for marine traffic clustering. Previous work on the similarity evaluation of ship trajectories falls into two groups. The first focuses on the distance between straight-line segments based on pattern recognition, which limits its applicability to long and curved trajectories [27]. The second generally considers a comprehensive distance combining course and spatial distance; it may need more trajectory characteristics in specific traffic situations such as port areas, and the weight of each characteristic needs to be determined [10]. Thus, a novel dissimilarity evaluation of ship trajectories is proposed, focusing on marine traffic in the port area; see Section 3.

(3): Multi-model ship trajectory clustering method

When clustering with large volumes of AIS data, no single model can meet the time requirements for clustering a large number of trajectories [36]. Therefore, the efficiency and effectiveness of ship trajectory clustering need to be further improved using the proposed multi-model ship trajectory clustering method; see Section 3.

2.3. Framework of Trajectory Clustering and Anomaly Detection

To characterize the ship travel behaviors and shipping route in the port area using AIS data, a novel hybrid-clustering method is proposed in this paper. The analysis of ship travel behaviors is essential as it provides the possibility to integrate and enrich maritime traffic management, including route planning, ship trajectory prediction, and ship traffic monitoring.

Figure 2 shows the flowchart of ship trajectory clustering and maritime behavior analysis based on the hybrid clustering model, consisting of four main steps.

The basic steps of the method are summarized as follows:

Step 1—AIS pre-processing. This step mainly includes ship trajectory separation, ship abnormal data filtering, and docking trajectory filtering, as seen in Section 3.1.1. In order to obtain segmented trajectories, MMSI information is used to identify different ships, whereas TIMESTAMP information is used to split different segments of trajectories of the same ship [37]. In addition, we focus on the moving trajectories in this work, and we set the speed-based rules to filter stopping data.
Step 2—Ship trajectory characteristics extraction and dissimilarity evaluation. Based on the various ship trajectory characteristics, a construction method of multiple-dimensional characteristics is proposed and shown in Section 3.1.2 and Section 3.1.3. Accordingly, the dissimilarity of ship trajectories is evaluated based on the individual characteristic distance and comprehensive distance, as seen in Section 3.2. It is noted that most trajectory dissimilarities are based on trajectory characteristics, whereas the spatial dissimilarity involves different objects and needs to be discussed separately. Then, considering different dissimilarities and weights, the comprehensive dissimilarity is constructed to improve the recognition of trajectories [35].
Step 3—Hybrid clustering method modeling. Based on the construction method of multiple-dimensional characteristics, a hybrid clustering method is proposed based on the K-Means algorithm for voyage clustering and the DBSCAN algorithm for characteristic classification for each cluster, as seen in Section 3.3.
Step 4—Traffic characteristics analysis and abnormal behavior detection. After clustering and obtaining trajectories in different clusters, traffic characteristics, including speeding features and turning features, are analyzed on specific routes. Meanwhile, the abnormal behavior detection of abnormal ships could be achieved based on the results of DBSCAN clustering.
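Step 1 above can be sketched in a few lines. The following is only an illustrative sketch, not the authors' implementation: the record layout, the function name `split_trajectories`, and both thresholds (`MAX_GAP_S`, `MIN_SPEED_KN`) are assumptions, since the text does not give numeric values for the time gap or the speed rule.

```python
from collections import defaultdict

# Hypothetical thresholds; the paper does not state numeric values at this point.
MAX_GAP_S = 1800      # a gap longer than this between reports starts a new segment
MIN_SPEED_KN = 0.5    # reports slower than this are treated as stopping data


def split_trajectories(records, max_gap_s=MAX_GAP_S, min_speed_kn=MIN_SPEED_KN):
    """Group AIS records by MMSI, drop stopping data, and split on time gaps.

    Each record is a dict with at least the keys "mmsi", "t" (seconds) and "sog".
    Returns a list of trajectories, each a time-ordered list of records.
    """
    by_ship = defaultdict(list)
    for rec in records:
        if rec["sog"] >= min_speed_kn:            # speed-based rule: keep moving data
            by_ship[rec["mmsi"]].append(rec)

    trajectories = []
    for mmsi in sorted(by_ship):
        points = sorted(by_ship[mmsi], key=lambda r: r["t"])
        segment = [points[0]]
        for prev, cur in zip(points, points[1:]):
            if cur["t"] - prev["t"] > max_gap_s:  # TIMESTAMP gap: start a new segment
                trajectories.append(segment)
                segment = [cur]
            else:
                segment.append(cur)
        trajectories.append(segment)
    return trajectories


# Made-up records for two ships; positions are omitted to keep the example short.
demo_records = [
    {"mmsi": 111, "t": 0, "sog": 8.0},
    {"mmsi": 111, "t": 60, "sog": 8.0},
    {"mmsi": 111, "t": 120, "sog": 0.1},   # stopping data, filtered out
    {"mmsi": 111, "t": 5000, "sog": 7.0},
    {"mmsi": 111, "t": 5060, "sog": 7.0},
    {"mmsi": 222, "t": 10, "sog": 5.0},
]
print([len(traj) for traj in split_trajectories(demo_records)])
```

Filtering stopped reports before splitting means that a long stay at berth naturally produces a time gap and therefore a new segment, which is one simple way to separate consecutive voyages of the same ship.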

3. Methodology and Modeling

3.1. Ship Trajectory Characteristics Construction

3.1.1. AIS Data Reconstruction

AIS data reconstruction includes trajectory separation, abnormal data filtering, and docking trajectory filtering. It is noted that the calculation of trajectory lengths is based on longitude and latitude, and drift data will have a negative influence on the characteristics of the ship trajectories. Therefore, it is necessary to filter abnormal data that obviously drift. The AIS data should be checked and revised to enhance clustering accuracy, including longitude, latitude, speed, and course.

3.1.2. Static Characteristics of Ship Trajectory

The ship trajectory is spatiotemporal data that contains both static and dynamic parameters. Accordingly, static characteristics of ship trajectory refer to the position of the departure and destination points, and the length of ship trajectory.

(1): Feature of departure and destination of ship trajectory

According to the definition of ship trajectory in Section 2.1, the first point $p_1$ and the last point $p_n$ in trajectory $T_{r_i} = \{p_1, \dots, p_j, \dots, p_n\}$ $(1 \leq j \leq n)$ represent the departure and destination, respectively. The main difference between the departure and destination of different trajectories is the gap in longitude and latitude. The departure and destination characteristics of each voyage are summarized in Formula (2).

$$T_{se} = \{(lon_1, lat_1), (lon_n, lat_n)\} \tag{2}$$

Here, $(lon_1, lat_1)$ denotes the longitude and latitude of the departure point of an itinerary, and $(lon_n, lat_n)$ denotes the longitude and latitude of the destination point of an itinerary.

(2): Length feature of ship trajectory

For static characteristics of ship trajectories, the length differs because of the various departures and destinations. The length also differs when different routes are taken between the same departure and destination. The length of a ship trajectory is defined in Formula (3) as the sum of the distances between adjacent AIS positions, and the distance between two adjacent positions is given by Formula (4), where $d(p_j, p_{j+1})$ denotes the distance between point $p_j$ and point $p_{j+1}$.

$$T_{len} = \sum_{j=1}^{n-1} d(p_j, p_{j+1}) \tag{3}$$

$$d(p_j, p_{j+1}) = \mathrm{dis}(lon_j, lat_j, lon_{j+1}, lat_{j+1}) \tag{4}$$
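Formulas (2)–(4) can be illustrated with a short sketch. The text does not say how $\mathrm{dis}$ is computed, so the haversine great-circle distance used here is an assumption, as are the function names and the Earth radius constant.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def dis(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def departure_destination(track):
    """Formula (2): T_se as ((lon_1, lat_1), (lon_n, lat_n))."""
    return track[0], track[-1]


def trajectory_length(track):
    """Formulas (3)-(4): sum of distances between adjacent positions."""
    return sum(dis(*p, *q) for p, q in zip(track, track[1:]))


# A made-up track of (lon, lat) points heading due north along one meridian.
track = [(110.0, 21.0), (110.0, 21.1), (110.0, 21.2)]
print(departure_destination(track))
print(round(trajectory_length(track), 3))  # about 22.239 km for 0.2 degrees of latitude
```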

3.1.3. Dynamic Characteristics of Ship Trajectory

Dynamic characteristics of ship trajectory consider ship speed, course, and the variation of these dynamic parameters during sailing [29]. Therefore, the basic characteristics of speed, course, and movement changes are mainly constructed.

(1): Central trend feature of ship trajectory

Ship speed and course are important dynamic parameters for ship trajectory clustering. Navigation speed is affected by other ships (e.g., traffic density) and by hydrological conditions (e.g., the direction of waves or currents). For example, the average speed of ships sailing with the current is significantly higher than that of ships sailing against it. The course, on the other hand, is determined by the departure, the destination, and the waypoints of the route, and the course on the outbound leg is opposite to that on the return leg. Thus, the value of the course varies with the voyage. Accordingly, the average and the median are set as the central trend feature, as shown in Formula (5), where $SOG_{mean}$ and $SOG_{median}$ represent the average and median values of speed, and $COG_{mean}$ and $COG_{median}$ are the average and median values of course, respectively.

$$T_{ctf} = \{SOG_{mean}, SOG_{median}, COG_{mean}, COG_{median}\} \tag{5}$$

(2): Motion variation feature of ship trajectory

The ship navigates in real operational conditions where the dynamic parameters vary in real time. Thus, the motion parameter variation is also an essential feature of ship trajectory. Accordingly, the motion parameter variation feature is extracted from a statistical analysis of the entire voyage, using the range (variable interval) and the standard deviation of each dynamic parameter (speed, course) during the voyage, as shown in Formula (6), where $SOG_{range}$, $SOG_{std}$, $COG_{range}$, and $COG_{std}$ denote the range and standard deviation of the speed and course over ground.

$$T_{mvf} = \{SOG_{range}, SOG_{std}, COG_{range}, COG_{std}\} \tag{6}$$
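The features in Formulas (5) and (6) are plain summary statistics, as the following sketch shows. The function and key names are illustrative, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, because the text does not say whether the sample or population form is intended. Note also that COG is an angle that wraps at 0°/360°, so arithmetic statistics can mislead for tracks heading close to north; the sketch follows the formulas as written and does not apply circular statistics.

```python
import statistics


def central_trend(values):
    """Formula (5) pattern: (mean, median) of one dynamic parameter."""
    return statistics.mean(values), statistics.median(values)


def motion_variation(values):
    """Formula (6) pattern: (range, standard deviation) of one dynamic parameter."""
    return max(values) - min(values), statistics.pstdev(values)


def dynamic_features(sog, cog):
    """Collect T_ctf and T_mvf for one trajectory from its SOG and COG series."""
    sog_mean, sog_median = central_trend(sog)
    cog_mean, cog_median = central_trend(cog)
    sog_range, sog_std = motion_variation(sog)
    cog_range, cog_std = motion_variation(cog)
    return {
        "SOG_mean": sog_mean, "SOG_median": sog_median,
        "COG_mean": cog_mean, "COG_median": cog_median,
        "SOG_range": sog_range, "SOG_std": sog_std,
        "COG_range": cog_range, "COG_std": cog_std,
    }


# Made-up speed (knots) and course (degrees) series for one short trajectory.
features = dynamic_features(sog=[8.0, 10.0, 12.0], cog=[90.0, 100.0, 95.0])
print(features)
```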

3.2. Dissimilarity Evaluation of Ship Trajectory Characteristics

Based on the analysis and construction of the characteristics of ship trajectory in Section 3.1, a comprehensive dissimilarity evaluation model is established in this section, considering the characteristic dissimilarity of ship trajectory and the spatial dissimilarity (distance) between the trajectories.

3.2.1. Characteristic Dissimilarity of Ship Trajectory

In the paper, the characteristic dissimilarity of ship trajectory is determined based on the trajectory characteristics illustrated in Section 3.1.2 and Section 3.1.3.

(1): The static characteristic dissimilarity of ship trajectory

Static characteristic dissimilarity considers the departure and destination feature of the ship trajectory, $D_{se}$, and the length feature of the ship trajectory, $D_l$. For $D_{se}$, the distance between the departures and the distance between the destinations of ship trajectories $T_{r_i}$ and $T_{r_i}'$ are calculated separately, and their sum is taken as the feature distance, as shown in Formula (7). For $D_l$, the lengths of ship trajectories $T_{r_i}$ and $T_{r_i}'$ are calculated, and the absolute value of their difference is taken as the length feature distance, as shown in Formula (8).

$$D_{se}(T_{r_i}, T_{r_i}') = \mathrm{dis}(lon_1^{i}, lat_1^{i}, lon_1^{i'}, lat_1^{i'}) + \mathrm{dis}(lon_n^{i}, lat_n^{i}, lon_n^{i'}, lat_n^{i'}) \tag{7}$$

$$D_l(T_{r_i}, T_{r_i}') = |T_{len}^{i} - T_{len}^{i'}| \tag{8}$$

(2): The dynamic characteristic dissimilarity of ship trajectory

Dynamic characteristic dissimilarity considers the central trend feature and the dynamic parameter variation feature. Since the speed and course features are defined in a similar way in Section 3.1.3, their characteristic dissimilarities are calculated in the same way; see Formulas (9) and (10). Taking the course features as an example, the dissimilarity of the average course value, $D_{COG_{mean}}$, and the dissimilarity of the median course value, $D_{COG_{median}}$, are defined based on the corresponding features in the $T_{ctf}$ of each voyage.

$$D_{COG_{mean}}(T_{r_i}, T_{r_i}') = |COG_{mean}^{i} - COG_{mean}^{i'}| \tag{9}$$

$$D_{COG_{median}}(T_{r_i}, T_{r_i}') = |COG_{median}^{i} - COG_{median}^{i'}| \tag{10}$$

For the dynamic parameter variation feature dissimilarity, we consider the dissimilarity of the standard deviation and the dissimilarity of the range of the dynamic parameters of the ship trajectory. The calculation is similar to that of the course feature dissimilarity, and each dynamic parameter variation feature dissimilarity is calculated separately. For example, the dissimilarity based on $SOG_{std}$ is given in Formula (11).

$$D_{SOG_{std}}(T_{r_i}, T_{r_i}') = |SOG_{std}^{i} - SOG_{std}^{i'}| \tag{11}$$
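Formulas (8)–(11) all share one pattern: the absolute difference of a scalar feature between two trajectories. A minimal sketch of that pattern follows; the function name, the dictionary layout, and the feature values are made up for illustration, and $D_{se}$ of Formula (7) is not covered because it needs a point-to-point distance rather than a scalar difference.

```python
def feature_dissimilarity(features_a, features_b, keys):
    """Absolute difference |x^i - x^i'| for each scalar feature named in keys."""
    return {"D_" + k: abs(features_a[k] - features_b[k]) for k in keys}


# Made-up feature values for two trajectories sailing in opposite directions.
traj_a = {"T_len": 22.5, "COG_mean": 95.0, "COG_median": 96.0, "SOG_std": 1.5}
traj_b = {"T_len": 20.0, "COG_mean": 275.0, "COG_median": 270.0, "SOG_std": 0.5}

dissim = feature_dissimilarity(traj_a, traj_b, ["T_len", "COG_mean", "COG_median", "SOG_std"])
print(dissim)
```

In this example the course dissimilarities are large while the length and speed-variation dissimilarities are small, which is the kind of contrast that separates outbound from return voyages on the same route.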

3.2.2. Spatial Dissimilarity (Distance) of Ship Trajectories

The dissimilarities above are based on the characteristics of the ship trajectories; the spatial dissimilarity (distance) between voyages should also be considered. In this paper, the spatial distance refers to the distance between disjoint voyages. According to the definition of ship trajectory in Section 2.1, a ship trajectory consists of a series of points and can be regarded as a collection of points, each containing the MMSI of the ship, timestamp, position, speed, and course. Accordingly, the Hausdorff distance algorithm [10] can be used, based on Formulas (12)–(14).

$$D_h = \max\{h(T_{r_i}, T_{r_i}'), h(T_{r_i}', T_{r_i})\} \tag{12}$$

$$h(T_{r_i}, T_{r_i}') = \max_{p_i \in T_{r_i}} \left( \min_{p_i' \in T_{r_i}'} d(p_i, p_i') \right) \tag{13}$$

$$h(T_{r_i}', T_{r_i}) = \max_{p_i' \in T_{r_i}'} \left( \min_{p_i \in T_{r_i}} d(p_i', p_i) \right) \tag{14}$$

Here, $h(T_{r_i}, T_{r_i}')$ and $h(T_{r_i}', T_{r_i})$ denote the directed distance from trajectory $T_{r_i}$ to $T_{r_i}'$ and the directed distance from trajectory $T_{r_i}'$ to $T_{r_i}$, respectively. The calculation of $h(T_{r_i}, T_{r_i}')$ is illustrated in Figure 3. A loop is used to calculate the distance from point $p_i$ in trajectory $T_{r_i}$ to all points in trajectory $T_{r_i}'$, and the minimum of these distances is taken as the spatial dissimilarity (distance) from point $p_i$ to trajectory $T_{r_i}'$. The maximum of these minimum distances over all points of $T_{r_i}$ then gives $h(T_{r_i}, T_{r_i}')$, in accordance with Formula (13).
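The loop described above translates directly into code. The sketch below is a brute-force illustration of Formulas (12)–(14); it assumes points are given in planar coordinates and uses the Euclidean distance as a stand-in for $d(p_i, p_i')$, which is an assumption of this example. A great-circle distance function could be passed in through the `d` argument instead.

```python
import math


def planar_distance(p, q):
    """Euclidean distance between two (x, y) points; stands in for d(p, p')."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def directed_hausdorff(track_a, track_b, d=planar_distance):
    """Formula (13): max over points of A of the min distance to any point of B."""
    return max(min(d(p, q) for q in track_b) for p in track_a)


def hausdorff(track_a, track_b, d=planar_distance):
    """Formula (12): the larger of the two directed distances."""
    return max(directed_hausdorff(track_a, track_b, d),
               directed_hausdorff(track_b, track_a, d))


# Two made-up tracks: B runs parallel to A at distance 1 but extends further.
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 1.0)]

print(directed_hausdorff(track_a, track_b))  # every point of A is 1 away from B
print(hausdorff(track_a, track_b))           # dominated by the extra point (5, 1)
```

The example shows why both directions are needed: the directed distance from A to B is small, while the distance from B to A picks up the part of B that A never approaches. The brute-force version costs one distance evaluation per pair of points, which may matter for long trajectories.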

3.2.3. Comprehensive Dissimilarity of Ship Trajectory

Based on the characteristic dissimilarity and the spatial dissimilarity of ship trajectories described above, the comprehensive dissimilarity of ship trajectories is presented, taking a subset of them into consideration. According to the differences between trajectories in the application scenario, the characteristic dissimilarities and the spatial dissimilarity can be selected and combined to form a comprehensive dissimilarity between ship trajectories, as shown in Formula (15). Because the characteristic dissimilarities and the spatial dissimilarity have different dimensions, they need to be normalized using Formula (16).

$$D = \sum_{i} \omega_i D_i', \quad i \in \{se, l, SOG_{mean}, SOG_{median}, COG_{mean}, COG_{median}, SOG_{range}, SOG_{std}, COG_{range}, COG_{std}, h\} \tag{15}$$

$$D_l' = \frac{D_l - D_{l_{min}}}{D_{l_{max}} - D_{l_{min}}} \tag{16}$$

Here, $D$ denotes the comprehensive distance of ship trajectories, $\omega_i$ indicates the weight of the $i$th distance, and $D_i'$ indicates the normalized $i$th distance. Taking $D_l$ as an example of normalization, $D_l$ is the length distance before normalization, and $D_{l_{max}}$ and $D_{l_{min}}$ are the maximum and minimum values in the matrix of length distances.
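A small worked example may help to fix Formulas (15) and (16). The sketch below normalizes each pairwise distance matrix by its own minimum and maximum and then forms the weighted sum. The function names, the two example matrices, and the equal weights are assumptions for illustration; the text does not give weight values at this point.

```python
def min_max_normalize(matrix):
    """Formula (16): scale all entries of one distance matrix to [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant matrix carries no information; map it to zeros
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def comprehensive_dissimilarity(matrices, weights):
    """Formula (15): weighted sum of the normalized distance matrices.

    matrices: dict name -> square distance matrix (all of the same size);
    weights:  dict name -> weight for the selected dissimilarities.
    """
    names = list(weights)
    normalized = {k: min_max_normalize(matrices[k]) for k in names}
    n = len(matrices[names[0]])
    return [[sum(weights[k] * normalized[k][a][b] for k in names) for b in range(n)]
            for a in range(n)]


# Made-up pairwise distances for three trajectories: length (km) and Hausdorff (km).
matrices = {
    "l": [[0.0, 2.0, 4.0], [2.0, 0.0, 2.0], [4.0, 2.0, 0.0]],
    "h": [[0.0, 10.0, 5.0], [10.0, 0.0, 5.0], [5.0, 5.0, 0.0]],
}
weights = {"l": 0.5, "h": 0.5}  # hypothetical equal weights

D = comprehensive_dissimilarity(matrices, weights)
for row in D:
    print(row)
```

Only the dissimilarities named in `weights` enter the sum, which mirrors the statement that a subset can be selected for a given application scenario.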

3.3. Hybrid Trajectory Clustering Model

As mentioned in Section 3.1 and Section 3.2, a ship trajectory contains information of various dimensions, so general data clustering algorithms, including K-Means clustering, hierarchical agglomerative clustering, and DBSCAN clustering, cannot be used directly to cluster ship trajectory data. There are specific requirements regarding the ship’s trajectory characteristics. For trajectory clustering in a specific area, the number of trajectory types can be determined according to the positions of the departures and destinations of the trajectories. Because the longitude and latitude of ship positions can be averaged, the K-Means algorithm can be used to classify the ship trajectories based on their static characteristics (see Figure 4). Each resulting cluster still contains different types of ship trajectories, including trajectories along different routes and abnormal trajectories. The DBSCAN algorithm is then used to cluster the trajectories within each K-Means cluster and to identify abnormal ship trajectories.

Figure 4 illustrates the implementation process of the proposed hybrid clustering. The original data include trajectories 1 to 8, where trajectories 3 and 4 run in opposite directions, trajectories 1 and 2 share the same departure and destination but follow different paths, and trajectories 6 and 7 differ in length. First, K-Means clustering based on the departure and destination features classifies the trajectories into six categories, with trajectories 1 and 2 in one category and trajectories 6 and 7 in another. Then, DBSCAN clustering based on the comprehensive dissimilarity, which includes the dynamic characteristics of ship trajectories, separates trajectories 1 and 2 as well as trajectories 6 and 7. Finally, all types of trajectories are obtained, including abnormal ship trajectories.

In brief, based on the dissimilarity model of ship trajectories, a hybrid-clustering model for ship trajectories is proposed that incorporates the K-Means and DBSCAN algorithms, and the ship trajectory clustering procedure is formulated. The procedure mainly consists of preprocessing and hybrid clustering. The AIS data preprocessing divides the data into multiple ship trajectories and filters out abnormal data; other trajectories are not considered in this paper. The hybrid clustering first performs K-Means clustering on the original trajectories and then realizes sub-trajectory clustering and anomaly recognition through DBSCAN clustering. For K-Means clustering, the input includes all the original trajectories and the initial center trajectories, and only the dissimilarity of the departure and destination of ship trajectories, $D_{se}$, is considered. For DBSCAN clustering of subclass trajectories, the comprehensive dissimilarity is adopted because trajectories within a subclass can differ in several kinds of dissimilarity. The variance filter method is used to determine which features are selected to construct the comprehensive dissimilarity. The flowchart for ship trajectory clustering and anomaly detection based on the hybrid-clustering model is shown in Section 2.3, and the hybrid-clustering process is shown in Figure 5. The implementations of the K-Means and DBSCAN clustering algorithms are given in Appendix A and Appendix B.

4. Case Studies

The Zhanjiang Port in China is selected to verify the effectiveness of the proposed hybrid clustering model, and the traffic characteristics of ships incoming and outgoing from the port are analyzed. The research area and original AIS data are described in Section 4.1. To optimize clustering parameters, we introduce the clustering evaluation method in Section 4.2. The ship trajectory clustering experiments based on the K-Means and DBSCAN algorithms are performed successively in Section 4.3. Subsequently, we utilize the clustering results for the characteristic analysis of ship traffic and abnormal identification in Section 4.4.

4.1. Research Area and Data Foundation

4.1.1. Research Area

The location of Zhanjiang Port is shown in Figure 6; its latitude and longitude are 21°11′21″ N and 110°24′21″ E, respectively. The waterway from the entrance to the port is divided into the outer section of the Longteng Channel, the inner section of the Longteng Channel, the West Channel of Nansan Island, the Dongshi Channel, the Channel of Dongtou Mountain, and the Maxie Channel. The first five channels are 300,000-ton-class channels, while the channels from the Maxie Channel to the north are 70,000-ton-class, and navigation rules are set for each channel. At present, there are about 113 productive berths and 18 main anchorages in the port. The complex environment and the large number of ships make the marine traffic characteristics harder to analyze and, at the same time, raise the requirements on navigation management.

4.1.2. Data Foundation

According to the definition of ship trajectory in Section 2.1, the selected AIS data (March 2017) contains MMSI, Timestamp, Longitude, Latitude, SOG, and COG. The original data includes 4,215,444 AIS points.

To separate ship trajectories reasonably, the broadcast time intervals of the AIS data are counted to obtain a time-division threshold. Since docked ships generally send AIS data every 180 s, and a trajectory should remain continuous when a ship docks briefly, the intervals between adjacent records of the same ship that are greater than 500 s and less than 3600 s are counted, as shown in Figure 7. The count gradually decreases as the time interval increases, and the values are mainly distributed at multiples of 180 s. On this basis, a threshold of 1080 s (less than 0.05) is adopted in this paper: adjacent records of the same ship are assigned to different trajectories when the time interval between them exceeds 1080 s. Trajectories with fewer than 10 points are not considered. After preprocessing, 4610 ship trajectories are obtained, including 2882 moving trajectories and 810 docking trajectories; the rest have fewer than 10 points. The moving trajectories, which belong to 824 vessels, are the objects of analysis in this work. Their distribution is shown in Figure 8. The departures and destinations are mainly distributed in the channel and berthing areas, while there are also many chaotic motion trajectories.
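The splitting rule described above can be sketched as follows. This is a minimal illustration, assuming the AIS records are available as tuples in the order (MMSI, timestamp in seconds, longitude, latitude, SOG, COG); the tuple layout, the function name, and the synthetic MMSI are assumptions made for this sketch, and the separation of moving from docking trajectories is not covered.

```python
from collections import defaultdict

GAP_THRESHOLD_S = 1080   # time-division threshold reported in the text
MIN_POINTS = 10          # shorter trajectories are set aside


def split_trajectories(records, gap=GAP_THRESHOLD_S, min_points=MIN_POINTS):
    """Split AIS records into per-ship trajectories at large time gaps.

    Returns (kept, short): trajectories with at least `min_points` points,
    and the shorter ones that are excluded from clustering.
    """
    by_ship = defaultdict(list)
    for rec in records:
        by_ship[rec[0]].append(rec)

    kept, short = [], []
    for points in by_ship.values():
        points.sort(key=lambda r: r[1])            # order by timestamp
        current = [points[0]]
        for prev, cur in zip(points, points[1:]):
            if cur[1] - prev[1] > gap:             # gap exceeds the threshold
                (kept if len(current) >= min_points else short).append(current)
                current = []
            current.append(cur)
        (kept if len(current) >= min_points else short).append(current)
    return kept, short


# Synthetic ship: 12 points 180 s apart, a 2000 s gap, then 3 more points.
times = [180 * i for i in range(12)]
times += [180 * 11 + 2000 + 180 * i for i in range(3)]
records = [(413000000, t, 110.4, 21.2, 7.5, 90.0) for t in times]
kept, short = split_trajectories(records)
```

Keeping the short trajectories in a separate list, rather than discarding them, leaves them available for the later inspection of trajectories with fewer than 10 points.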

4.2. Evaluation of Clustering Results

The clustering results need to be quantified, and the clustering parameters need to be verified through an evaluation method. Since the K-Means results are clearly separated and its parameter selection is relatively simple, we mainly consider the evaluation of DBSCAN clustering. The evaluation method proposed by Lee [27], which includes a penalty for noise (abnormal trajectories), is adopted, as shown in Formula (17).

$$E_{se} = TotalSSE + NoisePenalty = \sum_{i = 1}^{num_{class}} \left( \frac{1}{2 |C_{i}|} \sum_{x \in C_{i}} \sum_{y \in C_{i}} dist(x, y)^{2} \right) + \frac{1}{2 |N|} \sum_{w \in N} \sum_{z \in N} dist(w, z)^{2}$$

(17)

where $C_{i}$ and $N$ represent the normal categories and the abnormal results, respectively, and $dist(x, y)$ represents the distance between trajectory $x$ and trajectory $y$.
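The evaluation in Formula (17) can be sketched in a few lines, assuming the pairwise trajectory distances are given as a square matrix and the clustering as one label per trajectory. Using the label -1 for abnormal trajectories is a convention assumed here, and the function names and toy values are illustrative.

```python
def _half_mean_squared(dist, members):
    """Sum of squared pairwise distances in a set, divided by 2 * set size."""
    total = sum(dist[a][b] ** 2 for a in members for b in members)
    return total / (2 * len(members))


def evaluate_clustering(dist, labels, noise_label=-1):
    """Return (TotalSSE, NoisePenalty, E_se) for a labelled clustering."""
    clusters, noise = {}, []
    for idx, label in enumerate(labels):
        if label == noise_label:
            noise.append(idx)
        else:
            clusters.setdefault(label, []).append(idx)
    total_sse = sum(_half_mean_squared(dist, m) for m in clusters.values())
    noise_penalty = _half_mean_squared(dist, noise) if noise else 0.0
    return total_sse, noise_penalty, total_sse + noise_penalty


# Toy example (assumed values): one cluster {0, 1} and two noise trajectories.
dist = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 3.0],
    [5.0, 5.0, 3.0, 0.0],
]
total_sse, noise_penalty, e_se = evaluate_clustering(dist, [0, 0, -1, -1])
```

A smaller value indicates tighter clusters and a more compact noise set, so parameter settings can be compared by their values of $E_{se}$.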

4.3. Hybrid Clustering

Figure 8 shows the distribution of the original trajectories, where a red ‘*’ marks a departure and a blue ‘o’ marks a destination (the same markers are used in the following figures). Area 1 is the inner section of the Longteng Channel, which is the entrance of Zhanjiang Port; Area 2 and Area 3 are docks with many departure and destination points; Area 4 is the Maxie Channel, where all ships enter the same channel, so the data are cut off there. As can be seen from Figure 8, after entering the inner section of the Longteng Channel, the trajectories split into two types: a smaller portion reaches Area 4 from Area 1 through the upper channel, while most trajectories run from Area 1 to Area 2, Area 3, or Area 4 through the lower channel. Inside the port, the incoming and outgoing trajectories are chaotic, and the departure and destination points are scattered.

4.3.1. K-Means Clustering

K-Means clustering based on multiple dissimilarities was tried first, but the clustering result was unsatisfactory, and each update took a long time because several dissimilarities had to be calculated. Therefore, K-Means based only on the departure and destination features is adopted for the initial clustering. According to the four areas identified in Figure 8 and the parameter-setting rules for K-Means in Appendix A, the initial K value is set to $A_{4}^{2} = 12$, and the maximum number of iterations is set to 500 to limit the running time.

After K-Means clustering, the trajectories are divided into 11 categories (ignoring the trajectories outside the port). Two categories with only a few trajectories have almost the same departure and destination, so they are merged manually into one. The final clustering results are plotted in Figure 9, Figure 10 and Figure 11, with the departures, destinations, and numbers of trajectories marked under each subgraph.

Figure 9 depicts three types of trajectories with departures primarily near Area 1 and destinations primarily near Areas 1, 2, and 4, whereas Figure 10 depicts trajectories with departures in Area 4 and destinations distributed across Areas 1, 2, 3, and 4. Figure 11 shows the trajectories of the other categories, whose numbers are relatively small. Many of the trajectories are unreasonable. For example, some trajectories in Figure 10c start from Area 4 but stop in the middle. There are two possible reasons. One is that the time-division threshold is too small, so that continuous trajectories are cut off; according to Figure 7, however, this should not occur. The other is that the ship's AIS platform was switched off; this abnormal situation is discussed further in Section 4.4.3. In addition, Figure 9, Figure 10 and Figure 11 show that K-Means clustering based on the departure and destination dissimilarity can provide an initial separation of the trajectories, although many mixed trajectories remain within the clusters.

4.3.2. DBSCAN Clustering

Because we aim to analyze the traffic characteristics of incoming and outgoing ships in port, the trajectories in Figure 9c and Figure 10a are selected for DBSCAN clustering. Firstly, Figure 10a is chosen as the research object of sub-trajectory clustering.

In DBSCAN clustering, we need to determine the dissimilarities and weights that make up the comprehensive dissimilarity, as well as the parameters of the DBSCAN algorithm. First, the applicability of each characteristic dissimilarity is analyzed. Figure 10a still contains trajectories with different departures and destinations, so the departure and destination dissimilarity remains useful. There are also trajectories on different routes with the same departure and destination, indicating that the length dissimilarity is effective as well. In addition, the spatial distance between the upper-right and lower-left trajectories is apparent. However, differences in the motion parameter variation features cannot be seen directly and need to be confirmed by statistical analysis. The variance filter method from machine learning is adopted to determine the applicability of the dynamic parameter variation dissimilarities. The variances of the speed, course, and motion parameter variation features are shown in Figure 12. Before the variance is calculated, each value is normalized.

It can be seen from Figure 12 that the three features with the highest variance are $COG_{range}$, $SOG_{std}$, and $COG_{std}$. The variance threshold is set to 0.04, and thus $COG_{range}$ and $SOG_{std}$ are selected for the dynamic characteristic dissimilarity. Therefore, the dissimilarities that make up the comprehensive dissimilarity for DBSCAN clustering are $D_{se}$, $D_{l}$, $D_{h}$, $D_{SOG\_std}$, and $D_{COG\_range}$.
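The variance filter can be sketched as follows, assuming min-max normalization of each feature and the population variance; the text does not state which normalization or variance estimator was used, so both are assumptions. The feature values below are synthetic and are not the data behind Figure 12.

```python
from statistics import pvariance


def variance_filter(features, threshold=0.04):
    """Keep features whose normalized variance exceeds the threshold.

    `features` maps a feature name to its values over all trajectories.
    Each feature is min-max scaled to [0, 1] before the variance is taken.
    """
    kept = {}
    for name, values in features.items():
        lo, hi = min(values), max(values)
        scaled = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
        variance = pvariance(scaled)
        if variance > threshold:
            kept[name] = variance
    return kept


# Synthetic values: SOG_mean is almost constant, COG_range splits in two groups.
features = {
    "SOG_mean": [5.0] + [7.0] * 18 + [9.0],
    "COG_range": [10.0] * 10 + [200.0] * 10,
}
kept = variance_filter(features)
```

A feature with low variance after normalization takes similar values on most trajectories and therefore contributes little to separating them.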

After the dissimilarities are determined, the weight of each one is set through a combination of analyses. Since the traffic characteristics of outgoing and incoming ships on different routes are to be analyzed, the main differences between the trajectories in Figure 10a are the length dissimilarity and the spatial dissimilarity. In addition, the variance of $D_{COG\_range}$ is higher than that of $D_{SOG\_std}$ according to Figure 12. Therefore, the initial weights of $D_{se}$, $D_{l}$, $D_{COG\_range}$, $D_{SOG\_std}$, and $D_{h}$ are set to 0.15, 0.2, 0.2, 0.15, and 0.3, respectively.

For the parameters of the DBSCAN algorithm, following the calculation method for k in Appendix B, k is varied from 3 to 8 to obtain the distance curves shown in Figure 13. Based on the clustering evaluation in Formula (17), the results of the cluster evaluation are obtained, as listed in Table 2. When $MinLns$ is set to 6, the evaluation value is smallest; when it is set to 3 and 4, the trajectories are separated into 8 and 6 categories, respectively; when it is further increased to 8, there are four categories, and different kinds of trajectories are merged, so the clustering effect is insufficient. Therefore, $MinLns$ and $\varepsilon$ are set to 6 and 0.085, respectively. Secondary clustering is performed on the trajectories in Figure 10a, and the results are shown in Figure 14. The last subfigure shows the abnormal trajectories (the same applies in the following sections).

It can be seen that the trajectories in Figure 14a,c are on the same channel, while those in Figure 14b,d are on another channel. There are obvious differences in the length and destination of the trajectories in Figure 14b,d. However, the differences between the trajectories in Figure 14a,c are not apparent. To show the difference, 21 trajectories (the number of trajectories in Figure 14c) are randomly selected from Figure 14a, and the distributions of length, $COG_{range}$, and $SOG_{std}$ of the two types of trajectories are plotted in Figure 15. Figure 15 shows that the most obvious difference is in $COG_{range}$, although $SOG_{std}$ and trajectory length can also be used to separate the trajectories.

The same process is applied to cluster the trajectories in Figure 9c. The dissimilarities $D_{se}$, $D_{l}$, $D_{COG\_range}$, $D_{SOG\_std}$, and $D_{h}$ are used with weights of 0.3, 0.2, 0.15, 0.10, and 0.25, respectively. In addition, $MinLns$ and $\varepsilon$ are set to 5 and 0.085. The clustering results are shown in Figure 16.

4.4. Traffic Feature Analysis

Based on the clustering and dividing results of the trajectories in the Zhanjiang Port, the traffic characteristics on different routes entering and leaving port are analyzed. We mainly focus on the features of speed, steering, and abnormal trajectories.

4.4.1. Analysis of Speed Feature

(1): Speed Analysis of outgoing and incoming ships

The trajectories of entering and leaving ships are analyzed using the trajectories on the prescribed channel, so the trajectories in Figure 16a are chosen as analysis objects for incoming ships, and the trajectories in Figure 14a are chosen for outgoing ships. The statistics of speed are shown in Figure 17. The average speed of leaving ships is 7.67 knots, while that of entering ships is 7.69 knots, and the speed distribution is almost the same.

To show the speed distribution in different areas, the region is divided into a grid, and the average speed in every grid cell is calculated. Figure 18 shows the spatial speed distribution of outgoing and incoming ships. The speed in the main channel does not vary noticeably, and ships do not slow down in the turning area. In particular, the circle in Figure 18a marks the speed distribution of ships that violate the navigation rules: because there may be a collision risk between outgoing and incoming ships, ships are forbidden to turn in advance.
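The gridded average speed can be sketched as follows, assuming a regular grid in longitude and latitude. The grid origin, the cell size of 0.01°, and the sample points are assumptions for illustration; the text does not report the mesh size that was used.

```python
import math
from collections import defaultdict


def grid_mean_speed(points, lon0, lat0, cell_deg):
    """Average SOG per grid cell.

    points   : iterable of (lon, lat, sog) tuples
    lon0/lat0: south-west corner of the grid (assumed origin)
    cell_deg : cell size in degrees
    Returns a dict mapping (column, row) cell indices to the mean speed.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lon, lat, sog in points:
        cell = (math.floor((lon - lon0) / cell_deg),
                math.floor((lat - lat0) / cell_deg))
        sums[cell][0] += sog
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Three sample AIS points (assumed values): two fall into the same cell.
points = [(110.401, 21.101, 7.0), (110.404, 21.103, 9.0), (110.416, 21.101, 6.0)]
grid = grid_mean_speed(points, lon0=110.40, lat0=21.10, cell_deg=0.01)
```

Only cells that contain at least one AIS point appear in the result, so empty water or land cells need no special handling.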

(2): Speed analysis on different routes

The trajectories in Figure 16a,f, recorded as Route 1 and Route 2, respectively, are selected to analyze the speed features on different routes. Figure 19 shows the speed histograms for the trajectories on the two routes; the average speed of ships on Route 1 is 7.68 knots, while on Route 2 it is 7.2 knots. The speed on Route 2 is therefore slightly lower, and its distribution is more concentrated than that on Route 1. Furthermore, the average sailing times on the two routes are 5559 s and 4853 s, so sailing on Route 2 saves about 12 min (5559 s − 4853 s = 706 s) because the voyage is shorter.

4.4.2. Analysis of Course and Turning Feature

Figure 20 depicts the histograms of the course and of the course difference between adjacent data, as well as the turning features of incoming and outgoing ships near the red circle in Figure 18a. The course of departing ships is between 50 and 150 degrees, while that of entering ships is between 250 and 330 degrees. Furthermore, the course difference is mainly below 15°. The average course difference of outgoing ships is 4.48 degrees, while that of incoming ships is 3.80 degrees, which indicates that outgoing ships change course more than incoming ships on average.
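The course difference between adjacent AIS records can be sketched as follows. Because COG is an angle, the sketch takes the smaller of the two arcs between consecutive courses; whether wrap-around at 360° was handled in this way in the analysis is not stated in the text, so this is an assumption, and the function names are illustrative.

```python
def course_difference(cog_a, cog_b):
    """Smallest absolute angle, in degrees, between two courses."""
    diff = abs(cog_a - cog_b) % 360.0
    return min(diff, 360.0 - diff)


def adjacent_course_differences(cogs):
    """Course difference between each pair of consecutive COG values."""
    return [course_difference(a, b) for a, b in zip(cogs, cogs[1:])]


# A ship turning through north: 350 deg -> 10 deg is a 20 deg change, not 340.
diffs = adjacent_course_differences([350.0, 10.0, 15.0])
mean_diff = sum(diffs) / len(diffs)
```

Without the wrap-around step, a small turn across north would be counted as a change of more than 300°, which would distort the averages.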

4.4.3. Analysis of Abnormal Trajectories

There are two types of abnormal trajectories: trajectories with fewer than 10 AIS points, and abnormal trajectories identified by DBSCAN clustering.

(1): Abnormal trajectories with fewer than 10 points

In the preprocessing, trajectories with fewer than 10 AIS points are obtained. Since the docked trajectories have been filtered out, these ships may still have been sailing. As shown in Figure 21, some of them are located in the middle of the channel. For example, the ship trajectory enlarged in the figure (MMSI: 412461138) has only 5 points. No data from this ship were received within the next 5 h, while data from other ships were received during that period. It can therefore be inferred that the ship's AIS platform was switched off. The management department should further improve the supervision of such ships to ensure the safety of navigation.

(2): Abnormal trajectories obtained by DBSCAN clustering

DBSCAN clustering identifies abnormal data. To detect abnormal trajectories, the value of $\varepsilon$ is increased so that the final classification results include only normal and abnormal trajectories. Figure 22 shows examples of anomaly detection for the trajectories in Figure 10a,c. The data in Figure 22a are sent discontinuously: the average time interval is about 10 s, but when the speed changes, data are not sent again until 80 s later. The standard deviation of the ship's speed is therefore too large, and the trajectory is identified as abnormal. In Figure 22b, the ship circled twice during navigation, resulting in significant differences in course and spatial distance. These two kinds of ships show different abnormal conditions in the port and should be subject to special monitoring.

5. Discussion

This paper proposes a hybrid clustering method based on K-Means and DBSCAN to classify ship trajectories and analyze navigation characteristics using AIS data. First, based on different ship trajectory characteristics, ship trajectory dissimilarity metrics and quantitative methods are proposed, covering static characteristics, dynamic characteristics, and motion parameter variation characteristics. Then, the hybrid clustering model is used to divide the ship trajectories, improving the efficiency of ship trajectory recognition. Finally, based on the clustering results, ship traffic is analyzed and anomalies are detected, so that ship behaviors along various routes/voyages in the port area can be identified and classified.

(1): Using the proposed model to classify the ship trajectories is sufficient in port.

In Section 3.1, the static dissimilarity of ship trajectories and the dynamic dissimilarity of ship trajectories are considered. Furthermore, based on the mentioned characteristic dissimilarity of ship trajectory and spatial dissimilarity of ship trajectory, the comprehensive dissimilarity of ship trajectory is presented. According to the characteristic dissimilarity of ship trajectory, spatial dissimilarity of ship trajectory, K-Means, and DBSCAN are applied to classify the ship trajectories, and a case study was carried out in Zhanjiang Port. The results show that the proposed model to classify the ship trajectories is sufficient in port.

(2): Limits of dissimilarity of ship trajectories are important for classifying ship trajectories.

In the K-Means clustering, the departure and destination positions of the trajectories are chosen as the basis, which is not considered in related research. Generally, there are two main reasons for this. The first is that the scenario is relatively simple, and the features of speed and course are enough to achieve trajectory clustering [28]. The second is that the trajectory is cut into multiple parts, so the focus is on the features of each segment [27,29]. However, the departure and destination features of a ship trajectory are still beneficial to trajectory clustering in both cases. We also tested various other features; they did not improve the K-Means clustering effect but increased the time consumption.

On the other hand, we do not use all the characteristics constructed in Section 3.1 in the DBSCAN clustering. Instead, statistics and analysis are used to select among the different types of features. In other research on trajectory clustering based on the DBSCAN algorithm [25,27], the researchers designed a trajectory similarity specifically for their models, so there is no need to recombine features, but the similarity may not be applicable in other situations. This paper considers the complete ship trajectory, from the design of features to their combination into a dissimilarity. Compared with similar ways of processing trajectories [10], this paper uses a more complete set of features and a more complex selection process, but pays more attention to analyzing real trajectories and combining them with the constructed trajectory characteristics. However, no further testing has been done to show that the current combination is the most effective one, which will be the direction of our next research.

(3): The proposed model is suitable for marine traffic feature analysis.

Using the proposed model, we determine the limits of dissimilarity of ship trajectories based on real AIS data. The ship trajectories are classified into various clusters, which can be used to analyze marine traffic features for traffic management in port [38]. We anticipate that the proposed model provides support for the ship route plan and determination of scheduling schemes for ships in busy waterways and has the potential to become one of the key enablers for future autonomous ships traffic operations in the port area.

6. Conclusions

This paper proposes a hybrid-clustering model based on the K-Means and DBSCAN algorithms for maritime traffic pattern analysis in port. The experimental results show that the model has good applicability. The model can improve the efficiency and effectiveness of ship trajectory clustering for traffic feature analysis by dividing the clustering process into stages and combining the characteristics of ship trajectories with the advantages of the different clustering algorithms. Additionally, when a comprehensive dissimilarity evaluation model is established to distinguish trajectories, the motion variation features are more effective than the central trend features among the dynamic characteristics of ship trajectories, alongside the static characteristic dissimilarity and the spatial dissimilarity between trajectories. Besides, the DBSCAN parameters differ between subclasses of trajectories, indicating that the clustering of each subclass needs to be considered separately to ensure optimal partitioning. Finally, abnormal ship behaviors in the port area show a variety of patterns, including turning in advance, abnormal AIS platform operation, and circling navigation tracks. In contrast, the speed and course analysis for outgoing and incoming vessels shows reasonable results and can bring benefits for port management. We speculate that the model is also adaptable to other maritime environments, although only the port is taken as the experimental object in this paper.

We anticipate that the proposed model can support ship route planning and the determination of scheduling schemes for ships in busy waterways, and can help port authorities with berth allocation and ship emission inventories for ports [39,40]. It also has the potential to become one of the key enablers for future autonomous ship traffic operations in the port area. In future research, we will further test the effectiveness of the combination of ship trajectory characteristics and their applicability in different scenarios. In addition, the process of determining the parameters of the DBSCAN algorithm is relatively complex, so we will try to simplify it to increase the usability of the hybrid-clustering model.

Author Contributions

Conceptualization, L.L. and Y.W.; methodology, L.L.; software, L.L.; validation, L.L. and Y.H.; formal analysis, L.L. and Y.W.; investigation, Y.H.; resources, Y.W.; data curation, L.L. and Y.H.; writing—original draft preparation, L.L.; writing—review and editing, J.S.; visualization, X.D.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the Transportation Science and Technology Demonstration Project of Jiangsu Province (grant number: 2018Y02); the China Freight System Efficient Green Development System Construction Project (grant number: P159883); Southeast University-Nanjing Kirin Science and Technology Innovation Department 2021 Special Funds (8521008862).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Algorithm A1. K-Means algorithm for trajectory clustering.

Input: dataset $D = \{x_{1}, \dots, x_{m}\}$, number of clusters $K$, maximum number of iterations $N$

Output: clustering division $C = \{c_{1}, \dots, c_{K}\}$

Process:

1. Select $K$ trajectories as the center trajectories $\{u_{1}, \dots, u_{K}\}$;
2. Initialize the cluster division $C = \{c_{1}, \dots, c_{K}\}$;
3. For $n = 1, 2, \dots, N$:
4.   For $i = 1, 2, \dots, m$:
5.     Calculate the distance $d_{i,j}$ between trajectory $x_{i}$ and $u_{j}$ $(j = 1, \dots, K)$;
6.     Assign $x_{i}$ to the category $j$ with the smallest $d_{i,j}$;
7.   End for
8.   For $j = 1, \dots, K$:
9.     Recalculate the center trajectory from the new clustering result: $u_{j} = \frac{1}{|c_{j}|} \sum_{x \in c_{j}} x$;
10.   End for
11.   If the clustering result remains unchanged:
12.     Go to line 17;
13.   Else:
14.     Go to line 4;
15.   End if
16. End for
17. Output $C = \{c_{1}, \dots, c_{K}\}$.
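The assignment and update loop of Algorithm A1 can be sketched in Python as follows. This is a minimal sketch that assumes each trajectory is reduced to a four-value feature (departure longitude and latitude, destination longitude and latitude) and uses the Euclidean distance as a stand-in for $D_{se}$; the feature layout, the function name, and the toy coordinates are assumptions.

```python
import math


def kmeans(points, centers, max_iter=500):
    """Plain K-Means: assign to the nearest center, then recompute the centers.

    points  : list of feature tuples (dep_lon, dep_lat, dst_lon, dst_lat)
    centers : list of K initial center tuples
    Returns (labels, centers).
    """
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every trajectory.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:       # division unchanged: stop iterating
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                # an empty category keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Two trajectories from region A to B and two in the opposite direction.
points = [(0.0, 0.0, 10.0, 10.0), (0.2, 0.1, 9.8, 10.1),
          (10.0, 10.0, 0.0, 0.0), (9.9, 10.2, 0.1, 0.0)]
labels, centers = kmeans(points, centers=[points[0], points[2]])
```

Because the departure and destination enter the feature in a fixed order, trajectories that cover the same path in opposite directions end up in different categories.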

(1): The number of clusters K

The basic idea of the K-Means algorithm is to select $K$ centers in the dataset and assign each object to the center closest to it. Since the departure and destination positions of ship trajectories in a specific area are basically fixed, they can be used as the data features for K-Means clustering. We only need to know the total number $n$ of departure and destination regions (overlapping regions count as one) and assume that trajectories exist between every two regions. In this way, the final clustering result must contain every subclass between two regions that actually exists; if the number of categories set is larger than the actual number, the final clustering result will simply contain empty categories. According to the permutation formula, there are $A_{n}^{2}$ ways to select two of the $n$ regions as the departure and destination of a trajectory, respectively.
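As a small worked example of this count, the ordered departure and destination pairs for four regions can be enumerated directly; the area labels below are only placeholders.

```python
from itertools import permutations
from math import perm

areas = ["Area 1", "Area 2", "Area 3", "Area 4"]

# Ordered (departure, destination) pairs with two distinct regions.
od_pairs = list(permutations(areas, 2))

# A_n^2 = n! / (n - 2)! = n * (n - 1)
k = perm(len(areas), 2)
```

For $n = 4$ this gives $4 \times 3 = 12$ pairs, where (Area 1, Area 4) and (Area 4, Area 1) count as different categories because the direction of travel differs.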

(2): The selection of the initial clustering center

Since the departure and destination features are used for K-Means clustering, the following rules are set to determine the centers $\{u_{1}, \dots, u_{K}\}$: (a) a trajectory $u_{1}$ is randomly selected as the center trajectory of the first category, and its departure and destination are taken as the departure and destination features of the initial center; (b) the trajectory with the largest distance from the first trajectory $u_{1}$ is selected as the center trajectory $u_{2}$ of the second category; (c) the trajectory $u_{3}$ with the largest sum of distances from the first trajectory $u_{1}$ and the second trajectory $u_{2}$ is selected as the center trajectory of the third category. Proceeding in the same way, all initial clustering centers are obtained.
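Rules (a) to (c) amount to a farthest-point selection, which can be sketched as follows. The sketch assumes the same tuple features and Euclidean distance as a stand-in for $D_{se}$; the `first` argument, which fixes the starting trajectory for reproducibility, is an addition made for this illustration.

```python
import math
import random


def init_centers(points, k, first=None, seed=0):
    """Pick k initial centers by repeated farthest-point selection.

    The first center is random (or `first`, if given); each further center
    is the point with the largest summed distance to the centers chosen so far.
    Assumes k <= len(points).
    """
    if first is None:
        first = random.Random(seed).randrange(len(points))
    chosen = [first]
    while len(chosen) < k:
        candidates = (i for i in range(len(points)) if i not in chosen)
        best = max(
            candidates,
            key=lambda i: sum(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return [points[i] for i in chosen]


# Toy 2-D features (assumed values) to keep the distances easy to check.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
centers = init_centers(points, k=3, first=0)
```

Spreading the initial centers in this way makes it less likely that two centers start inside the same departure and destination group.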

Appendix B

Algorithm A2. DBSCAN algorithm for trajectory clustering.

Input: dataset $D = \{y_{1}, \dots, y_{m}\}$, neighborhood radius $\varepsilon$, minimum number of neighbors $MinLns$

Output: clustering division $E = \{e_{1}, \dots, e_{k}\}$

Process:

1. Mark all trajectories in $D$ as unprocessed;
2. For $i = 1, 2, \dots, m$:
3.   If $y_{i}$ is visited:
4.     Continue;
5.   Else:
6.     Mark $y_{i}$ as visited;
7.     Check the neighborhood $N_{Eps}(y_{i})$;
8.     If the number of objects in $N_{Eps}(y_{i}) \geq MinLns$:
9.       Mark $y_{i}$ as a core point, set up a new class $e$, and add the objects in $N_{Eps}(y_{i})$ to $M$;
10.       For $p$ in $M$:
11.         If $p$ is visited:
12.           Continue;
13.         Else:
14.           Mark $p$ as visited;
15.           Check the neighborhood $N_{Eps}(p)$;
16.           If the number of objects in $N_{Eps}(p) \geq MinLns$:
17.             Add the unclassified objects in $N_{Eps}(p)$ to $M$ and add $p$ to $e$;
18.           Else:
19.             Add $p$ to $e$;
20.           End if
21.       End for
22.     End if
23.     If the number of objects in $N_{Eps}(y_{i}) < MinLns$:
24.       Mark $y_{i}$ as a noise point;
25.     End if
26. End for
27. Output $E = \{e_{1}, \dots, e_{k}\}$.
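A compact Python sketch of DBSCAN on a precomputed distance matrix is given below, assuming the comprehensive dissimilarity between every pair of trajectories has already been calculated. It follows the common textbook formulation, in which a point first marked as noise can later be attached to a cluster as a border point, so it differs slightly from the listing above; the function name, the label convention (-1 for noise), and the toy data are assumptions.

```python
def dbscan(dist, eps, min_lns):
    """DBSCAN on a square distance matrix.

    Returns one label per trajectory: 0, 1, ... for clusters, -1 for noise.
    """
    n = len(dist)
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * n

    def neighbors(i):
        # The eps-neighborhood includes the point itself.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_lns:
            labels[i] = NOISE              # may still become a border point
            continue
        cluster += 1                       # i is a core point: open a new class
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            p = queue.pop()
            if labels[p] == NOISE:         # border point reached from a core point
                labels[p] = cluster
            if labels[p] is not UNVISITED:
                continue
            labels[p] = cluster
            p_neighbors = neighbors(p)
            if len(p_neighbors) >= min_lns:    # p is a core point: keep expanding
                queue.extend(p_neighbors)
    return labels


# Toy 1-D positions (assumed): two dense groups and one isolated trajectory.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan(dist, eps=0.15, min_lns=2)
```

The isolated trajectory receives the label -1, which corresponds to the abnormal trajectories discussed in Section 4.4.3.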

As for the selection of $\varepsilon$ and $MinLns$, the k-distance of a trajectory is the distance to its k-th nearest neighbor. When k is not larger than the size of the class, the k-distance of a trajectory in the class is small, whereas for trajectories that do not belong to a class, the k-distance is relatively large. Therefore, for a given value of k, the k-distances of all trajectories are calculated and sorted in increasing order, and the curve of the sorted values is drawn; the position where the k-distance changes sharply gives a suitable value of $\varepsilon$.
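The k-distance curve can be sketched as follows for a precomputed distance matrix. Reading the sharp change as the largest jump between consecutive sorted values is a simple heuristic assumed here; in the paper, the value is read from the plotted curve. The function names and toy data are illustrative.

```python
def k_distance_curve(dist, k):
    """Sorted distances from each trajectory to its k-th nearest neighbor."""
    curve = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        curve.append(others[k - 1])
    return sorted(curve)


def largest_jump(curve):
    """Index and value of the point just before the largest increase."""
    jumps = [b - a for a, b in zip(curve, curve[1:])]
    idx = max(range(len(jumps)), key=jumps.__getitem__)
    return idx, curve[idx]


# Toy 1-D positions (assumed): two dense groups and one isolated trajectory.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
curve = k_distance_curve(dist, k=2)
idx, eps_candidate = largest_jump(curve)
```

In this toy example the curve stays near 0.2 for the six clustered trajectories and jumps to about 14.9 for the isolated one, so a value of $\varepsilon$ slightly above 0.2 would separate the two regimes.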

Figure 1. Schematic diagram of ship trajectory.

Figure 2. The flowchart for ship trajectory clustering and anomaly detection based on the hybrid-clustering model.

Figure 3. Example of Hausdorff distance calculation.

Figure 4. The schematic diagram based on hybrid clustering. The letters A–D represent the destination and departure regions of trajectories whereas the numbers 1–8 represent the reference number of trajectories.

Figure 5. The flowchart of the proposed procedure.

Figure 6. The research area in Zhanjiang Port.

Figure 7. Time interval statistics of adjacent data of the same ship.

Figure 8. The original moving trajectories in Zhanjiang Port (The red ‘*’s represent departures, the blue ‘o’s represent destinations).

Figure 9. Trajectories with departures mainly near Area 1. (a) Area 1 to Area 1, 609; (b) Area 1 to Area 2, 256; (c) Area 1 to Area 4, 388.

Figure 10. Trajectories with departures mainly near Area 4. (a) Area 4 to Area 1, 267; (b) Area 4 to Area 2, 80; (c) Area 4 to Area 3, 397; (d) Area 4 to Area 4, 586.

Figure 11. Trajectories with departures inside the port. (a) Area 2 to Area 1, 156; (b) Area 3 to Area 3, 71; (c) Area 2 to Area 4, 48; (d) Other trajectories in port, 24.

Figure 12. Distribution of motion parameter variance.

Figure 13. The k-distance curve of the integrated distance.

Figure 14. DBSCAN clustering results of Figure 10a with $MinLns = 6$ and $ε = 0.085$. (a) Class A, 141; (b) Class B, 30; (c) Class C, 21; (d) Class D, 26; (e) abnormal trajectories, 49.

Figure 15. Trajectory feature comparison between Figure 14a,c. (a) Feature comparison considering $COG_{range}$; (b) feature comparison considering trajectory length; (c) feature comparison considering $SOG_{std}$.

Figure 16. DBSCAN clustering results of Figure 9c with $MinLns = 5$ and $ε = 0.085$. (a) Class A; (b) Class B, 18; (c) Class C, 22; (d) Class D, 13; (e) Class E, 36; (f) Class F, 98; (g) Class G, 5; (h) abnormal trajectories, 30.

Figure 17. Frequency histogram of outgoing and incoming ship speed. (a) Frequency histogram of speed of outgoing ships; (b) frequency histogram of speed of incoming ships.

Figure 18. Speed spatial distribution of outgoing and incoming ships. (a) Speed spatial distribution of outgoing ships; (b) speed spatial distribution of incoming ships.

Figure 19. Frequency histogram of incoming route 1 and incoming route 2 in Figure 16. (a) Frequency histogram of speed in Figure 16a; (b) frequency histogram of speed in Figure 16f.

Figure 20. Turning angle statistics of outgoing and incoming ships. (a) Histogram of navigation course statistics; (b) histogram of course difference of adjacent data.

Figure 21. Example of trajectories with fewer than 10 points.

Figure 22. Example of abnormal trajectories of outgoing and incoming ships. (a) Abnormal trajectory in Figure 10a; (b) abnormal trajectory in Figure 9c.

Table 1. Description of parameters for dimensional data.

Data Column	Description
MMSI	Maritime Mobile Service Identity
TIMESTAMP	The timestamp of AIS data
LON	The longitude of the position
LAT	The latitude of the position
SOG	The speed over ground of the ship
COG	The course over ground of the ship

Table 2. Clustering evaluation results with different $MinLns$ and $ε$.

$MinLns$ / $ε$	3/0.072	4/0.075	5/0.078	6/0.085	7/0.090	8/0.101
$E_{s_e}$	6.6257	6.1580	5.9966	5.7143	5.7860	5.7166
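
As a small worked example, the parameter pair with the smallest evaluation value in Table 2 can be picked out programmatically. The sketch below assumes that a lower $E_{s_e}$ indicates a better clustering, which is consistent with $MinLns = 6$ and $ε = 0.085$ being the setting reported in Figure 14; the variable names are illustrative only.

```python
# Values transcribed from Table 2: (MinLns, epsilon) -> evaluation value E_se.
table2 = {
    (3, 0.072): 6.6257,
    (4, 0.075): 6.1580,
    (5, 0.078): 5.9966,
    (6, 0.085): 5.7143,
    (7, 0.090): 5.7860,
    (8, 0.101): 5.7166,
}

# Assumption: the smallest evaluation value marks the preferred parameter pair.
best_params = min(table2, key=table2.get)
best_value = table2[best_params]
print(best_params, best_value)
```

The margin between the 6/0.085 and 8/0.101 settings is small (5.7143 against 5.7166), so the choice is best confirmed by inspecting the resulting clusters as well.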

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
