1.1. Background
With the rapid development of economic globalization and the shipping industry, more than 82.5% of goods are transported by water. In recent years, advances in science and technology have pushed ship design towards larger, more specialized, faster, and unmanned vessels, and the safety of ships sailing at sea has attracted growing attention. The analysis conducted by EMSA during safety investigations [1] determined that from 2014 to 2020, at the level of accident events or contributing factors, 89.5% of maritime safety accidents were related to human action. Therefore, with the development of the intelligent ship industry, and to reduce the influence of human factors on navigation safety, the research and development of collision avoidance technology for intelligent ships/unmanned surface vehicles (USVs) has become a general trend. The International Maritime Organization (IMO) put forward a new work plan on Maritime Autonomous Surface Ships (MASSs) in 2017–2018 and formulated relevant conventions and regulations to solve a series of problems such as safety and environmental protection. MASS is the term the IMO uses for ships that, to varying degrees, can operate partly or completely independently of human interaction [2, 3]. MASS research has become a hotspot in the international maritime field, and the research, development, and adoption of MASSs are becoming a trend in the shipbuilding industry. The shipping industry and scientific research institutions have invested in research on autonomous ships with different degrees of autonomy and different levels of intelligence.
The navigation environment of ships at sea is complex and changeable. It contains static obstacles such as shorelines, islands, reefs, and sunken ships, as well as unidentified dynamic obstacles, and ships are also disturbed by time-varying and uncertain winds and currents. Accidents of a navigational nature, such as ship collisions, contacts, and groundings/strandings, occur from time to time. As shown in Figure 1 and Figure 2 below, according to the maritime accident investigation results of the European Maritime Safety Agency (EMSA) [1], over the period 2014–2020, accidents of a navigational nature (collisions, contacts, and groundings/strandings) represented 43% of all occurrences related to the ship. These three types of accidents accounted for 13%, 17%, and 13% of the total number of accidents, respectively. Therefore, the ship collision avoidance maneuver is crucial for the safe navigation of ships at sea. At present, the increasing traffic flow of ships at sea leads to increasingly crowded navigation channels, further increasing the risk of ship collision. Ship collision accidents usually cause casualties and property losses, and can even cause severe damage to the ecological environment. To ensure the autonomous and safe navigation of a MASS at sea, it is urgent to develop autonomous intelligent collision avoidance technology for complex navigation conditions.
The development of unmanned ships is still at an early stage, and manned and unmanned ships will coexist for a long time. However, the International Regulations for Preventing Collisions at Sea 1972 (COLREG) only address collision avoidance between manned ships. Note that the terms unmanned ship, smart ship, and MASS refer to the same object under different names. Therefore, intelligent collision avoidance between MASSs and manned ships is an urgent problem to solve.
1.2. The Literature Review
There are few studies on collision avoidance between MASSs and manned ships; most existing work focuses on collision avoidance decision making between manned ships. According to the research methods used, ship collision avoidance models can be divided into traditional and intelligent ones.
Traditional ship collision avoidance models are based on traditional methods, which include analytic geometry (AG), velocity obstacle (VO), fuzzy logic (FL), Swarm Intelligence Algorithms (SIA), and hybrids of these algorithms.
The application of the Marine Collision Avoidance System (MCAS) and the Automatic Radar Plotting Aid (ARPA) system led to the subsequent proposal of collision avoidance decision-making models based on AG. Wilson et al. [4] proposed a collaborative decision-making model of ship collision avoidance based on the line-of-sight method. Larson et al. [5, 6] applied the Morphin algorithm to draw multiple arcs ahead of the ship, covering the local obstacle map to consider all safe paths. Casalino et al. [7] proposed a local obstacle avoidance method based on the concept of the Bounding Box, where the Bounding Box is defined as a rectangular area that ships should avoid. Simetti et al. [8] introduced a safe Bounding Box around the original collision boundary box in the continuous workspace. Szlapczynski et al. [9] proposed a method for determining, organizing, and displaying collision avoidance information based on the Collision Threat Parameter Area (CTPA), which can improve the handling measures for MASSs in heavy weather conditions. Kim et al. [10] considered the COLREG and proposed a collision avoidance algorithm based on the isochron method, including calculating collision risk and planning the optimal path for multi-vessel collision avoidance. Gail et al. [11] put forward the Collision Avoidance Dynamic Critical Area (CADCA) concept based on the minimum maneuvering area required by the two ships in the encounter, and explained the relevant hydrodynamic effects of ship dynamics and different rudder angles and forward speeds. Subsequently, Gail [12] improved the CADCA and proposed a collision avoidance model between ships and static obstacles.
The idea of the VO appeared in 1980, when it was named CTPA [13]. Later, Pederson et al. [14] applied it to the navigation field and showed that this method could provide better collision avoidance decision support for the officer on watch (OOW) than traditional ARPA equipment. Kuwata et al. [15] considered the COLREG and designed a local path planner for ships based on the VO. Chen et al. [16] proposed a new risk detection method for ship collisions based on the VO and Automatic Identification System (AIS) data. Huang et al. [17] proposed a ship collision avoidance model based on the VO algorithm for collision avoidance scenarios with nonlinear motion characteristics and predictable target ship trajectories. Subsequently, this model was improved, and a maritime collision avoidance system for ships based on the Generalized VO (GVO) algorithm was proposed [18]. Li et al. [19] proposed a ship dynamic path planning method based on a multi-level Morphin adaptive search tree algorithm and the VO. This method considers the ship’s maneuverability and the COLREG.
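To make the VO idea concrete, the following is a minimal sketch of the collision-cone test that underlies VO-based methods. It is not the implementation of any of the cited works. It assumes two ships moving at constant velocity in a plane and a circular safety domain; the function name in_velocity_obstacle, the parameter safety_radius, and the example numbers are illustrative assumptions.

```python
import math


def in_velocity_obstacle(own_pos, own_vel, tgt_pos, tgt_vel, safety_radius):
    """Return True if the own ship's current velocity lies inside the
    velocity obstacle induced by the target ship.

    Assumptions (illustrative only): planar motion, constant velocities,
    and a circular safety domain of radius `safety_radius` around the target.
    Positions are (x, y) in metres, velocities are (vx, vy) in m/s.
    """
    # Line-of-sight vector from own ship to target ship.
    dx = tgt_pos[0] - own_pos[0]
    dy = tgt_pos[1] - own_pos[1]
    dist = math.hypot(dx, dy)
    if dist <= safety_radius:
        # The safety domain is already violated.
        return True

    # Velocity of the own ship relative to the target ship.
    rvx = own_vel[0] - tgt_vel[0]
    rvy = own_vel[1] - tgt_vel[1]
    rel_speed = math.hypot(rvx, rvy)
    if rel_speed == 0.0:
        # No relative motion, so the separation never changes.
        return False

    # Angle between the relative velocity and the line of sight.
    cos_angle = (rvx * dx + rvy * dy) / (rel_speed * dist)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    angle = math.acos(cos_angle)

    # Half-angle of the collision cone tangent to the safety circle.
    half_angle = math.asin(safety_radius / dist)
    return angle <= half_angle


if __name__ == "__main__":
    # Head-on encounter: relative velocity points straight at the target.
    print(in_velocity_obstacle((0, 0), (0, 5), (0, 1000), (0, -5), 200))
    # Own ship steams at right angles to a stationary target.
    print(in_velocity_obstacle((0, 0), (5, 0), (0, 1000), (0, 0), 200))
```

A planner built on this test would typically search for the velocity closest to the desired one that falls outside the cone, optionally restricted to the side permitted by the COLREG; that search is omitted here.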
As a nonlinear control method independent of the controlled object, the fuzzy mathematics method is well suited to the intelligent collision avoidance of ships. Liu et al. [20] combined the FL method with a neural network to quantify ship collision avoidance decisions. Perera et al. [21] considered the COLREGs and proposed an intelligent collision avoidance decision-generation system based on FL. Subsequently, Perera et al. [22] proposed a model for generating collision avoidance decisions and executing collision avoidance behavior based on FL theory and the Bayesian network. Ahn et al. [23] used the adaptive network-based fuzzy inference system (ANFIS), an expert system, and a multilayer perceptron (MLP) in the ship collision avoidance system.
SIA methods mainly include Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), and so forth. Tsou et al. [24] proposed a path-planning system based on ACO after considering the COLREG and navigation practice. Lazarowska [25] proposed an ACO-based path planning method for USVs in the dynamic marine environment, which can be applied to onboard decision support systems or intelligent obstacle detection and collision avoidance systems.
Intelligent collision avoidance models are those that use advanced artificial intelligence (AI) technology to study the decision-making problem of ship collision avoidance. The AI techniques used include Artificial Neural Networks (ANNs) and Reinforcement Learning (RL).
The ANN has been a research hotspot in AI since the 1980s. It abstracts the neural network of the human brain from the perspective of information processing, establishes a simple model, and forms different networks according to different connection methods. As early as the 1990s, scholars began using this method to establish decision-making models of ship collision avoidance. Simsir et al. [26] established a ship position prediction model using the ANN to solve the collision avoidance problem of ships in narrow waterways, laying a foundation for subsequently determining the possibility of collision between two ships. Praczyk et al. [27] proposed an automatic multi-ship collision avoidance system based on an evolutionary neural network to solve the collision avoidance task in complex, multi-objective, and rapidly changing environments. Xu et al. [28] proposed an automatic collision avoidance method based on a deep convolutional neural network (CNN), exploiting the strong visual processing ability of machine vision technology. Lin et al. [29] proposed a recurrent neural network (RNN) with a convolution unit to improve the autonomy and intelligence of obstacle avoidance planning for unmanned underwater vehicles. This method uses a convolution layer to replace the fully connected layer in the standard RNN, thus reducing the number of parameters and improving the ability to extract features. Johansen et al. [30] used a Bayesian belief network derived from system-theoretic process analysis (STPA) to establish an online risk assessment model for ships.
As an advanced method in the field of AI, RL has made remarkable achievements in games and control. RL is suitable for solving sequential decision-making problems, which matches the decision-making process of intelligent driving, so it has gradually been applied to the intelligent decision making of USVs, unmanned aerial vehicles, and unmanned vehicles. RL can be divided into model-based RL and model-free RL according to whether an environment model is established. At present, model-free RL methods are primarily used in the field of intelligent collision avoidance, mainly including Q-learning, State-Action-Reward-State-Action (SARSA), Deep Q-Network (DQN), Policy Gradient (PG), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), and Actor-Critic (AC) and its derivative algorithms. Yin et al. [31] proposed a simple obstacle avoidance algorithm based on Q-Networks to deal with complex navigation situations and unknown environmental dynamics. Shen et al. [32] put forward a multi-level intelligent collision avoidance algorithm for unmanned ships based on the DQN after fully considering the COLREGs, experience in maritime navigation, and ship maneuverability. Zhang et al. [33] proposed a MASS autonomous navigation decision-making method based on the DQN and the artificial potential field. Woo et al. [34] considered the COLREG and proposed a collision avoidance model for unmanned vessels based on machine vision and the DQN. Wu et al. [35] proposed an autonomous navigation and intelligent collision avoidance algorithm based on the Dueling DQN (DDQN). Liu et al. [36], addressing the problem that RL is time-consuming and infeasible for complex tasks, combined the DQN with Transfer Learning (TL), which can transfer the knowledge learned in simple tasks to closely related but more complex tasks, providing a new idea for ship collision avoidance. Ryohei et al. [37] proposed an automatic collision avoidance system for ships based on grid sensors and PPO, and subsequently improved this algorithm [38], proposing a multi-ship collision avoidance and waypoint navigation model based on Long Short-Term Memory (LSTM) networks and PPO. Zhao et al. [39] considered the COLREGs and the ship’s maneuverability, used DQN to directly map the ship’s status to the ship’s rudder angle command, and used the PG-based AC algorithm to train the multi-ship collision avoidance model. Jiang et al. [40] considered the COLREGs and proposed an autonomous ship collision avoidance method based on deep RL (DRL) and the attention mechanism. The method includes collision risk assessment and motion planning and analyzes the rationality and effectiveness of collision avoidance decisions from the perspective of collision risk and the nearest safe distance. Heiberg et al. [41] proposed a decision-making model of ship intelligent collision avoidance based on DRL and PPO, applying a state-of-the-art collision risk index to the reward design of the model.
Model-free RL methods can be applied to many fields. Because no model needs to be established, all of the agents’ decisions are obtained through interaction with the environment, so this approach applies to scenes that are difficult or impossible to model. By the same token, without a model, agents must constantly interact with and explore the environment, which requires a lot of trial and error and leads to the most significant disadvantage of model-free RL: low data efficiency. Such algorithms often need to interact with the environment hundreds of thousands, millions, or even tens of millions of times. There are few intelligent collision avoidance models based on model-based RL. To address the low efficiency of model-free RL, Xie et al. [42] proposed a compound learning method based on the asynchronous advantage actor-critic (A3C) algorithm, LSTM, and Q-learning. This method combines the advantages of model-based and model-free RL. The authors applied this method to multi-ship intelligent collision avoidance at sea, and the simulation results show that its decision-making performance is superior to that of the standard A3C model and the traditional optimization model.
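The trial-and-error nature of model-free RL can be illustrated with a minimal tabular Q-learning sketch. It is a toy example and not a model from the cited studies: the one-dimensional corridor environment, the reward values, the hyperparameters, and the function name train_q_learning are all assumptions chosen for illustration.

```python
import random


def train_q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, max_steps=200, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in state 0 and must reach the last state.
    Action 0 moves left, action 1 moves right. Each step costs -1,
    and reaching the goal yields +10. No environment model is used:
    the value table is learned purely from sampled transitions.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    interactions = 0  # number of environment steps consumed

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(2) if q[state][a] == best])

            # One interaction with the (toy) environment.
            next_state = min(max(state + moves[action], 0), goal)
            done = next_state == goal
            reward = 10.0 if done else -1.0
            interactions += 1

            # Q-learning update: bootstrap from the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q, interactions


if __name__ == "__main__":
    q_table, steps = train_q_learning()
    policy = ["right" if row[1] > row[0] else "left" for row in q_table[:-1]]
    print("greedy policy:", policy)
    print("environment interactions used:", steps)
```

Even in this six-state toy problem, the default settings consume at least 2500 environment interactions, which hints at why data efficiency becomes the bottleneck in realistic collision avoidance scenarios with continuous states and multiple ships.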
Through the analysis and comparison above, it can be concluded that the main problems of existing decision-making models of ship collision avoidance are as follows: (1) They consider only static obstacles, and either ignore the shape of obstacles or simplify it to a circle. (2) To simplify the collision avoidance situation and the generation of collision avoidance decisions, the motion states of the own ship and the target ship are excessively simplified. For example, if the ship’s motion model is simplified or motion factors are ignored, the collision avoidance measures taken by ships in a close encounter may easily fail. The ship’s dynamic model rarely considers the effects of wind and current. (3) The interference of environmental factors, such as visibility, sea conditions, narrow or open waters, and the traffic flow density at the time, is not yet fully considered. (4) In the collision avoidance problem of MASSs, it is assumed that MASSs, like manned ships, should comply with the COLREGs [43, 44]. However, MASSs differ significantly from manned ships regarding intelligence, main dimensions, maneuverability, and tasks performed [45]. (5) For decision algorithms of ship collision avoidance based on RL, the biggest problem of model-free RL is low data efficiency, whereas the biggest challenge of model-based RL is model error. (6) Decision-making algorithms of ship collision avoidance based on the ANN are prone to becoming trapped in local optima.