Path Planning for Automatic Berthing Using Ship-Maneuvering Simulation-Based Deep Reinforcement Learning

1 Department of Smart Environmental Energy Engineering, Changwon National University, Changwon 51140, Republic of Korea
2 Department of Naval Architecture and Marine Engineering, Changwon National University, Changwon 51140, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(23), 12731; https://doi.org/10.3390/app132312731
Submission received: 30 October 2023 / Revised: 23 November 2023 / Accepted: 25 November 2023 / Published: 27 November 2023
(This article belongs to the Topic Artificial Intelligence in Navigation)

Abstract

Despite receiving much attention from researchers in the field of naval architecture and marine engineering since the early stages of modern shipbuilding, the berthing phase remains one of the biggest challenges in ship maneuvering due to the potential risks involved. Many algorithms have been proposed to solve this problem. This paper proposes a new approach: a path-planning algorithm for automatic berthing tasks using deep reinforcement learning (RL) based on a maneuvering simulation. Unlike conventional path-planning algorithms based on control theory, or more recent algorithms based on supervised deep learning, a reinforcement learning path-planning algorithm automatically learns, explores, and optimizes the berthing path through trial and error. Results obtained with the twin delayed deep deterministic policy gradient (TD3) algorithm combined with the maneuvering simulation show that the approach can propose a feasible and safe path for high-performing automatic berthing tasks.

1. Introduction

Since the early stages of modern shipbuilding, much attention has been paid to automated methods of ship navigation, particularly with the continuous advancement of artificial intelligence (AI). As a result, the number of autonomous ships has grown rapidly. Autonomous ship navigation offers substantial advantages in safety, efficiency, reliability, and environmental sustainability. By harnessing advanced technologies such as sensor systems, data analysis, and AI, and by reducing the risk of human error, autonomous navigation systems improve the safety of ship operations. These systems operate consistently and reliably, unhindered by human limitations, resulting in more predictable performance and fewer accidents. Autonomous ships can process vast amounts of data, enabling informed decision making, collision avoidance, and adaptation to changing conditions [1]. However, automatic ship berthing remains an extremely complex task, particularly under low-speed conditions where the hydrodynamic forces acting on the ship are highly nonlinear [2]. Controlling the ship becomes challenging and requires the expertise of an experienced commander. Numerous researchers have studied the principles and algorithms of automatic ship berthing, developing various ship control algorithms based on control theories and maneuverability assumptions [3,4,5,6,7,8,9]. These approaches proved effective under well-defined berthing conditions before the advent of AI. The development of AI algorithms has propelled the creation of methods that enhance ship control performance, improve safety, and greatly reduce accidents in the marine industry. Many supervised learning algorithms based on neural networks have shown promising results with high success rates [10,11,12,13]. Applying AI algorithms eliminates the need for an explicit mathematical model of the ship; however, acquiring a substantial amount of labeled training data can be time-consuming and costly.
Unlike the aforementioned methods, reinforcement learning techniques, which constitute an area of machine learning, do not require a labeled training dataset: the ship learns and optimizes its berthing maneuvers through interactions with a simulated environment. Applications of RL to the automatic berthing task have shown good results, with the ship automatically learning a strategy and optimizing the control policy to reach the berthing point [14,15].
In this paper, the initial development of a novel path-planning algorithm for autonomous ship berthing is proposed, using a recent reinforcement learning technique called the twin delayed deep deterministic policy gradient (TD3). The TD3 algorithm was introduced by Fujimoto et al. (2018) and is specifically designed for continuous action spaces. TD3 is an extension of the deep deterministic policy gradient (DDPG) that addresses certain challenges and improves the stability of learning in complex environments. Its exploration scheme allows the agent to explore the environment and gain new experiences that optimize rewards through trial and error, and it employs two distinct value-function estimators, which mitigate overestimation bias and stabilize the learning process. Leveraging these two critics, TD3 provides more accurate value estimates and facilitates better policy updates; high performance and stability compared with other RL algorithms were shown in [16]. Combined with the MMG model for ship-maneuvering motion simulation, proposed by the Maneuvering Modeling Group (MMG) in 1977 [17], the algorithm suggests a feasible path, resulting in faster convergence and improved accuracy.
This article is organized into five parts. The first section introduces previous studies conducted in this field. Section 2 presents the equation of motion for the ship based on the MMG model along with hydrodynamic and interaction coefficients. Section 3 outlines the path-planning algorithm based on deep reinforcement learning TD3. Section 4 showcases and discusses simulation results for two berthing cases. Finally, Section 5 concludes this research.

2. Mathematical Model

2.1. Coordinate System

This paper focuses on the motion of USVs in the horizontal plane only. Thus, two coordinate systems were defined for a maneuvering ship based on the right-hand rule, as shown in Figure 1. The earth-fixed coordinate system is $O\text{-}xy$, with its origin on the water surface, and the body-fixed coordinate system is $o\text{-}x_b y_b$, with its origin at the midship. The $x_b$ and $y_b$ axes point toward the ship's bow and starboard, respectively. The heading angle $\psi$ is the angle between the $x$ and $x_b$ axes.

2.2. Mathematical Model of USV

The motion equation with three degrees of freedom (3-DOF) is established based on Newton's second law. In this paper, the berthing task is assumed to take place in calm water, so environmental disturbances such as waves and wind in the port area are ignored, which keeps the equation of motion simple. The heave, roll, and pitch motions are relatively small and do not significantly affect the equation of motion; thus, they can be neglected. Furthermore, the low-speed condition makes the 3-DOF motions sufficient to simulate the motion of the vehicle. Since the main purpose of this paper is a path-planning algorithm that generates a feasible berthing path based on the maneuvering simulation, the 3-DOF motion equations offer simplicity while retaining the characteristics of the system. The MMG model [17] divides the total ship force and moment into sub-components due to the hull, the propeller, and the steering system. Thus, the 3-DOF motion equation is expressed as
$$
\begin{aligned}
m(\dot{u} - vr - x_G r^2) &= X_H + X_P + X_R \\
m(\dot{v} + ur + x_G \dot{r}) &= Y_H + Y_P + Y_R \\
I_{zz}\dot{r} + m x_G(\dot{v} + ur) &= N_H + N_P + N_R
\end{aligned}
$$
where $m$ is the mass of the ship; $x_G$ is the longitudinal position of the center of gravity of the ship; $u$, $v$, and $r$ denote the surge velocity, sway velocity, and yaw rate, respectively; a dot over a variable denotes its derivative with respect to time; $I_{zz}$ is the mass moment of inertia about the $z$-axis; $X$, $Y$, and $N$ represent the surge force, lateral force, and yaw moment about the midship, respectively; and the subscripts $H$, $P$, and $R$ denote the hull, propeller, and rudder, respectively.
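To make the role of Equation (1) concrete, the following minimal sketch advances the 3-DOF equations in time with an explicit Euler step. The function signature and parameter dictionary are illustrative assumptions (not the authors' implementation), and the summed right-hand-side forces from the hull, propeller, and rudder models are taken as given, with added-mass terms folded into them for brevity.

```python
import numpy as np

def mmg_3dof_step(state, forces, params, dt):
    """One explicit-Euler step of the 3-DOF MMG equations of motion.

    state  = (x, y, psi, u, v, r) in earth-fixed position and body-fixed velocities
    forces = (X, Y, N), the summed hull + propeller + rudder contributions
    params = hypothetical dict with mass m, CoG position xG, yaw inertia Izz
    """
    x, y, psi, u, v, r = state
    X, Y, N = forces
    m, xG, Izz = params["m"], params["xG"], params["Izz"]

    # Surge: m*(u_dot - v*r - xG*r^2) = X
    u_dot = X / m + v * r + xG * r**2

    # Coupled sway/yaw:  m*(v_dot + u*r + xG*r_dot) = Y
    #                    Izz*r_dot + m*xG*(v_dot + u*r) = N
    A = np.array([[m,      m * xG],
                  [m * xG, Izz   ]])
    b = np.array([Y - m * u * r, N - m * xG * u * r])
    v_dot, r_dot = np.linalg.solve(A, b)

    # Integrate body-fixed velocities, then kinematics in the earth-fixed frame
    u, v, r = u + u_dot * dt, v + v_dot * dt, r + r_dot * dt
    x += (u * np.cos(psi) - v * np.sin(psi)) * dt
    y += (u * np.sin(psi) + v * np.cos(psi)) * dt
    psi += r * dt
    return np.array([x, y, psi, u, v, r])
```

Such a step function is the core of the simulated environment used later for reinforcement learning: each call maps a state and a set of forces to the next state.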
Due to the operating conditions in the berthing phase, the hydrodynamic forces and moments about the midship acting on the ship hull were investigated at low speed over a wide range of drift angles [2]. The equation of motion therefore retains some high-order hydrodynamic coefficients, and the hydrodynamic forces and moments caused by the hull are expressed as follows:
$$
\begin{aligned}
X_H &= X_{\dot u}\dot u + X_{u|u|}u|u| + X_{vv}v^2 + X_{rr}r^2 + X_{vr}vr \\
Y_H &= Y_{\dot v}\dot v + Y_{\dot r}\dot r + Y_v v + Y_{vvv}v^3 + Y_{vvvvv}v^5 + Y_r r + Y_{r|r|}r|r| + Y_{vvr}v^2 r + Y_{vrr}vr^2 \\
N_H &= N_{\dot v}\dot v + N_{\dot r}\dot r + N_v v + N_{uv}uv + N_{vvv}v^3 + N_{uvvv}uv^3 + N_r r + N_{r|r|}r|r| + N_{vvr}v^2 r + N_{vrr}vr^2
\end{aligned}
$$
The hydrodynamic coefficients are expressed as a Taylor series expansion in terms of the surge velocity, sway velocity, and yaw rate.
The ship model was equipped with twin-propeller and twin-rudder systems [18]. The thruster model is expressed as follows:
$$
\begin{aligned}
X_P &= (1 - t)\,\rho D_P^4 \left[ n_P^2 K_T^P(J_P^P) + n_S^2 K_T^S(J_P^S) \right] \\
Y_P &= 0 \\
N_P &= y_P (1 - t)\,\rho D_P^4 \left[ n_P^2 K_T^P(J_P^P) - n_S^2 K_T^S(J_P^S) \right]
\end{aligned}
$$
where $D_P$ is the propeller diameter; $n$ is the propeller revolution rate; $t$ denotes the thrust deduction factor; $y_P$ is the lateral position of the propeller from the centerline; and the superscripts $P$ and $S$ denote the port and starboard propellers, respectively.
The thrust coefficient $K_T$ is described as a function of the advance ratio $J_P$, obtained through the propeller open-water test:

$$K_T = k_0 + k_1 J_P + k_2 J_P^2$$
The parameters required for the estimation of thrust are given as

$$
\begin{aligned}
J_P^{P,S} &= \frac{u_P^{P,S}}{n^{P,S} D_P}, \qquad u_P^{P,S} = \left(1 - w_P^{P,S}\right)u, \qquad w_P^{P,S} = w_{P0}\exp\!\left(-C_P v_P^2\right), \qquad v_P = v + x_P r \\
C_P^P &= C_P^- \ \text{and} \ C_P^S = C_P^+ \quad \text{when} \ \beta_P > 0 \\
C_P^P &= C_P^+ \ \text{and} \ C_P^S = C_P^- \quad \text{when} \ \beta_P < 0
\end{aligned}
$$
where the wake fraction at the propeller position $w_P$ was estimated from the wake fraction at the propeller in straight motion $w_{P0}$; $\beta_P$ is the geometrical inflow angle at the propeller position; $C_P^+$ and $C_P^-$ are the wake-changing coefficients for positive and negative $\beta_P$ due to lateral motion; and $x_P$ denotes the longitudinal position of the propeller from the midship.
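As an illustration of Equations (3)-(5), the sketch below assembles the twin-propeller surge force. The dictionary of constants (drawn from Table 1, Table 3, and the open-water fit) and the sign convention assumed for the inflow angle are illustrative assumptions only.

```python
import math

def propeller_surge_force(u, v, r, n_port, n_stbd, coef):
    """Sketch of the twin-propeller surge force X_P from Equations (3)-(5).

    `coef` is a hypothetical dictionary of model constants:
    k0, k1, k2, wP0, CP_plus, CP_minus, t, rho, DP, xP.
    """
    v_p = v + coef["xP"] * r              # lateral inflow velocity at the propeller
    beta_p = math.atan2(-v_p, u)          # geometrical inflow angle (assumed convention)
    # Wake-changing coefficients swap sides with the sign of beta_p
    cp_port = coef["CP_minus"] if beta_p > 0 else coef["CP_plus"]
    cp_stbd = coef["CP_plus"] if beta_p > 0 else coef["CP_minus"]

    def n2_kt(n, cp):
        w_p = coef["wP0"] * math.exp(-cp * v_p**2)        # wake fraction in maneuvering
        u_p = (1.0 - w_p) * u                             # inflow speed to the propeller
        J = u_p / (n * coef["DP"]) if n != 0 else 0.0     # advance ratio
        KT = coef["k0"] + coef["k1"] * J + coef["k2"] * J**2  # open-water curve, Eq. (4)
        return n**2 * KT

    return (1.0 - coef["t"]) * coef["rho"] * coef["DP"]**4 * (
        n2_kt(n_port, cp_port) + n2_kt(n_stbd, cp_stbd))
```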
Forces and moments due to the steering system with twin rudders were calculated from the normal force $F_N$ and are expressed as

$$
\begin{aligned}
X_R &= -(1 - t_R)\left(F_N^P + F_N^S\right)\sin\delta \\
Y_R &= -(1 + a_H)\left(F_N^P + F_N^S\right)\cos\delta \\
N_R &= -(x_R + a_H x_H)\left(F_N^P + F_N^S\right)\cos\delta - y_R (1 - t_R)\left(F_N^P - F_N^S\right)\sin\delta
\end{aligned}
$$
where the normal force acting on each rudder is described as follows (Equation (7)):

$$F_N^{P,S} = \frac{1}{2}\rho A_R \left(U_R^{P,S}\right)^2 f_\alpha \sin\alpha_R^{P,S}$$
The parameters required for estimating the rudder forces and moment during the maneuver are given as

$$
\begin{aligned}
U_R^{P,S} &= \sqrt{\left(u_R^{P,S}\right)^2 + \left(v_R^{P,S}\right)^2}, \qquad f_\alpha = \frac{6.13\Lambda}{\Lambda + 2.25}, \qquad \alpha_R^{P,S} = \delta - \tan^{-1}\!\left(\frac{v_R^{P,S}}{u_R^{P,S}}\right) \\
u_R^{P,S} &= \varepsilon\, u_P^{P,S} \sqrt{\eta \left\{ 1 + \kappa \left( \sqrt{1 + \frac{8 K_T^{P,S}}{\pi \left(J_P^{P,S}\right)^2}} - 1 \right) \right\}^2 + (1 - \eta)} \\
v_R^{P,S} &= \gamma_R^{P,S}\left(v + l_R r\right) \\
\gamma_R^P &= \gamma_R^- \ \text{and} \ \gamma_R^S = \gamma_R^+ \quad \text{when} \ \beta_R > 0 \\
\gamma_R^P &= \gamma_R^+ \ \text{and} \ \gamma_R^S = \gamma_R^- \quad \text{when} \ \beta_R < 0
\end{aligned}
$$
where $F_N$ is the rudder normal force; $t_R$, $a_H$, and $x_H$ are the steering resistance deduction factor, the rudder force increase factor, and the longitudinal position of the additional lateral force component, respectively; $U_R$ is the resultant rudder inflow velocity; $f_\alpha$ is the rudder lift gradient coefficient; $\Lambda$ is the rudder aspect ratio; $\alpha_R$ is the effective inflow angle to the rudder; $u_R$ and $v_R$ are the longitudinal and lateral inflow velocity components at the rudder; $\varepsilon$ is the ratio of the wake fractions at the propeller and rudder positions; $\gamma_R$ is the flow-straightening coefficient; and $\beta_R$ is the effective inflow angle to the rudder in the maneuvering motion.
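A compact sketch of Equation (7), using the lift gradient and effective inflow angle from Equation (8), follows; the argument names are illustrative.

```python
import math

def rudder_normal_force(u_r, v_r, rho, A_R, aspect_ratio, delta):
    """Normal force on one rudder, Equation (7):
    F_N = 0.5 * rho * A_R * U_R^2 * f_a * sin(alpha_R).

    u_r, v_r: longitudinal/lateral inflow components from Equation (8)
    delta:    rudder angle in radians
    """
    U_R2 = u_r**2 + v_r**2                             # squared resultant inflow speed
    f_a = 6.13 * aspect_ratio / (aspect_ratio + 2.25)  # rudder lift gradient coefficient
    alpha_R = delta - math.atan2(v_r, u_r)             # effective inflow angle to the rudder
    return 0.5 * rho * A_R * U_R2 * f_a * math.sin(alpha_R)
```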

2.3. Hydrodynamic and Interaction Coefficients

Previous studies on the hydrodynamic properties under berthing operating conditions were carried out by research groups at Changwon National University [19]. Experiments were conducted on the Korea autonomous surface ship (KASS) model, a ship model used in a project carried out by many universities and research institutes to develop autonomous ships. The main characteristics and the shape of the ship model are shown in Table 1 and Figure 2. A cross-comparison between the previous studies and [20] shows similar results; Figure 3 compares the turning trajectories at a rudder angle of 35 degrees at three and six knots. To obtain a feasible path for the automatic berthing task, the equation of motion and the hydrodynamic coefficients of the USV must be determined accurately for the KASS model. In this paper, the hydrodynamic coefficients were estimated using captive model tests at Changwon National University and compared with the CFD method presented in [21]. The coefficients related only to the surge velocity were estimated through the resistance test; the coefficients relating forces and moments to the surge and sway velocities were estimated through the static drift test; the coefficients related to the yaw rate were estimated using the circular motion test; and the coefficients related to the combined effect of sway velocity and yaw rate were estimated using the circular motion with drift test. The added-mass and interaction coefficients were selected from [21]. The hydrodynamic and interaction coefficients are summarized in Table 2 and Table 3.

2.4. Maneuverability

To define the problem, it is necessary to assess the maneuverability of the ship; understanding it makes it possible to determine where the automatic berthing process should start. A maneuvering simulation at low speed (1 knot) was conducted to investigate the ship-maneuvering characteristics in the port environment. Figure 4 shows the trajectory of the turning circle test with a 35-degree rudder angle at 1 knot. The simulation results show that the ship can complete a turn within a range of approximately $3 L_{PP}$; thus, the berthing area should extend more than three times $L_{PP}$ from the berthing point. The maneuvering characteristics are listed in Table 4.

3. Path-Planning Approach

In the last few decades, significant advancements have been made in artificial intelligence, particularly in reinforcement learning, a subfield of machine learning. Reinforcement learning trains an agent by assigning rewards and penalties based on its actions and states. Unlike supervised and semi-supervised learning, it does not rely on pairs of inputs and true outputs, and it does not explicitly label near-optimal actions as true or false. As a result, reinforcement learning offers a way to tackle complex problems, including robot control, self-driving cars, and applications in the aerospace industry. A noteworthy advancement in reinforcement learning is the twin delayed deep deterministic policy gradient, introduced in 2018. TD3 is an effective model-free reinforcement learning method whose agent is an actor-critic agent that optimizes the expected long-term reward. Specifically, TD3 builds on the success of the deep deterministic policy gradient algorithm developed in 2016; DDPG remains highly regarded and successful in continuous action spaces, with extensive applications in fields such as robotics and self-driving systems.
However, like many algorithms, DDPG has limitations, including instability and the need to fine-tune hyperparameters for each task. Estimation errors accumulate gradually during training, leading to suboptimal local states, overestimation, or severe forgetting on the part of the agent. To address these issues, TD3 was developed with a focus on reducing the overestimation bias prevalent in previous reinforcement learning algorithms. This is achieved through the incorporation of three key features:
  • The utilization of twin critic networks, which work in pairs.
  • Delayed updates of the actor.
  • Action noise regularization.
By implementing these features, TD3 aims to enhance the stability and performance of reinforcement learning algorithms, ultimately improving their applicability in various domains.

3.1. Conception

The path-planning algorithm in this paper was applied to the KASS model described in Section 2.3. The selected port is Busan port, whose geometry is shown in Figure 5. The objective is to use TD3 (pseudocode shown in Algorithm 1) to train a model that can generate the path for the berthing process. First, the TD3 algorithm trains the model in combination with the maneuvering simulation based on the MMG model, which integrates realistic ship-motion dynamics into the training process. The trained model is then used to generate the desired path for the berthing task, predicting the control signals $n$ (propeller speed) and $\delta$ (rudder angle) from the input ship state $s = (x, y, \psi, u, v, r)$. The concept of path planning for the automatic berthing task is shown in Figure 6, and a rollout sketch of this inference stage is given below.
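The following minimal sketch rolls a trained actor through the MMG-based simulator to produce a berthing path (the Figure 6 concept). The `policy` and `env` interfaces are assumptions for illustration; the paper does not specify them.

```python
import numpy as np

def generate_berthing_path(policy, env, max_steps=5000):
    """Roll out a trained TD3 actor through the MMG-based simulator.

    Assumed interfaces: policy(state) -> normalized action in [-1, 1],
    env.step(action) -> (next_state, reward, done), env.reset() -> state.
    """
    path = []
    state = env.reset()                  # random initial state (Tables 6 and 9)
    for _ in range(max_steps):
        action = policy(state)           # [n_normalized, delta_normalized]
        state, _, done = env.step(action)
        path.append(state[:3])           # store (x, y, psi) along the trajectory
        if done:                         # target tolerances met or boundary hit
            break
    return np.array(path)
```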
Algorithm 1: Pseudocode of the TD3 Algorithm
1. Initialize the critic networks $Q_{\phi_1}$, $Q_{\phi_2}$ and the actor network $\mu_\theta$ with random parameters $\phi_1$, $\phi_2$, $\theta$.
2. Initialize the target parameters to the main parameters: $\theta_{\text{targ}} \leftarrow \theta$, $\phi_{\text{targ},1} \leftarrow \phi_1$, $\phi_{\text{targ},2} \leftarrow \phi_2$.
3. For $t = 0$ to $T - 1$ do:
4.  Observe the state $s$ of the environment and choose the action $a = \mathrm{clip}\big(\mu_\theta(s) + \varepsilon,\, a_{\text{low}},\, a_{\text{high}}\big)$, where $\varepsilon \sim \mathcal{N}$.
5.  Execute action $a$ in the TD3 environment and observe the new state $s'$, the reward $r$, and the done signal $d$ that stops training for this step.
6.  Store the transition $(s, a, r, s', d)$ in the replay buffer $D$.
7.  If $s'$ reaches the goal point, reset the environment state.
8.  If it is time for an update, then for $j$ in range (custom decided) do:
9.   Randomly sample a batch of transitions $B = \{(s, a, r, s', d)\}$ from $D$.
10.  Compute the target actions: $a'(s') = \mathrm{clip}\big(\mu_{\theta_{\text{targ}}}(s') + \mathrm{clip}(\varepsilon, -c, c),\, a_{\text{low}},\, a_{\text{high}}\big)$, where $\varepsilon \sim \mathcal{N}(0, \sigma)$.
11.  Compute the targets: $y(r, s', d) = r + \gamma (1 - d) \min_{i=1,2} Q_{\phi_{\text{targ},i}}\big(s', a'(s')\big)$.
12.  Update the Q-functions using gradient descent: $\nabla_{\phi_i} \frac{1}{|B|} \sum_{(s,a,r,s',d) \in B} \big( Q_{\phi_i}(s, a) - y(r, s', d) \big)^2$, where $i = 1, 2$.
13.  If $j \bmod \text{policy delay} = 0$, then update the policy by one-step deterministic policy-gradient ascent using $\nabla_\theta \frac{1}{|B|} \sum_{s \in B} Q_{\phi_1}\big(s, \mu_\theta(s)\big)$,
14.  and update the target networks: $\phi_{\text{targ},i} \leftarrow \rho \phi_{\text{targ},i} + (1 - \rho)\phi_i$ for $i = 1, 2$; $\theta_{\text{targ}} \leftarrow \rho \theta_{\text{targ}} + (1 - \rho)\theta$.
15. End if
End for
Repeat until convergence.
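For concreteness, the following PyTorch-style sketch implements steps 9-14 of Algorithm 1. The network and optimizer objects are assumed inputs, and this is a generic TD3 update under those assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def td3_update(batch, actor, actor_targ, critic1, critic2, critic1_targ,
               critic2_targ, opt_actor, opt_critics, step,
               gamma=0.99, rho=0.995, policy_delay=2, noise_std=0.2, noise_clip=0.5):
    """One TD3 update (steps 9-14 of Algorithm 1); `batch` holds (s, a, r, s2, d) tensors."""
    s, a, r, s2, d = batch

    # Steps 10-11: clipped-noise target action and clipped double-Q target
    with torch.no_grad():
        eps = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (actor_targ(s2) + eps).clamp(-1.0, 1.0)
        q_targ = torch.min(critic1_targ(s2, a2), critic2_targ(s2, a2))
        y = r + gamma * (1 - d) * q_targ

    # Step 12: both critics regress toward the shared target
    critic_loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    opt_critics.zero_grad()
    critic_loss.backward()
    opt_critics.step()

    # Steps 13-14: delayed policy update and Polyak-averaged target networks
    if step % policy_delay == 0:
        actor_loss = -critic1(s, actor(s)).mean()
        opt_actor.zero_grad()
        actor_loss.backward()
        opt_actor.step()
        with torch.no_grad():
            for net, targ in [(actor, actor_targ), (critic1, critic1_targ),
                              (critic2, critic2_targ)]:
                for p, p_t in zip(net.parameters(), targ.parameters()):
                    p_t.mul_(rho).add_((1 - rho) * p)
```

The delayed actor update and the minimum over the two target critics are the two mechanisms that distinguish TD3 from DDPG and suppress the overestimation bias discussed above.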

3.2. Setting for Reinforcement Learning

In this section, the parameters and variables of the TD3 algorithm for the automatic berthing task are set as follows:
  • Observation space and state: The observation space and state were defined as the set of physical velocities, position, and orientation. The state vector $s = (x, y, \psi, u, v, r)$ includes the position $(x, y)$, the heading angle $\psi$, the linear velocities $u$ and $v$, and the angular velocity $r$;
  • Action: The control action comprises the control inputs of the thruster (propeller revolution) and steering (rudder angle) systems. The action signal is continuous in the range [−1, 1], which maps to [−300, 100] rpm for the thrust system and to [−35, 35] degrees for the steering system.
  • Reward function: This plays a crucial role in the design of a reinforcement learning application. It serves as a guide for the network training process and helps optimize the model’s performance throughout each episode. If the reward function does not accurately capture the objectives of the target task, the model may struggle to achieve desirable performance.
$$
\begin{aligned}
r_S &= r_i - r_{i-1} \\
r &= r_{\mathrm{Dist}} + r_{\mathrm{LinVel}} + r_{\mathrm{AngVel}} + r_{\mathrm{Heading}} \\
r_{\mathrm{Dist}} &= -100\sqrt{x^2 + y^2} \\
r_{\mathrm{Heading}} &= -1000\left|\psi_{\mathrm{Target}} - \psi\right| \\
r_{\mathrm{LinVel}} &= -2000\sqrt{u^2 + v^2} \\
r_{\mathrm{AngVel}} &= -1000\,|r|
\end{aligned}
$$
In this paper, based on the boundary state, the weights 100, 1000, 2000, and 1000 were assigned to the distance, heading, linear-velocity, and angular-velocity terms, respectively. The reward value is the sum of the step rewards over each time step; a step receives a positive value if the state variables move toward the required values, and vice versa. Furthermore, for faster convergence of rewards in the first stage of the berthing task, the distance to the target has higher priority than the speed and heading. The distance reward is therefore multiplied by a distance coefficient, while the heading, resultant-velocity, and yaw-rate rewards are multiplied by (1.1 − distance coefficient). The reward coefficients are described in Figure 7, and a sketch of this shaping appears after this list.
  • Environment: The environment receives the control input and the current state, then returns the ship's new state and the reward for this action. The environment function was built on the maneuvering simulation that uses the MMG model as its mathematical core.
  • Agent: The hyperparameters of the TD3 model were selected as follows: two hidden layers with 512 units each; learning rates $\alpha$ and $\beta$ for the actor and critic networks of 0.0001; discount factor $\gamma$ of 0.99; soft-update coefficient $\tau$ of 0.005; and batch size of 128. The training process was set to 20,000 warmup steps out of 50,000 total, with the exploration noise set as in Table 5.
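The sketch below illustrates the reward shaping described above. Where the published formula is ambiguous (extraction appears to have stripped the minus signs), penalties proportional to the deviation from the target state are assumed, and $(x, y)$ is taken relative to the berthing point.

```python
import math

def berthing_reward(state, prev_total, psi_target, dist_coef):
    """Shaped reward of Section 3.2: per-step reward is the change r_S = r_i - r_(i-1).

    Signs and the relative-position convention are assumptions; weights follow
    the paper (100, 1000, 2000, 1000). Returns (step reward, carry for next step).
    """
    x, y, psi, u, v, r = state          # (x, y) assumed relative to the berth
    r_dist = -100 * math.hypot(x, y)    # distance penalty
    r_head = -1000 * abs(psi_target - psi)
    r_linv = -2000 * math.hypot(u, v)
    r_angv = -1000 * abs(r)
    # Distance is prioritized early in the berthing run (Figure 7 coefficients)
    total = dist_coef * r_dist + (1.1 - dist_coef) * (r_head + r_linv + r_angv)
    return total - prev_total, total
```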

3.3. Boundary Conditions

The simulations were performed at Busan port, with the satellite image shown in Figure 5. The geometry of this port was simplified as shown in Figure 8. Considering the geometry of this port, two berthing cases were selected to investigate the path-plan-generating system.
Case 1: Parallel berthing task in a 30.5 × 202 m water area. In this case, the ship's initial states were generated randomly, as described in Table 6, and the berthing point was assumed as shown in Table 7. The berthing task is considered a success if the ship state lies within the ranges given in Table 7. The boundary area is described in Figure 8a and Table 8.
Case 2: Perpendicular berthing task in a 42 × 170.5 m water area. In this case, the ship's initial states were generated randomly, as described in Table 9, and the berthing point was assumed as shown in Table 10. The berthing task is considered a success if the ship state lies within the ranges given in Table 10. The boundary area is described in Figure 8b and Table 11. A sketch of the corresponding success test follows.
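A success test following the tolerances of Tables 7 and 10 might look like the sketch below; the dictionary-based interface is an assumption for illustration.

```python
def berthing_success(state, target, tol):
    """Check whether the ship state is within the allowed berthing tolerances.

    Example for case 1 (Tables 7 and 8):
    target = {"x": 180, "y": 20, "psi": 0, "u": 0, "v": 0, "r": 0}
    tol    = {"x": 2, "y": 0.5, "psi": 3, "u": 0.1, "v": 0.05, "r": 1}
    """
    names = ("x", "y", "psi", "u", "v", "r")
    return all(abs(s - target[k]) <= tol[k] for s, k in zip(state, names))
```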

4. Simulation Results and Discussion

The USV is considered to have berthed successfully if it approaches the target berthing point with an error within the allowable range established in Section 3.3. The training process stops when the number of training episodes reaches 50,000.
Figure 9 and Figure 10 sequentially present the trajectory, surge, sway, yaw rate, heading angle, and control inputs (propeller revolution and rudder angle). The USV starts from a random state as described in Section 3.3. Under the automatic control of the TD3 model, the ship successfully berthed at the target location, with the ship states gradually changing from the initial state to the required range. In particular, the adjusted reward function made the berthing process faster and more optimal. The time series of surge velocity shows that, in the first phase of the berthing process, the ship's speed increased to reduce the distance to the berthing point, while the sway velocity and yaw rate changed little. In the last phase, the surge velocity, yaw rate, and heading angle become much more important than the distance, so these values change more abruptly to meet the required berthing values.
Figure 11a,b shows the learning performance of TD3 for the two cases over 50,000 episodes. Although the warmup lasts 20,000 episodes, the average reward shows that the model berths successfully and stably before the warmup phase is finished, demonstrating that TD3 performs well as reinforcement learning for the automatic berthing process. The penalty value also affects the training process: too high a penalty causes the model to misjudge states. For instance, a vessel that has moved reasonably close to the berthing position but then collides with the wall receives a heavy point deduction, causing the model to judge the whole approach as wrong and try actions other than that process, which lengthens training. These results show that the method proposed in this paper achieves high performance and a high success rate with a low penalty value.
The simulation results show that the combination of TD3 and the maneuvering simulation yields a powerful and accurate system for the automatic berthing process. Comparing the shape of the average-reward curve with the results in [15] demonstrates that the TD3 algorithm is more stable than older reinforcement learning algorithms. In particular, the method proposed in this paper is easier to apply because it learns, explores, and optimizes the policy automatically, and it can be used for another ship model if the ship's hydrodynamic characteristics are known.
However, this paper has evident limitations. First, the port conditions are simplified: because of the simplicity of the model, environmental disturbances were ignored in the simulation, which reduces accuracy in the presence of wind or waves. Second, the obstacles and port geometry are simplified, which significantly affects the determination of the initial state and the berthing point; moving obstacles would increase the training time considerably, so the proposed method should only be used in a known port without moving obstacles. Finally, this paper presents the initial development of the path-planning system: the results are based on maneuvering simulation, and the accuracy and performance of this approach in real-world operations need to be considered and evaluated carefully. Although the approach uses one of the newest reinforcement learning techniques at present, its performance should be investigated and compared further.

5. Conclusions and Remarks

In this study, the Korea autonomous surface ship (KASS) model was selected as the target ship for training the path-planning model for the autonomous berthing task. A mathematical model and the hydrodynamic coefficients from previous research at Changwon National University provided an accurate model for the motion of a slow ship. By performing the path-planning algorithm based on the combination of TD3 and a maneuvering simulation, the automatic berthing task could be conducted with stable performance using reinforcement learning.
Even though the path-planning system showed high performance, complex environmental disturbances in the port area need to be included in the model; this increases training time but is a necessary factor for real situations. Additionally, several algorithms based on control theory must be considered for faster convergence.

Author Contributions

Conceptualization, H.K.Y. and A.K.V.; methodology, H.K.Y., A.K.V. and T.L.M.; software, A.K.V.; validation, A.K.V. and T.L.M.; formal analysis, A.K.V. and T.L.M.; investigation, A.K.V. and T.L.M.; resources, T.L.M.; data curation, A.K.V. and T.L.M.; writing—original draft preparation, A.K.V.; writing—review and editing, H.K.Y. and T.L.M.; visualization, A.K.V.; supervision, H.K.Y.; project administration, H.K.Y.; funding acquisition, H.K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Development of Autonomous Ship Technology (PJT201313, Development of Autonomous Navigation System with Intelligent Route Planning Function), funded by the Ministry of Oceans and Fisheries (MOF, Korea).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in this article (Tables and Figures).

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Symbol (Unit): Description
$\alpha_R$ (-): Effective inflow angle to the rudder
$\beta$ (rad): Drift angle of the ship
$\beta_P$ (rad): Inflow angle to the propeller
$\beta_R$ (rad): Inflow angle to the rudder
$\gamma_R$ (-): Flow-straightening coefficient of the rudder
$\delta$ (rad): Rudder angle
$\eta$ (-): Ratio of propeller diameter to rudder span
$\Lambda$ (-): Rudder aspect ratio
$\kappa$ (-): Experimental coefficient for longitudinal inflow velocity to the rudder
$\nabla$ (m³): Displacement volume of the ship
$\psi$ (rad): Heading angle of the ship
$\rho$ (kg/m³): Water density
$\varepsilon$ (-): Ratio of the wake fraction at the propeller to that at the rudder
$A_R$ (m²): Rudder profile area
$a_H$ (-): Rudder force increase factor
$B$ (m): Breadth of ship
$B_R$ (m): Average rudder chord length
$C_P$ (-): Experimental constant for wake characteristics in maneuvering
$D_P$ (m): Propeller diameter
$d$ (m): Draft of ship
$F_N$ (N): Normal force of the rudder
$F_X, F_Y$ (N): Surge and sway forces acting on the ship
$f_\alpha$ (-): Rudder lift gradient coefficient
$H_R$ (m): Rudder span
$I_Z$ (kg·m²): Moment of inertia of the ship
$J_P$ (-): Advance ratio of the propeller
$K_T$ (-): Propeller thrust open-water characteristic
$k_0, k_1, k_2$ (-): Coefficients of the $K_T$ polynomial
$L_{PP}$ (m): Ship length between perpendiculars
$l_R$ (m): Effective longitudinal position of the rudder
$M_Z$ (N·m): Yaw moment acting on the ship
$m$ (kg): Ship mass
$n_P, n_S$ (rpm): Propeller revolutions per minute (port, starboard)
$O\text{-}xyz$ (-): Earth-fixed coordinate system
$o\text{-}x_b y_b z_b$ (-): Body-fixed coordinate system
$R_0$ (-): Resistance of the ship in straight motion
$r$ (deg/s): Yaw rate
$r$ (-): Reward
$T$ (N): Longitudinal propeller force
$t$ (s): Time
$t_P$ (-): Thrust deduction factor
$t_R$ (-): Steering resistance deduction factor
$U$ (m/s): Resultant velocity
$U_0$ (m/s): Initial resultant velocity
$U_R$ (m/s): Resultant inflow velocity to the rudder
$u, v$ (m/s): Longitudinal and lateral velocities of the ship in the body-fixed coordinate system
$u_R, v_R$ (m/s): Longitudinal and lateral inflow velocities at the rudder position
$w_P$ (-): Wake coefficient at the propeller in maneuvering motion
$w_{P0}$ (-): Wake coefficient at the propeller in straight motion
$w_R$ (-): Wake coefficient at the rudder position
$X, Y, N$ (N, N, N·m): Surge force, sway force, and yaw moment about the midship
$X_H, Y_H, N_H$ (N, N, N·m): Surge force, sway force, and yaw moment acting on the ship's hull
$X_P, Y_P, N_P$ (N, N, N·m): Surge force, sway force, and yaw moment due to the propeller
$X_R, Y_R, N_R$ (N, N, N·m): Surge force, sway force, and yaw moment due to the rudder
$x_G$ (m): Longitudinal position of the center of gravity
$x_H$ (m): Longitudinal position of the acting point of the additional lateral force
$x_P$ (m): Longitudinal position of the propeller
$x_R$ (m): Longitudinal position of the rudder

Abbreviations

AI: Artificial Intelligence
CFD: Computational Fluid Dynamics
DDPG: Deep Deterministic Policy Gradients
DRL: Deep Reinforcement Learning
MMG: Maneuvering Modeling Group
RL: Reinforcement Learning
TD3: Twin Delayed DDPG (a variant of the DDPG algorithm)
USV: Unmanned Surface Vehicle

References

  1. Chaal, M.; Ren, X.; BahooToroody, A.; Basnet, S.; Bolbot, V.; Banda, O.A.V.; van Gelder, P. Research on risk, safety, and reliability of autonomous ships: A bibliometric review. In Safety Science (Vol. 167); Elsevier B.V.: Amsterdam, The Netherlands, 2023. [Google Scholar] [CrossRef]
  2. Oh, K.G.; Hasegawa, K. Low speed ship manoeuvrability: Mathematical model and its simulation. In Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering—OMAE, Nantes, France, 9–14 June 2013; p. 9. [Google Scholar] [CrossRef]
  3. Shouji, K. An Automatic Berthing Study by Optimal Control Techniques. IFAC Proc. Vol. 1992, 25, 185–194. [Google Scholar] [CrossRef]
  4. Skjåstad, K.G.; Barisic, M. Automated Berthing (Parking) of Autonomous Ships. Ph.D. Thesis, NTNU, Trondheim, Norway, 2018. [Google Scholar]
  5. Mizuno, N.; Uchida, Y.; Okazaki, T. Quasi real-time optimal control scheme for automatic berthing. IFAC-Pap. 2015, 28, 305–312. [Google Scholar] [CrossRef]
  6. Nguyen, V.S.; Im, N.K. Automatic ship berthing based on fuzzy logic. Int. J. Fuzzy Log. Intell. Syst. 2019, 19, 163–171. [Google Scholar] [CrossRef]
  7. Zhang, Y.; Zhang, M.; Zhang, Q. Auto-berthing control of marine surface vehicle based on concise backstepping. IEEE Access 2020, 8, 197059–197067. [Google Scholar] [CrossRef]
  8. Sawada, R.; Hirata, K.; Kitagawa, Y.; Saito, E.; Ueno, M.; Tanizawa, K.; Fukuto, J. Path following algorithm application to automatic berthing control. J. Mar. Sci. Technol. 2021, 26, 541–554. [Google Scholar] [CrossRef]
  9. Wu, G.; Zhao, M.; Cong, Y.; Hu, Z.; Li, G. Algorithm of berthing and maneuvering for catamaran unmanned surface vehicle based on ship maneuverability. J. Mar. Sci. Eng. 2021, 9, 289. [Google Scholar] [CrossRef]
  10. Im, N.; Seong Keon, L.; Hyung Do, B. An Application of ANN to Automatic Ship Berthing Using Selective Controller. Int. J. Mar. Navig. Saf. Sea Transp. 2007, 1, 101–105. [Google Scholar]
  11. Ahmed, Y.A.; Hasegawa, K. Automatic ship berthing using artificial neural network trained by consistent teaching data using nonlinear programming method. Eng. Appl. Artif. Intell. 2013, 26, 2287–2304. [Google Scholar] [CrossRef]
  12. Im, N.; Hasegawa, K. Automatic ship berthing using parallel neural controller. IFAC Proc. Vol. 2001, 34, 51–57. [Google Scholar] [CrossRef]
  13. Im, N.K.; Nguyen, V.S. Artificial neural network controller for automatic ship berthing using head-up coordinate system. Int. J. Nav. Archit. Ocean. Eng. 2018, 10, 235–249. [Google Scholar] [CrossRef]
  14. Marcelo, J.; Figueiredo, P.; Pereira, R.; Rejaili, A. Deep Reinforcement Learning Algorithms for Ship Navigation in Restricted Waters. Mecatrone 2018, 3, 151953. [Google Scholar] [CrossRef]
  15. Lee, D. Reinforcement Learning-Based Automatic Berthing System. arXiv 2021, arXiv:2112.01879. [Google Scholar]
  16. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. Int. Conf. Mach. Learn. 2018, 80, 1587–1596. [Google Scholar]
  17. Yasukawa, H.; Yoshimura, Y. Introduction of MMG Standard Method for Ship Maneuvering Predictions. J. Mar. Sci. Technol. 2015, 20, 37–52. [Google Scholar] [CrossRef]
  18. Khanfir, S.; Hasegawa, K.; Nagarajan, V.; Shouji, K.; Lee, S.K. Manoeuvring characteristics of twin-rudder systems: Rudder-hull interaction effect on the manoeuvrability of twin-rudder ships. J. Mar. Sci. Technol. 2011, 16, 472–490. [Google Scholar] [CrossRef]
  19. Vo, A.K.; Mai, T.L.; Jeon, M.; Yoon, H.K. Experimental Investigation of the Hydrodynamic Characteristics of a Ship due to Bank Effect. Port. Res. 2022, 46, 294–301. [Google Scholar] [CrossRef]
  20. Kim, D.J.; Choi, H.; Kim, Y.G.; Yeo, D.J. Mathematical Model for Harbour Manoeuvres of Korea Autonomous Surface Ship (KASS) Based on Captive Model Tests. In Proceedings of the Conference of Korean Association of Ocean Science and Technology Societies, Incheon, Republic of Korea, 13–14 May 2021. [Google Scholar]
  21. Vo, A.K. Application of Deep Reinforcement Learning on Ship’s Autonomous Berthing Based on Maneuvering Simulation. Ph.D. Thesis, Changwon National University, Changwon, Republic of Korea, 2022. [Google Scholar]
Figure 1. Coordinate system of the twin-propeller and twin-rudder ship model.
Figure 2. Geometry of the autonomous surface ship (KASS).
Figure 3. Simulation results of turning trajectories at a rudder angle of 35 degrees at three and six knots [20].
Figure 4. Simulation results of turning trajectories at a rudder angle of 35 degrees.
Figure 5. Satellite image of the Busan port.
Figure 6. Concept of the path-planning algorithm.
Figure 7. Reward coefficients.
Figure 8. Geometry of the port and berthing situation: (a) parallel berthing task (case 1) and (b) perpendicular berthing task (case 2).
Figure 9. Simulation results in the parallel berthing task (case 1).
Figure 10. Simulation results in the perpendicular berthing task (case 2).
Figure 11. Learning performance of TD3: (a) parallel berthing task (case 1) and (b) perpendicular berthing task (case 2).
Table 1. Principal dimensions.
Item (Unit): Value
Length between perpendiculars, $L_{pp}$ (m): 22.000
Breadth, $B$ (m): 6.000
Draft, $T$ (m): 1.250
Displacement volume, $\nabla$ (m³): 86.681
Rudder area, $A_R$ (m²): 0.518
Rudder span, $H_R$ (m): 0.900
Propeller diameter, $D_P$ (m): 0.950
Table 2. Hydrodynamic force and moment coefficients.
Hull ($\times 10^{-5}$)
$X_{\dot u}$ = −81 | $Y_{\dot v}$ = −1034 | $N_{\dot v}$ = 64
$X_{u|u|}$ = −627 | $Y_{\dot r}$ = −126 | $N_{\dot r}$ = −33
$X_{vv}$ = −407 | $Y_v$ = −2610 | $N_v$ = −130
$X_{rr}$ = 675 | $Y_{vvv}$ = −3530 | $N_{uv}$ = −513
$X_{vr}$ = 226 | $Y_{vvvvv}$ = 3080 | $N_{vvv}$ = −2
$Y_r$ = 390 | $N_{uvvv}$ = −138
$Y_{r|r|}$ = −47 | $N_r$ = −178
$Y_{vrr}$ = −2170 | $N_{r|r|}$ = −253
$Y_{vvr}$ = −3590 | $N_{vrr}$ = −420
$N_{vvr}$ = −1830
Table 3. Interaction coefficients.
Propeller and rudder
$1 - t_R$ = 0.934 | $1 + a_H$ = 0.702 | $\gamma_R^+$ = 0.342
$\varepsilon$ = 0.960 | $C_P^+$ = −2.713 | $\gamma_R^-$ = 0.634
$\kappa$ = 0.695 | $C_P^-$ = 11.211
Table 4. Turning maneuverability characteristics.
Item | Starboard turning | Port turning
Advance ($L_{PP}$) | 2.681 | 2.680
Transfer ($L_{PP}$) | 1.114 | 1.115
Turning radius ($L_{PP}$) | 2.625 | 0.623
Tactical diameter ($L_{PP}$) | 2.935 | 2.932
Table 5. Exploration rates.
Step range | Exploration rate ϵ
[0–5000] | 0.5
[5000–10,000] | 0.4
[10,000–15,000] | 0.3
[15,000–20,000] | 0.2
[20,000–50,000] | 0.1
Table 6. Initial state values (case 1).
Item: Value
$x_0$ (m): [−20, 20]
$y_0$ (m): [−10, 10]
$\psi_0$ (°): [−5, 5]
$u_0$ (knot): 1
$v_0$ (knot): 0
$r_0$ (°/s): 0

Table 7. Target point (case 1).
Item: Value
$x$ (m): 180 ± 2
$y$ (m): 20 ± 0.5
$\psi$ (°): 0 ± 3
$u$ (m/s): 0 ± 0.1
$v$ (m/s): 0 ± 0.05
$r$ (°/s): 0 ± 1

Table 8. Boundary values (case 1).
Item: Value
$x$ (m): [−20, 182]
$y$ (m): [−10, 20.5]

Table 9. Initial state values (case 2).
Item: Value
$x_0$ (m): [−20, 20]
$y_0$ (m): [−10, 10]
$\psi_0$ (°): [−5, 5]
$u_0$ (knot): 1
$v_0$ (knot): 0
$r_0$ (°/s): 0

Table 10. Target point (case 2).
Item: Value
$x$ (m): 150 ± 0.5
$y$ (m): 30 ± 2
$\psi$ (°): 90 ± 3
$u$ (m/s): 0 ± 0.1
$v$ (m/s): 0 ± 0.05
$r$ (°/s): 0 ± 1

Table 11. Boundary values (case 2).
Item: Value
$x$ (m): [−20, 150.5]
$y$ (m): [−10, 32]