# Impact-Angle Constraint Guidance and Control Strategies Based on Deep Reinforcement Learning

## Abstract


## 1. Introduction

1. To account for the complete flying-object dynamics and the control mechanism, this study formulates the guidance and control problem as a Markov decision process. Two DRL guidance and control strategies with the impact-angle constraint are proposed, designed according to the dual-loop and integrated guidance and control principles, respectively.
2. An improved reward mechanism that includes the state components is designed. It combines sparse and dense rewards to correct the impact-angle error while reducing the flying object–target distance. The dense reward ensures that, when the state components are near their target values, even a small improvement in the policy still yields a large reward difference, so the network converges to the optimum more easily. The dense reward also normalizes the reward values, reducing the influence of the different scales of the individual reward terms on the overall reward and thereby mitigating the sparse-reward problem. In addition, to avoid the negative effects of an unbounded distribution on a bounded action space, this study replaces the Gaussian distribution in the PPO actor network with a Beta distribution: samples from a Beta distribution are confined to [0, 1] and can therefore be mapped to any desired action interval.
3. To ensure the applicability of the model to different impact-angle commands, the real-time impact-angle error is introduced into the state, which makes the agent focus on the impact-angle error rather than on one particular impact angle during learning. Furthermore, drawing on the engineering background, reasonable initialization of the training scenarios ensures that the guidance and control strategies can handle different scenarios and expected impact angles.
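The Beta-distribution trick from contribution (2) can be sketched as follows; the interval bounds and Beta parameters below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sample_action(alpha, beta, low, high, rng=None):
    """Sample from a Beta(alpha, beta) distribution and affinely map
    the [0, 1] sample onto the desired action interval [low, high]."""
    rng = rng or np.random.default_rng()
    u = rng.beta(alpha, beta)          # guaranteed to lie in [0, 1]
    return low + (high - low) * u      # mapped action lies in [low, high]

# Hypothetical elevator-deflection limits of +/-30 deg for illustration.
a = sample_action(2.0, 2.0, -30.0, 30.0, np.random.default_rng(0))
```

Because the support of the Beta distribution is bounded, no clipping of sampled actions is needed, which avoids the distorted gradients a clipped Gaussian would introduce.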

## 2. Environmental Description

${q}_{d}$ represents the incoming flow (dynamic) pressure, where ${q}_{d}=\rho {V}^{2}/2$ and $\rho$ is the air density at the flying altitude of the flying object; ${\delta}_{z}$ is the elevator deflection angle; ${c}_{x}={c}_{x0}+{c}_{x}^{{\alpha}^{2}}{\alpha}^{2}$ is the drag coefficient; ${c}_{x0}$ is the zero-lift drag coefficient; ${c}_{x}^{{\alpha}^{2}}$ represents the derivative of the induced drag coefficient with respect to ${\alpha}^{2}$; ${c}_{y}^{\alpha}$ represents the derivative of the lift coefficient with respect to the AOA; ${c}_{y}^{\delta}$ is the derivative of the lift coefficient with respect to the elevator deflection angle; ${m}_{z}^{\alpha}$ represents the derivative of the pitching moment coefficient with respect to the AOA; ${m}_{z}^{\overline{\omega}}$ denotes the derivative of the pitching moment coefficient with respect to the dimensionless pitch angular rate; and ${m}_{z}^{\delta}$ is the derivative of the pitching moment coefficient with respect to the elevator deflection angle.
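As a concrete illustration of these definitions, the sketch below evaluates the dynamic pressure and the drag and lift coefficients using the coefficient values from Table 4; the reference area `S` is a placeholder, since its value is not recoverable from this excerpt:

```python
# Aerodynamic coefficients from Table 4.
CX0, CX_A2 = 0.3092, 16.4163   # zero-lift drag, induced-drag derivative
CY_A, CY_D = 15.2131, 2.9395   # lift derivatives w.r.t. AOA and elevator

def lift_drag(V, alpha, delta_z, rho=1.225, S=0.02):
    """Drag and lift forces from the definitions in the text:
    q_d = rho*V^2/2, c_x = c_x0 + c_x^{alpha^2}*alpha^2,
    c_y = c_y^alpha*alpha + c_y^delta*delta_z.
    S (reference area, m^2) is a placeholder value."""
    q_d = 0.5 * rho * V**2
    c_x = CX0 + CX_A2 * alpha**2
    c_y = CY_A * alpha + CY_D * delta_z
    return q_d * S * c_x, q_d * S * c_y   # (drag, lift) in newtons

drag, lift = lift_drag(V=200.0, alpha=0.05, delta_z=0.0)
```

At V = 200 m/s the dynamic pressure is 24,500 Pa, so even a small AOA produces a substantial lift force, which is consistent with the control authority the elevator deflection must provide.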

## 3. DRL-Based Guidance and Control Design with Impact-Angle Constraint

#### 3.1. DRL Model

The trajectory from the initial state ${s}_{0}$ to the final state ${s}_{T}$ is expressed as follows: $({s}_{0},{a}_{0},{r}_{0})$, $({s}_{1},{a}_{1},{r}_{1})$, ⋯, $({s}_{T},{a}_{T},{r}_{T})$.

Each element of the sequence is a tuple of the state ${s}_{t}$, action ${a}_{t}$, and reward ${r}_{t}$.

#### 3.2. PPO Algorithm and Enhancement

The objective function depends on the probability ratio ${r}_{t}$ between the new and old policies, and it is calculated by:
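The clipped-surrogate computation built on this ratio can be sketched as follows; this is the generic PPO objective in the sense of Schulman et al., not the paper's exact implementation:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped PPO surrogate: r_t = exp(log pi_new - log pi_old);
    J = E[min(r_t * A, clip(r_t, 1 - eps, 1 + eps) * A)].
    eps = 0.2 matches the clip parameter in the hyperparameter table."""
    r_t = np.exp(logp_new - logp_old)
    unclipped = r_t * advantage
    clipped = np.clip(r_t, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()
```

When the advantage is positive, the clip caps the gain from pushing the ratio above 1 + eps; when it is negative, the clip caps the gain from pushing it below 1 − eps, which is exactly the behavior Figure 8 illustrates.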

#### 3.3. Network Structure and Learning Process

At each decision step, the environment updates the state ${s}_{t}$ based on the motion equations, and the agent obtains an action ${a}_{t}$ by sampling from the probability distribution of the policy. Then, it calculates the real-time reward based on the state components. In a single learning process, the RL algorithm first interacts ${N}_{s}$ times with the environment using the old policy to generate a trajectory sequence denoted by $\tau :({s}_{0},{a}_{0},{r}_{0}),\cdots ,({s}_{{N}_{s}},{a}_{{N}_{s}},{r}_{{N}_{s}})$, where ${N}_{s}$ is the length of the replay buffer used in the PPO algorithm. Next, it stores the generated trajectory $\tau$ in the replay buffer. To improve the training efficiency, this study performs batch training by sequentially processing a number of ${T}_{s}$-length trajectories from the replay buffer.
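The slicing of the $N_s$-step replay buffer into $T_s$-length trajectories can be sketched as:

```python
def batch_slices(n_s=1536, t_s=512):
    """Yield (start, end) index pairs that cut an N_s-step rollout into
    consecutive T_s-length trajectories for batched updates; N_s = 1536
    and T_s = 512 are the values from the hyperparameter table."""
    for start in range(0, n_s, t_s):
        yield start, min(start + t_s, n_s)
```

With the tabulated values, each rollout yields exactly three 512-step batches per pass over the buffer.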

## 4. Simulation and Analysis

#### 4.1. Training Process

The reference length was L = 0.55 m; the air density was $\rho$ = 1.225 kg/m^{3}; and the moment of inertia was ${J}_{z}$ = 0.5. The axial thrust P of the flying object was set to a constant value of 90 N. The relevant aerodynamic parameters are shown in Table 4. Further, to reduce the effect of the integration step size on the miss distance, the training environment was updated using a fourth-order Runge–Kutta integrator with an altitude-dependent step size: when the flying object altitude was above 0.6 m, the step size was 0.002 s; when the altitude was less than or equal to 0.6 m but larger than 0.06 m, it was 0.0002 s; and when the altitude was less than or equal to 0.06 m, it was 0.00001 s.
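The altitude-dependent step schedule and a generic fourth-order Runge–Kutta step can be sketched as follows; the state-derivative function `f` is left abstract, since the full motion equations are not reproduced here:

```python
def step_size(altitude):
    """Integration step schedule from the training setup: 2 ms above 0.6 m,
    0.2 ms in (0.06, 0.6] m, and 0.01 ms at or below 0.06 m."""
    if altitude > 0.6:
        return 0.002
    if altitude > 0.06:
        return 0.0002
    return 0.00001

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```

Shrinking the step near the ground bounds the position error accumulated in the final instants of flight, which is what dominates the recorded miss distance.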

1. The flying object landed;
2. The flying object attained an altitude of more than 6000 m;
3. The maximum number of training epochs had been reached.
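A minimal check covering these three conditions might look like the following; treating "landed" as the altitude reaching zero is an assumption of this sketch:

```python
def episode_done(altitude, epoch, max_epochs):
    """True when the flying object lands (altitude <= 0, assumed),
    climbs above 6000 m, or the training-epoch budget is exhausted."""
    return altitude <= 0.0 or altitude > 6000.0 or epoch >= max_epochs
```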

#### 4.2. Test Results

#### 4.2.1. Comparative Analysis of the DLGCIAC-DRL and IGCIAC-DRL Strategies

#### 4.2.2. Monte Carlo Simulations

#### 4.3. Computational Cost Analysis

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 8.** The variation law of the objective function J with the change in the ${r}_{t}$ value: (**a**) ${A}_{\mathrm{GAE}}^{\pi}>0$; (**b**) ${A}_{\mathrm{GAE}}^{\pi}<0$.

**Figure 9.** The structures of the actor and critic networks: (**a**) actor network and (**b**) critic network.

**Figure 12.** The training performances of the DLGCIAC-DRL strategy: (**a**) cumulative reward and (**b**) miss distance and terminal impact-angle error.

**Figure 13.** The training performances of the IGCIAC-DRL strategy: (**a**) cumulative reward and (**b**) miss distance and terminal impact-angle error.

**Figure 14.** Comparison of simulation results under different expected impact angles using the DLGCIAC-DRL strategy: (**a**) ballistic trajectory; (**b**) speed variation; (**c**) trajectory inclination; (**d**) acceleration; (**e**) AOA and elevator deflection angle; and (**f**) pitch angle and pitch angular rate.

**Figure 15.** The statistics of the miss distance and terminal impact-angle error using the DLGCIAC-DRL strategy.

**Figure 16.** The simulation results of the IGCIAC-DRL strategy: (**a**) ballistic trajectory; (**b**) speed variation; (**c**) trajectory inclination; (**d**) AOA and elevator deflection angle; and (**e**) pitch angle and pitch angular rate.

**Figure 17.** The statistics of the miss distance and terminal impact-angle error using the IGCIAC-DRL strategy.

**Figure 18.** The Monte Carlo simulation results: (**a**) ballistic trajectory and (**b**) the miss distance and terminal impact-angle error.

**Figure 19.** Statistical histogram of the terminal impact-angle error and ballistic impact point distribution: (**a**) terminal impact-angle error and (**b**) ballistic impact point distribution.

| ${\theta}_{error}^{tar}$ | ${R}^{tar}$ | ${w}_{{\theta}_{error}}$ | ${w}_{R}$ | $\Theta$ |
|---|---|---|---|---|
| 0 | 0 | 0.8 | 0.2 | 0.2 |
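The paper's exact reward formula is not reproduced in this excerpt; the sketch below is only a hypothetical dense shaping that uses the tabulated weights and illustrates the stated property that small policy improvements near the target values still produce a noticeable reward difference:

```python
import math

def shaped_reward(theta_error, r_dist, w_theta=0.8, w_r=0.2, theta_scale=0.2):
    """Hypothetical dense shaping (not the paper's formula): each term is
    normalized into [0, 1] and steepens near zero error, so the gradient of
    the reward stays informative close to the target. w_theta = 0.8,
    w_r = 0.2, and theta_scale = 0.2 mirror the tabulated values."""
    term_theta = math.exp(-abs(theta_error) / theta_scale)  # -> 1 as error -> 0
    term_r = math.exp(-r_dist)                              # -> 1 as range -> 0
    return w_theta * term_theta + w_r * term_r
```

Normalizing both terms into [0, 1] before weighting keeps either component from dominating purely by its physical scale, which is the sparse-reward mitigation described in the contributions.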

| Layer | Actor Network | Critic Network |
|---|---|---|
| Input layer | 8 (state dimension) | 8 (state dimension) |
| Hidden layer 1 | 64 | 64 |
| Hidden layer 2 | 64 | 64 |
| Hidden layer 3 | 64 | 64 |
| Output layer | 2 (parameters of the Beta distribution) | 1 (state value) |
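The tabulated architecture can be sketched as a plain NumPy forward pass; the tanh activations, the random weight initialization, and the softplus-plus-one transform that keeps the Beta parameters above 1 are assumptions made for illustration:

```python
import numpy as np

def mlp_forward(x, sizes, rng, out_transform=None):
    """Forward pass through a tanh MLP with layer widths in `sizes`.
    Weights are randomly initialized here purely for illustration."""
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.standard_normal((n_in, n_out)) * 0.1
        x = x @ W  # biases initialized to zero and omitted
        if i < len(sizes) - 2:
            x = np.tanh(x)
    return out_transform(x) if out_transform else x

rng = np.random.default_rng(0)
state = np.zeros(8)                       # 8-dimensional state input
# Actor: 8 -> 64 -> 64 -> 64 -> 2; softplus + 1 keeps Beta parameters > 1,
# which makes the Beta density unimodal (an assumed design choice).
beta_params = mlp_forward(state, [8, 64, 64, 64, 2], rng,
                          lambda z: np.log1p(np.exp(z)) + 1.0)
# Critic: 8 -> 64 -> 64 -> 64 -> 1 scalar state-value estimate.
value = mlp_forward(state, [8, 64, 64, 64, 1], rng)
```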

| Parameter | Value |
|---|---|
| Initial horizontal coordinate of flying object position | −500 to 500 m |
| Initial vertical coordinate of flying object position | 2250 to 2750 m |
| Initial velocity V | 180 to 250 m/s |
| Initial ballistic inclination $\theta$ | −10 to 10° |
| Expected impact angle ${\theta}_{d}$ | −80 to −15° |

| ${c}_{x0}$ | ${c}_{x}^{{\alpha}^{2}}$ (rad^{−2}) | ${c}_{y}^{\alpha}$ (rad^{−1}) | ${c}_{y}^{\delta}$ (rad^{−1}) | ${m}_{z}^{\alpha}$ (rad^{−1}) | ${m}_{z}^{\delta}$ (rad^{−1}) | ${m}_{z}^{\overline{\omega}}$ (s·rad^{−1}) |
|---|---|---|---|---|---|---|
| 0.3092 | 16.4163 | 15.2131 | 2.9395 | −1.2864 | −1.1976 | −0.8194 |

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Maximum number of training epochs for the DLGCIAC-DRL strategy | 10,000 | $\varsigma$ | 0.01 |
| Maximum number of training epochs for the IGCIAC-DRL strategy | 15,000 | Hidden layer size | 64 |
| Algorithm decision interval | 0.02 s | Optimizer eps parameter | 1 × 10^{−5} |
| ${N}_{s}$ | 1536 | ${\alpha}_{\chi}$ | 3 × 10^{−4} |
| ${T}_{s}$ | 512 | ${\alpha}_{\varphi}$ | 3 × 10^{−4} |
| $\lambda$ | 0.95 | $\kappa$ | 0.9 |
| Number of times the replay buffer data are reused per update | 5 | $\epsilon$ | 0.2 |
| $\gamma$ | 0.99 | Random seed | 10 |
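Given the tabulated $\gamma$ = 0.99 and $\lambda$ = 0.95, the standard generalized advantage estimation recursion can be sketched as:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.
    `values` holds V(s_0)..V(s_T) plus one bootstrap entry V(s_{T+1}).
    A_t = delta_t + gamma*lam*A_{t+1}, delta_t = r_t + gamma*V_{t+1} - V_t."""
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

The product $\gamma\lambda \approx 0.94$ controls how quickly earlier steps discount the advantage signal from later steps, trading bias against variance in the advantage estimates.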

**Table 6.** The statistical data on the impact-angle error and miss distance of the DLGCIAC-DRL and IGCIAC-DRL strategies.

| | ${\theta}_{error}$ (°) | | Miss Distance (m) | |
|---|---|---|---|---|
| | DLGCIAC-DRL | IGCIAC-DRL | DLGCIAC-DRL | IGCIAC-DRL |
| Maximum value | 0.8292 | 0.5085 | 0.4626 | 0.4932 |
| Average value | −0.029 | 0.1476 | 0.1989 | 0.2199 |
| Standard deviation | 0.1762 | 0.2314 | 0.1299 | 0.1359 |
| Mean squared error | 0.0656 | 0.0751 | 0.0592 | 0.0668 |

| Variables | Type of Error Distribution | Error Magnitude (2σ) |
|---|---|---|
| Drag coefficient (${c}_{x}$) | Gaussian | 10% |
| Lift coefficient (${c}_{y}$) | Gaussian | 10% |
| Pitching moment coefficient (${m}_{z}$) | Gaussian | 10% |
| Pitching rotational inertia (${J}_{z}$) | Gaussian | 10% |
| Pitch angle ($\vartheta$)/deg | Uniform | (−1, 1) |
| Elevator deflection error | Gaussian | 10% |
| Thrust error | Gaussian | 10% |

| | ${\theta}_{error}$ (°) | | | | Miss Distance (m) | | | |
|---|---|---|---|---|---|---|---|---|
| | DLGCIAC-DRL | IGCIAC-DRL | TSG | BPN-ACG | DLGCIAC-DRL | IGCIAC-DRL | TSG | BPN-ACG |
| Maximum value | 1.7159 | 8.0935 | 1.9243 | 2.0290 | 0.5274 | 13.0343 | 0.5714 | 0.9058 |
| Average value | −0.017 | 0.9456 | 0.0216 | −0.103 | 0.1057 | 2.4751 | 0.1145 | 0.1281 |
| Standard deviation | 0.7149 | 3.3135 | 0.7418 | 0.7666 | 0.0775 | 2.1387 | 0.0808 | 0.103 |
| Mean squared error | 0.5384 | 11.8627 | 0.5500 | 0.5978 | 0.0161 | 10.6958 | 0.0174 | 0.027 |
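The row statistics in these tables can be recomputed from raw Monte Carlo samples as follows; taking the mean squared error about zero error is an assumption of this sketch:

```python
import numpy as np

def mc_stats(samples):
    """Statistics reported in the Monte Carlo tables: maximum, mean,
    standard deviation, and mean squared error about zero."""
    s = np.asarray(samples, dtype=float)
    return {"max": s.max(), "mean": s.mean(),
            "std": s.std(), "mse": np.mean(s ** 2)}
```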

| Calculating Operation | Frequency |
|---|---|
| Multiplication operation | 12,929 |
| Addition operation | 13,187 |
| Activation function solving operation | 258 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fan, J.; Dou, D.; Ji, Y.
Impact-Angle Constraint Guidance and Control Strategies Based on Deep Reinforcement Learning. *Aerospace* **2023**, *10*, 954.
https://doi.org/10.3390/aerospace10110954
