An Intellectual Aerodynamic Design Method for Compressors Based on Deep Reinforcement Learning

Xu, Xiaohan; Huang, Xudong; Bi, Dianfang; Zhou, Ming

doi:10.3390/aerospace10020171

by Xiaohan Xu, Xudong Huang *, Dianfang Bi and Ming Zhou

School of Aerospace Engineering, Tsinghua University, Beijing 100084, China

* Author to whom correspondence should be addressed.

Aerospace 2023, 10(2), 171; https://doi.org/10.3390/aerospace10020171

Submission received: 17 December 2022 / Revised: 8 February 2023 / Accepted: 9 February 2023 / Published: 13 February 2023


Abstract: Aerodynamic compressor designs require considerable prior knowledge and a deep understanding of complex flow fields. With the development of computer science, artificial intelligence (AI) has been widely applied to compressor design. Among the various AI models, deep reinforcement learning (RL) methods have successfully addressed complex problems in different domains. This paper proposes a modified deep deterministic policy gradient algorithm for compressor design and trains several agents, improving the performance of a 3D transonic rotor for the first time. An error reduction process was applied to improve the capability of the surrogate models, and then RL environments were established based on the surrogate models. The rotors generated by the agent were evaluated by computational fluid dynamics methods, and the flow field analysis indicated that the combination of the sweep, lean, and segment angle modifications reduced the loss near the tip, while improving the pressure ratio in the middle section. Different policy combinations were explored, confirming that the combined policy improved the rotor performance more than single policies. The results demonstrate that the proposed RL method can guide future compressor designs.

Keywords: artificial intelligence; reinforcement learning; transonic rotor; compressor design; sweep and lean

1. Introduction

The compressor is one of the most critical components in an aero engine [1,2]. Modern compressors are characterized by higher stage pressure ratios, increased efficiency, and wider stability margins [3]. Initial axial compressor designs mainly relied on empirical data correlations and through-flow methods [4,5]. To date, although empirical input and experience are still needed [5] in turbomachinery design, various computational fluid dynamics (CFD) methods combined with different optimization approaches have been widely used.

The flow in a compressor is generally more complex than conventional scenarios due to the features of the air, such as viscosity, compressibility, and strong turbulence. To this end, CFD approaches are efficient and convenient for analyzing complex three-dimensional flows in compressors [6]. Among the multitude of turbulence models, Reynolds-averaged Navier-Stokes (RANS) equations are generally applied in CFD approaches [6,7] because of their higher efficiency and satisfactory accuracy for the internal high Reynolds flow. In combination with the development of modern turbomachinery design methods, CFD has been applied to analyze transonic compressors [8], with an upward trend in the multistage configuration and unsteady prediction [6].

Optimization, which follows the direct design procedure [9], is an integral part of the compressor design process [10]. Two main classes of optimization methods are utilized in compressor design [11]: stochastic algorithms and gradient-based methods. Stochastic algorithms are widely reported in the rotor design literature. For example, Bonaiuti and Zangeneh [12] optimized a single-stage compressor using NSGA–II [13] and improved its efficiency and operating range. Ma et al. [14] compared the particle swarm optimization (PSO) algorithm [15], the genetic algorithm (GA), and a hybrid PSO–GA approach in a centrifugal compressor, noting that the hybrid PSO–GA method showed the best performance among the three approaches. Some novel swarm intelligence algorithms have also been applied in turbomachinery. For example, a whale optimization algorithm [16] was used to decrease the loss of a controlled diffusion airfoil by Huang et al. [17]. In terms of gradient-based algorithms, Tang et al. [18] improved the efficiency and pressure ratio of NASA rotor 67 [19] with an adjoint-response surface method, and Du et al. [20] optimized a stator with the gradient descent method.

The above optimization methods determine the best solution under certain conditions, rather than offering a design policy. As a result, the optimized result is difficult to use if the designs need to be modified further, which requires prior design knowledge. Conversely, the development of reinforcement learning (RL) methods allows machines to devise designs similarly to humans [21]. Recently, RL methods have successfully addressed many complex design problems, including designing personalized therapies [22], designing proteins [23], and devising matrix multiplication algorithms [24].

However, RL applications in aerodynamic design are still relatively rare. The design variables are generally continuous, and several deep RL algorithms [25] have been developed for such problems. Existing approaches mainly focus on 2D airfoils because the 2D cases possess fewer design variables and lead to relatively simple flow fields. Viquerat et al. [26] maximized the lift-to-drag ratio by exploring the design space using the proximal policy optimization (PPO) algorithm [27]. Li et al. [28] also trained agents to learn the design policy of a supercritical airfoil using PPO and minimize the drag. In the turbomachinery field, Qin et al. [29] applied the deep deterministic policy gradient (DDPG) [30] to modify a blade profile and reduce the total pressure loss. Previous researchers have confirmed that RL methods can successfully learn the design policy of 2D airfoils and improve aerodynamic performance, while the 3D rotor case has not yet been explored.

Machine learning methods have also been applied in other turbomachinery research. Artificial neural networks and their variations work as surrogate models in several studies [14,20]. Li et al. [31] established deep convolutional generative adversarial networks that learn from existing airfoils and generate new airfoils. In terms of CFD solvers, Hammond et al. [32,33] found that machine learning methods can improve the accuracy of RANS models by modeling the turbulence of data generated by large eddy simulation (LES), direct numerical simulation (DNS) and other methods. For example, the mean square error was improved by 16% over the k−ε model in [33]. In general, machine learning applications significantly improve turbomachinery analysis and optimization, and show considerable progress; however, they cannot yet fully replace conventional methods.

This paper extended the usage of RL algorithms in compressor design. A modified DDPG algorithm was proposed for the aerodynamic design of a transonic axial compressor, which comprises, to the best of our knowledge, the first attempt at addressing the 3D case. The policies for improving the pressure ratio and efficiency were learned and used to synergistically improve rotor 67. An error reduction process was proposed to enhance the kriging model based on a limited number of samples. Some training rules were summarized to guide further training. The remainder of this paper is organized as follows: Section 2 illustrates methods for parameterizing the rotor, establishes surrogate models, creates an RL environment, and evaluates the learned policy. In Section 3, the agents are trained in the RL environment to learn the design policies, and the results are analyzed. Section 4 discusses the details of the training agents and combinations of different policies.

2. Materials and Methods

The overall structure of the model is shown in Figure 1, where compressor aerodynamic design was approached as a Markov decision process (MDP) [21]. As the learner and decision-maker [21], the agent included the RL algorithms and interacted with the environment by performing action $a_{t}$ according to state $s_{t}$ at time step $t$. Then, the environment generated reward $r_{t}$ and state $s_{t+1}$ according to $s_{t}$ and $a_{t}$. The design variables were the states $s_{t}$, which were modified at each time step. A surrogate model was applied to reduce the computational cost of the CFD method. Finally, the result generated by the trained agents was validated by the CFD model.

2.1. Rotor Parameterization

A parameterization generator was built to design the rotor geometry. The design variables generated the 3D features and the distributions of the 2D parameters in the spanwise direction. Then, airfoils at different spans were generated, and the three-dimensional rotor blades were stacked from the 2D airfoils.

The 3D features sweep and lean have had different definitions in previous studies, and the definitions selected in this work are shown in Figure 2. In the coordinate system $m - \theta r_{a}$ of one stream surface, $dm(r_{a})$ indicates the sweep feature, and $d\theta(r_{a})$ indicates the lean feature at the $r_{a}$ span. The reference design variables can be extracted from existing rotor geometries using the rotor generation method. The NASA rotor 67 was parameterized, reconstructed, and plotted, as also shown in Figure 2, where the parameterization results indicate that the original and reconstructed geometry fit well.

Eighteen geometric parameters were selected as design variables, as shown in Table 1, where $\hat{r}$ is the nondimensional radius, and $c_{0}$ and $\theta_{0}$ are constants used to normalize $dm$ and $d\theta$, with $c_{0}$ = 9.561 cm and $\theta_{0}$ = 0.286 rad. The design variables were used as control points of third-order Bézier curves to generate the spanwise distributions of $dm(r_{a})$ and $d\theta(r_{a})$, the inlet camber angle $\chi_{in}(r_{a})$, the outlet camber angle $\chi_{out}(r_{a})$, and the incidence $\beta_{y}(r_{a})$. Figure 3 shows the definitions of $\chi_{in}(r_{a})$, $\chi_{out}(r_{a})$ and $\beta_{y}(r_{a})$, where $\chi_{out}(r_{a}) < 0$ because the counterclockwise direction was considered positive. The 2D airfoils changed as $\chi_{in}(r_{a})$ and $\chi_{out}(r_{a})$ were selected as design variables, while other features, including the chord length and the leading-edge and trailing-edge thicknesses, remained unchanged. The reference values were extracted from the original NASA rotor 67 geometry to set the lower and upper bounds.
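As a minimal sketch of how a third-order Bézier curve turns a handful of control points into a smooth spanwise distribution, the snippet below evaluates the cubic Bernstein form. The function name `cubic_bezier` and the control-point values are hypothetical; how the eighteen design variables are assigned to control points is not specified here and is left as an assumption.

```python
import numpy as np

def cubic_bezier(control_points, t):
    """Evaluate a cubic (third-order) Bezier curve at parameters t in [0, 1]."""
    p = np.asarray(control_points, dtype=float)             # shape (4, dim)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]  # shape (n, 1)
    # Cubic Bernstein basis functions
    b0 = (1 - t) ** 3
    b1 = 3 * t * (1 - t) ** 2
    b2 = 3 * t ** 2 * (1 - t)
    b3 = t ** 3
    return b0 * p[0] + b1 * p[1] + b2 * p[2] + b3 * p[3]

# Hypothetical control points: (nondimensional radius, normalized sweep dm / c0).
ctrl = [(0.0, 0.0), (1 / 3, 0.02), (2 / 3, -0.01), (1.0, 0.03)]
curve = cubic_bezier(ctrl, np.linspace(0.0, 1.0, 5))
```

The curve passes through the first and last control points only, so the interior control points shape the distribution without being interpolated.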

2.2. CFD Method

2.2.1. CFD Tools

The commercial software package NUMECA was selected for CFD analysis. The NUMECA AutoGrid generated an O–H-structured grid according to the rotor geometry. Then, the NUMECA Fine Turbo and CFView were applied to solve the 3D RANS equation and post-process. All tools were automatically driven by the predefined scripts.

The viscous and inviscid fluxes were determined using second-order Jameson-type dissipation, and an explicit Runge–Kutta scheme was applied for time discretization. The RANS equation was closed by the Spalart–Allmaras (S–A) model, which has been effectively validated and applied in transonic compressor rotors [34,35,36] together with NUMECA Fine Turbo.

2.2.2. Rotor 67 Simulation

NASA rotor 67 [19] is a low-aspect ratio, transonic, axial-flow fan rotor with abundant experimental data, and was selected as the reference design, with its primary features shown in Table 2. It should be noted that the observed tip clearance was 0.061 cm rather than the designed value of 0.101 cm [19].

Different grids were calculated with 0.8 M, 1.2 M, and 1.6 M nodes, and $y^{+} = 1$ was specified for all grids. The O–H topology worked well in the single-stage rotor and generated good grids, as shown in Figure 4. As for boundary conditions, total pressure and temperature were specified at the inlet, and static pressure was specified at the outlet.

The calculated operating characteristics and the experimental results are plotted in Figure 5. It can be seen that the pressure ratio and efficiency fit the experimental data well for the medium (1.2 million) and fine (1.6 million) meshes, while the coarse mesh results deviated from the experimental values. The calculated choke mass flow rate was 34.3 kg/s, which is slightly less than the experimental value but was considered acceptable.

Since the rotor was simulated in the steady state but the rotating stall was an unsteady process, the time-averaged features and convergence criteria can be used to approximate the numerical stall point [18,37,38]. The calculation was considered converged if the adiabatic efficiency variation was less than 0.04% per 100 iterations, and the calculation was regarded as stalled if the CFD case did not converge. The calculated stall mass flow was 93% in both the 1.2 million and 1.6 million grids. In addition, the calculated shock wave showed similar features to the experimental shock wave.

The CFD result of the pressure and temperature ratio distributions fits the experimental result well with a deviation of less than 3.27%, as plotted along the spanwise direction at near-peak efficiency in Figure 6. Thus, the CFD tools and methods provide reliable results, and these approaches were then applied to generate and evaluate the agent-generated rotor geometries. The design space was considered based on rotor 67, so the obtained results may share similar characteristics, and the CFD methods above can give reliable results.

2.2.3. Rotor Performance Specification

The most important performance variables for a compressor were the mass flow $\dot{m}$, pressure ratio $\pi$, efficiency $\eta$, and stability margin $SM$. A typical operating characteristic curve generated by the CFD method is shown in Figure 7, in which the back pressure $p_{out}$ was changed according to a geometric distribution. The peak efficiency point was selected as the working point, and the mass flow $\dot{m}_{W}$, pressure ratio $\pi_{W}$, and efficiency $\eta_{W}$ can also be determined simultaneously.

Furthermore, several variables were defined to evaluate the operating characteristic curve. An efficiency range $[\dot{m}_{l}, \dot{m}_{h}]$ was defined after setting an efficiency threshold $\eta_{t}$, as shown in Figure 7. The variables included the numerical stall feature of the rotors, so these features were used instead of the exact stall point during agent training, which reduced the required computational resources.

The integral efficiency $\hat{\eta}$ and pressure ratio $\hat{\pi}$ are defined in Equations (1) and (2) in order to take the shape of the operating characteristic curve into account; they evaluate the performance over the range $[\dot{m}_{l}, \dot{m}_{h}]$. The rotor performance included seven variables, denoted by $p_{i} \in \mathbb{P} = \{\dot{m}_{l}, \dot{m}_{h}, \hat{\eta}, \hat{\pi}, \dot{m}_{W}, \pi_{W}, \eta_{W}\}$, $i = 1, 2, \dots, 7$, respectively.

$$\hat{\eta} = \frac{\int_{\dot{m}_{l}}^{\dot{m}_{h}} \eta(\dot{m}) \, d\dot{m}}{\dot{m}_{h} - \dot{m}_{l}}$$

(1)

$$\hat{\pi} = \frac{\int_{\dot{m}_{l}}^{\dot{m}_{h}} \pi(\dot{m}) \, d\dot{m}}{\dot{m}_{h} - \dot{m}_{l}}$$

(2)
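The integral averages in Equations (1) and (2) can be approximated numerically from the discrete points of a characteristic curve. The sketch below is one possible way to do this, assuming linear interpolation between CFD operating points and the trapezoidal rule; the function name `integral_mean` and the resampling resolution are illustrative choices, not taken from the paper.

```python
import numpy as np

def integral_mean(m_dot, values, m_l, m_h, n=201):
    """Average of a characteristic curve over [m_l, m_h], as in Equations (1)-(2)."""
    m_dot = np.asarray(m_dot, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(m_dot)                 # np.interp needs increasing abscissae
    grid = np.linspace(m_l, m_h, n)
    y = np.interp(grid, m_dot[order], values[order])
    # Trapezoidal rule on the resampled curve
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid))
    return integral / (m_h - m_l)
```

The same function serves for both $\hat{\eta}$ and $\hat{\pi}$, since the two definitions differ only in the curve being averaged.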

2.3. Modified DDPG Algorithm

The DDPG algorithm [30] is a model-free and off-policy approach that can robustly address challenging problems in continuous state spaces, including high-dimensional problems. Therefore, the DDPG algorithm is appropriate for application in compressor design optimization. The Adam optimizer [39] and L2 normalization were applied when implementing the DDPG algorithm. The environmental reward was defined by the rotor performance variables $p_{i} \in \mathbb{P}$.

2.3.1. Modifications to Improve the DDPG Algorithm

It was observed that the end states of MDPs dispersed to different degrees, and this dispersion was considered as distortion. Two agents with similar accumulated rewards $R$ were compared, as shown in Figure 8, in which the agents acted in the real design space $\mathbb{R}_{r}^{n}$ and generated the state sequences. The sequences started with different initial states and ended at the states in dashed circles, with the directions also marked out. A severe distortion was found in Figure 8b, while the distortion in Figure 8a was relatively mild. Since the design space and performance are high-dimensional, this distortion cannot be eliminated easily. This feature has also been observed in other studies [40] using the DDPG algorithm. Agents with severe distortion cannot guide the design process; thus, three modifications were applied to reduce the distortion and improve the training speed.

High-order feedback was applied to guide the exploration, as shown in Figure 9. The agent determined an action according to the state, and then interacted with the environment. The state was the design variable (for example, one of the state dimensions was the incidence), and the action was the corresponding modification, such as an increase in the incidence. Unlike in many other RL problems, the initial state in the rotor design was configurable, so the agent was expected to explore more thoroughly near the best state detected so far, and the best state $s^{*}$ in the environment was updated in each test process.

Then, as a part of the high-order feedback, randomness was added to enhance the exploration. The initial states were selected near $s^{*}$ with random directions $\vec{n}_{rand}$ and deterministic radii $R_{const}$, as shown in Equation (3). This method ensured that the initial states were sufficiently similar to the recorded optimized state.

$$s_{o} = s^{*} + \vec{n}_{rand} \cdot R_{const}$$

(3)
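Equation (3) can be illustrated with a short sketch. The text does not state how the random direction is drawn, so the snippet below assumes a direction distributed uniformly on the unit sphere (obtained by normalizing a Gaussian vector); the function name `sample_initial_state` and the numerical values are hypothetical.

```python
import numpy as np

def sample_initial_state(s_best, radius, rng):
    """Draw an initial state at a fixed distance from the best recorded state."""
    s_best = np.asarray(s_best, dtype=float)
    # Assumption: normalizing a Gaussian vector gives a uniformly random direction.
    direction = rng.normal(size=s_best.shape)
    direction /= np.linalg.norm(direction)
    return s_best + radius * direction

rng = np.random.default_rng(0)
s_star = np.full(18, 0.5)   # hypothetical best state in an 18-variable space
s_init = sample_initial_state(s_star, radius=0.1, rng=rng)
```

Because the radius is deterministic, every sampled initial state lies exactly at distance $R_{const}$ from $s^{*}$, and only the direction varies between episodes.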

A virtual area was added to extend the design space so the agents could obtain more transitions $(s_{t}, a_{t}, r_{t}, s_{t+1})$ near the boundary. The agent stopped exploring when it exited the design space $\mathbb{R}_{r}^{n}$; hence, fewer transitions were recorded near the boundary. This reduction was considered one of the main reasons for the distortion observed in Figure 8, and could be resolved by adding the virtual area.

The real design space $\mathbb{R}_{r}^{n}$ and the virtual area $\mathbb{R}_{v}^{n}$ are shown in Figure 10. The correction reward term $\delta r_{vir}$ was nonzero if the state was in $\mathbb{R}_{v}^{n}$, and zero if the state was in $\mathbb{R}_{r}^{n}$. Here, $d$ refers to the Manhattan distance between $s_{v} \in \mathbb{R}_{v}^{n}$ and $\mathbb{R}_{r}^{n}$. The constant $c_{V} = -0.3 / dx$ ensured that $\delta r_{vir} \leq 0$, which guaranteed that the maximum reward state was in $\mathbb{R}_{r}^{n}$.
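The virtual-area term can be sketched as follows. The explicit formula for $\delta r_{vir}$ is not given in the text; the snippet assumes the simplest form consistent with the description, $\delta r_{vir} = c_{V} \, d$, with the real design space taken as an axis-aligned box. The function name and the bounds are hypothetical, and $c_{V}$ is passed in as a parameter because $dx$ is not defined in this section.

```python
import numpy as np

def virtual_area_penalty(s, lower, upper, c_v):
    """Penalty for a state in the virtual area (assumed form: c_v times distance)."""
    s = np.asarray(s, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Per-dimension overshoot beyond the box; zero for dimensions inside it.
    excess = np.maximum(lower - s, 0.0) + np.maximum(s - upper, 0.0)
    d = excess.sum()          # Manhattan (L1) distance to the real design space
    return c_v * d            # c_v < 0, so the penalty is never positive
```

With a negative $c_{V}$, the penalty vanishes inside the real design space and grows linearly with the distance outside it, which keeps the maximum-reward state inside $\mathbb{R}_{r}^{n}$.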

An artificial tip was utilized to improve the gradient near $s^{*}$ and accelerate the training by defining an additional correction term $\delta r_{art}$ in Equation (4), where $\delta_{s}$ and $dr_{0}$ were constants and $dr_{0} = 0.08$ in this study. The term $\delta r_{art}$ did not change the monotonicity of the reward $r(s)$ or the location of $s^{*}$, and increased the gradient.

$$\delta r_{art} = \begin{cases} 0, & \|s - s^{*}\| \geq \delta_{s} \\ dr_{0}, & \|s - s^{*}\| < \delta_{s} \end{cases}$$

(4)

2.3.2. Environment Definition

The RL environment was coded following the template of OpenAI Gym. The state of the environment was the design variable $s \in S$, as discussed in Section 2.1. The aerodynamic performance variables $p_{i} \in \mathbb{P}$, $i = 1, 2, \dots, 7$, were generated by the surrogate models. The reward in the environment was defined by $p_{i}$, as shown in Equation (5).

$$r = a_{1} r_{raw} + a_{2} + \sum_{p_{i} \in \mathbb{P}} \delta r_{p_{i}} + \delta r_{art} + \delta r_{vir}$$

(5)

where $r_{raw}$ was a function of the different $p_{i}$ variables that determines what the agents are expected to learn, $a_{1}$ and $a_{2}$ were constants used to scale $r_{raw}$, and $\delta r_{art}$ and $\delta r_{vir}$ were the correction reward terms for the artificial tip and virtual area. This general reward form guaranteed that different $r_{raw}$ definitions could be handled in the same way, which benefited the agent training and improved the general applicability of the methodology.

The punishment term $\delta r_{p_{i}}$ was defined in Equation (6) as a constraint on each performance variable $p_{i}$. The constants $\delta r_{0,i}$ and $w_{i}$ determined the strength of the constraint, and $\delta p_{i} = \min(|p_{i} - p_{cl,i}|, |p_{i} - p_{cu,i}|)$ indicates how much $p_{i}$ exceeded its reasonable range $[p_{cl,i}, p_{cu,i}]$, which was defined according to the performance of the reference rotor. The weightings $w_{i}$ can be determined to balance the $\delta p_{i}$ of different dimensions using the Monte Carlo method.

$$\delta r_{p_{i}} = \begin{cases} 0, & p_{i} \in [p_{cl,i}, p_{cu,i}] \\ \delta r_{0,i} + \delta p_{i} \times w_{i}, & p_{i} \notin [p_{cl,i}, p_{cu,i}] \end{cases}$$

(6)
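The reward assembly of Equations (5) and (6) can be sketched in a few lines. The function names and argument layout below are hypothetical, and the example assumes that $\delta r_{0,i}$ and $w_{i}$ are chosen negative so that the term acts as a punishment; the signs are not stated explicitly in the text.

```python
def constraint_penalty(p, p_low, p_high, dr0, w):
    """Punishment term of Equation (6) for one performance variable."""
    if p_low <= p <= p_high:
        return 0.0
    # Distance to the nearest bound of the reasonable range
    dp = min(abs(p - p_low), abs(p - p_high))
    return dr0 + dp * w

def total_reward(r_raw, a1, a2, perf, bounds, dr0, w, dr_art=0.0, dr_vir=0.0):
    """General reward of Equation (5): scaled raw reward plus correction terms."""
    penalties = sum(
        constraint_penalty(p, lo, hi, d0, wi)
        for p, (lo, hi), d0, wi in zip(perf, bounds, dr0, w)
    )
    return a1 * r_raw + a2 + penalties + dr_art + dr_vir
```

Keeping $r_{raw}$ as a separate input means the same assembly can be reused for different learning targets by swapping only the raw reward.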

2.3.3. Policy Evaluation

The trained agents generated different design processes, and the states $s_{i}$ formed curves in the design space $\mathbb{R}_{r}^{n}$. The maximum number of steps is denoted as $N$, so $i = 1, 2, \dots, N$. In addition to the accumulated reward $R$, two additional criteria were defined to evaluate the policy.

First, the consistency criterion was defined to evaluate the distortion at the end of the process. The consistency criterion $\varepsilon_{d}$ is defined in Equation (7), where $s_{end,j}$ is the end state of the $j$th process among the $N_{p}$ test processes with random initial states. Lower $\varepsilon_{d}$ values indicate more successful training.

$$\varepsilon_{d} = \frac{1}{N_{p}} \sum_{j=1}^{N_{p}} \left\| s_{end,j} - \frac{1}{N_{p}} \sum_{k=1}^{N_{p}} s_{end,k} \right\|$$

(7)

Then, the smoothness criterion was defined to filter incorrect policies. The agents may obtain the wrong policy and generate zigzag curves, as shown in Figure 11. This phenomenon might occur because the actor networks are overfitted on the recorded transitions, and this can be prevented by adjusting the hyperparameters. Agents with incorrect policies cannot guide the design and should be removed.

The smoothness criterion $\varepsilon_{c}$ was defined according to the angles between $a_{i}$ and $ave_{i}$, as shown in Equation (8), where $s_{i}$ and $a_{i}$ are the state and action of a transition, $n_{w}$ indicates the number of states used to calculate $\varepsilon_{c}$, and $ave_{i}$ is the sum of the $2 n_{w} + 1$ steps near $s_{i}$. The value of $\varepsilon_{c}$ was closer to 1 if the agent learned a good policy.

$$\varepsilon_{c} = \frac{1}{N - 2 n_{w} - 1} \sum_{i = 1 + n_{w}}^{N - n_{w}} \frac{|a_{i} \cdot ave_{i}|}{\|a_{i}\| \cdot \|ave_{i}\|}$$

(8)
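A rough sketch of the smoothness criterion follows. It computes the absolute cosine between each action and the sum of the $2 n_{w} + 1$ neighboring actions, then averages over the interior steps. Two points are assumptions: the plain mean is used for the normalization (Equation (8) divides by $N - 2 n_{w} - 1$), and all actions are taken to be nonzero. The function name `smoothness` is hypothetical.

```python
import numpy as np

def smoothness(actions, n_w):
    """Average |cos| between each action and its local windowed sum."""
    a = np.asarray(actions, dtype=float)      # shape (N, n)
    n_steps = len(a)
    cosines = []
    for i in range(n_w, n_steps - n_w):
        ave = a[i - n_w : i + n_w + 1].sum(axis=0)   # sum of 2*n_w + 1 actions
        denom = np.linalg.norm(a[i]) * np.linalg.norm(ave)
        cosines.append(abs(a[i] @ ave) / denom)
    return float(np.mean(cosines))
```

For a trajectory that keeps moving in one direction the value is 1, whereas a zigzag trajectory, whose individual steps deviate from the local average direction, scores noticeably lower.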

The three modifications were tested in a demo case to verify their effectiveness in reducing $\varepsilon_{d}$. As shown in Equation (9), the reward in the demo case was defined by the distance from a selected extreme point $s_{0}$, where $s \in \mathbb{R}^{2}$, $d(s, s_{0}) = \|s - s_{0}\|$, $d_{max}(s_{0}) = \max(d(s, s_{0}))$, and the constant $p_{c}$ defined the shape of the tip.

$$r(s, s_{0}) = \frac{p_{c}}{\frac{d(s, s_{0})}{d_{max}(s_{0})} + p_{c}}$$

(9)
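The demo reward of Equation (9) is easy to reproduce. The sketch below assumes the demo design space is an axis-aligned box described by its corner points, so that the maximum distance from $s_{0}$ is attained at a corner; the unit square used in the example and the value of `p_c` are hypothetical.

```python
import numpy as np

def demo_reward(s, s0, p_c, corners):
    """Reward of Equation (9): 1 at s0, decaying with normalized distance."""
    s = np.asarray(s, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    d = np.linalg.norm(s - s0)
    # For a box-shaped space, the farthest point from s0 is one of the corners.
    d_max = max(np.linalg.norm(np.asarray(c, dtype=float) - s0) for c in corners)
    return p_c / (d / d_max + p_c)

unit_square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]  # assumed space
peak = demo_reward((0.3, 1.0), (0.3, 1.0), 0.1, unit_square)
far = demo_reward((1.0, 0.0), (0.3, 1.0), 0.1, unit_square)
```

Smaller values of $p_{c}$ give a sharper peak around $s_{0}$, since the reward at the farthest point equals $p_{c} / (1 + p_{c})$.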

The agents were trained with $s_{0} = (0.3, 1)$, which placed the maximum of the demo function near the boundary. Good $\varepsilon_{c}$ values were obtained for all agents, and the consistency criterion $\varepsilon_{d}$ was monitored. The training speed was evaluated by $\eta_{s} = R / N_{ep}$, where $R$ was the accumulated reward and $N_{ep}$ was the number of training episodes.

The agents in four different configurations were trained 10 times and stopped when $R > 85$ or $N_{ep} > 500$, as shown in Figure 12. Version 1 used the original DDPG algorithm, and version 2 included the virtual area. Version 3 included both high-order feedback and the virtual area, and version 4 included all three modifications. When the modifications were applied, the average and minimum $\varepsilon_{d}$ values decreased, and the $\eta_{s}$ value increased, showing that the performance of the modified algorithms was better than that of the original DDPG algorithm.

2.4. Error-Reduced Kriging Model

Surrogate models provide accurate approximations in CFD analyses with considerably reduced computational costs, which accelerates the design process. The 3D rotors had more design variables than 2D airfoils, and their CFD analyses require more computational resources, increasing the difficulty of establishing surrogate models.

2.4.1. Kriging Model

The kriging model [41] needs few sample points, even in high-dimensional design spaces; thus, this model has been widely applied in compressor design [9]. This work used kriging models to approximate rotor performance and the maximin Latin hypercube design [42] to find the sample points.

As a radial basis function (RBF) model, the $i$th RBF $\psi^{(i)}$ is expressed by Equation (10), where $s_{j}$ and $s_{j}^{(i)}$ are the components of variable $s$ and sample point $s^{(i)}$, and $k$ is the number of dimensions. The kriging model included parameters $\theta_{k} = \{\theta_{1}, \theta_{2}, \dots, \theta_{k}\}^{T}$ and $P_{k} = \{P_{1}, P_{2}, \dots, P_{k}\}^{T}$, and the number of parameters depended on the dimension of the design variable and the number of sample points. A PSO algorithm was applied to determine the kriging parameters by maximizing the likelihood function of the sample point values. Larger $P_{k}$ and smaller $\theta_{k}$ values increased the smoothness of the RBF, so fewer sample points were needed when decreasing $\theta_{k}$ and increasing $P_{k}$.

$$\psi^{(i)} = \exp\left(-\sum_{j=1}^{k} \theta_{j} \left|s_{j} - s_{j}^{(i)}\right|^{P_{j}}\right)$$

(10)
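The basis function of Equation (10) can be evaluated for all sample points at once, as in the sketch below. Only the correlation vector is computed; fitting the parameters and the kriging predictor itself are outside this example. The function name `kriging_basis` and the sample values are illustrative.

```python
import numpy as np

def kriging_basis(s, samples, theta, p):
    """Vector of basis values psi^(i) between s and every sample point."""
    s = np.asarray(s, dtype=float)               # shape (k,)
    samples = np.asarray(samples, dtype=float)   # shape (n_samples, k)
    theta = np.asarray(theta, dtype=float)       # shape (k,)
    p = np.asarray(p, dtype=float)               # shape (k,)
    diff = np.abs(s - samples)
    return np.exp(-np.sum(theta * diff ** p, axis=1))

samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]   # hypothetical sample points
psi = kriging_basis([0.0, 0.0], samples, theta=[1.0, 1.0], p=[2.0, 2.0])
```

The basis value equals 1 at the sample point itself and decays with distance; a smaller $\theta_{j}$ slows the decay along dimension $j$, which is why smaller $\theta_{k}$ values give smoother models.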

Seven kriging models were trained to approximate the performance of the rotor. It was assumed that the basis functions were sufficiently smooth, so the ranges of $\theta_{k}$ and $P_{k}$ were limited. Combined with the error reduction process, the accuracy of the surrogate models improved.

2.4.2. Error Reduction

The error reduction process was implemented to enhance the performance of the kriging model and reduce the number of required samples. The basic idea of the error reduction process was to reduce the influence of extreme sample points.

The sample points with poor aerodynamic performance did not substantially influence the design, but they deteriorated the performance of surrogate models, because these points may be outliers with a lower reward. Therefore, sample points with poor aerodynamic performance were restricted by a preset reasonable range to improve model accuracy, as schematized in Figure 13. If a sample point obtained a performance outside this range, the sample point was forced to move toward the reasonable range. This process reduced the error between the sample point and the range, thereby improving model performance.

If the performance $p_{i} \in \mathbb{P}$ exceeded the reasonable range $[p_{i,min}, p_{i,max}]$, the excessive part was reduced. For example, if $p_{i} > p_{i,max}$, the nondimensional exceeding term $\delta \hat{p}_{i}$ can be defined as in Equation (11), and $\delta \hat{p}_{i}$ became $\delta \hat{p}_{i}'$ after applying the logarithmic restriction in Equation (12). The parameter $a_{p}$ indicates the strength of the restriction, where $a_{p} > 0$; the value $a_{p} = 0$ denotes no restriction, since Equation (12) reduces to $\delta \hat{p}_{i}' = \delta \hat{p}_{i}$ in the limit $a_{p} \to 0$. In view of the computational cost and necessity, the candidate values were $a_{p} \in \{0, 3, 5\}$, and they are tested in Section 2.4.3.

$$\delta \hat{p}_{i} = \frac{p_{i} - p_{i,max}}{p_{i,max} - p_{i,min}} > 0$$

(11)

$$\delta \hat{p}_{i}' = \frac{1}{a_{p}} \ln\left(a_{p} \, \delta \hat{p}_{i} + 1\right)$$

(12)

Then, the restricted performance $p_{i}'$ was reconstructed using the restricted exceeding term $\delta \hat{p}_{i}'$, as shown in Equation (13). The restricted term $\delta \hat{p}_{i}'$ maintained the monotonicity of $\delta \hat{p}_{i}$ and the first-order smoothness. The restriction was not applied if $p_{i}$ did not exceed the threshold, and the more $p_{i}$ exceeded the threshold, the stronger the restriction was. The error reduction process was similar if $p_{i} < p_{i,min}$, and this significantly improved the performance of the surrogate model, as discussed in Section 2.4.3.

$$p_{i}' = p_{i,max} + \delta \hat{p}_{i}' \times (p_{i,max} - p_{i,min})$$

(13)
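The error reduction step can be sketched as a single function. It assumes that the exceeding term is the normalized amount by which $p_{i}$ lies beyond the violated bound, that the lower bound is handled as the mirror image of the upper bound (the text only says the process is similar), and that $a_{p} = 0$ means no restriction. The function name `restrict` is hypothetical.

```python
import math

def restrict(p, p_min, p_max, a_p):
    """Logarithmically compress the part of p lying outside [p_min, p_max]."""
    span = p_max - p_min
    if p_min <= p <= p_max or a_p == 0:
        return p                                  # inside the range, or disabled
    if p > p_max:
        excess = (p - p_max) / span               # nondimensional exceeding term
        return p_max + math.log1p(a_p * excess) / a_p * span
    excess = (p_min - p) / span                   # assumed mirrored treatment
    return p_min - math.log1p(a_p * excess) / a_p * span
```

For example, with the range $[0, 1]$ and $a_{p} = 3$, a sample value of 2 is pulled back to $1 + \ln(4)/3 \approx 1.46$: it stays outside the range and keeps its ordering relative to other samples, but its influence on the surrogate fit is reduced.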

2.4.3. Surrogate Models for Rotors

NASA rotor 67 was selected as the reference design, and the design space is shown in Table 1. Sample points $s \in S$ were selected to generate the geometry, calculate the operating characteristics and determine the performance variables $p_{i} \in \mathbb{P}$, $i = 1, 2, \dots, 7$. Then, the surrogate models were trained using the sample points generated by CFD tools.

The coefficient of determination $R^{2}$ was calculated to evaluate the trained surrogate models and is defined in Equation (14), where $p_{i} \in \mathbb{P}$, $i = 1, 2, \dots, 7$, were the performance variables, $p_{i,j}$ and $\hat{p}_{i,j}$ were the actual and predicted values of the $j$th test point, respectively, and $\bar{p}_{i}$ was the average value of all tested points.

$$R^{2}_{p_{i}} = \frac{\sum_{j=1}^{n_{t}} (p_{i,j} - \bar{p}_{i})^{2}}{\sum_{j=1}^{n_{t}} (\hat{p}_{i,j} - \bar{p}_{i})^{2}}$$

(14)

The $R^{2}$ values of $\pi_{W}$ and $\hat{\eta}$ with different numbers of sample points are shown in Figure 14. The surrogate models with and without the error reduction process were compared. The accuracy of the kriging models increased when applying the error reduction process. The improvement was obvious if the original surrogate models were less accurate, and decreased if the original models were sufficiently accurate.

Finally, surrogate models with different performance variables $p_{i}$ were trained with various numbers of samples $N$, as shown in Figure 15. The combinations of different $a_{p}$, $p_{min}$ and $\theta_{max}$ are listed in the legend, with $a_{p} = 0, 3, 5$ and $\theta_{max} = 10, 30$. In general, the models performed better when more sample points were included; the models for $\dot{m}_{l}$ (i = 1), $\hat{\eta}$ (i = 3) and $\pi_{W}$ (i = 7) showed poor performance when $N$ was less than 360 if $a_{p} = 0$, and improved after $N$ increased. Each $p_{i}$ had its own distribution, and if the distribution was not smooth and $N$ was large, larger $a_{p}$ values reduced the model performance. The models for $\dot{m}_{W}$ (i = 5) obtained $R^{2} = 0.987$ when $a_{p} = 0$, which was 0.1% higher than the $R^{2}$ value obtained when $a_{p} = 3$.

The best surrogate models for each $p_{i}$ were selected and are shown in Table 3, where $R^{2}$ and mean absolute error (MAE) are also listed. The surrogate models accurately predicted the performance of the rotor; thus, a reinforcement learning environment was established based on these surrogate models.

3. Results

3.1. Policy for Improving the Pressure Ratio

The pressure ratio is an important compressor performance indicator. Improving the stage pressure ratio can reduce the compressor’s size and weight. Thus, in this section, the agents have been trained to improve the pressure ratio of the rotors, and they are denoted as “series 1” agents.

The integrated pressure ratio $\hat{\pi}$ was selected to establish the reward function, with $r_{raw} = \hat{\pi} - \hat{\pi}_{0}$, where $\hat{\pi}_{0}$ was the reference value, and then the reward function was determined according to Equation (5). The scaling and constraint parameters in Equation (5) were determined using the Monte Carlo method with three hundred random points to ensure that the reward $r$ was approximately in the range $[0, 1]$.
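One plausible way to obtain the scaling constants from random samples is min-max scaling, sketched below. This is an assumption about the procedure rather than the documented method: the text only states that three hundred random points were used so that the reward falls approximately in $[0, 1]$. The synthetic `raw` values stand in for surrogate-model evaluations of $r_{raw}$, and the function name is hypothetical.

```python
import numpy as np

def fit_reward_scaling(raw_rewards):
    """Choose a1, a2 so that a1 * r_raw + a2 spans [0, 1] over the samples."""
    raw = np.asarray(raw_rewards, dtype=float)
    lo, hi = raw.min(), raw.max()
    a1 = 1.0 / (hi - lo)
    a2 = -lo * a1
    return a1, a2

rng = np.random.default_rng(0)
# Hypothetical stand-in for r_raw evaluated at 300 random design states.
raw = rng.normal(loc=0.0, scale=0.02, size=300)
a1, a2 = fit_reward_scaling(raw)
scaled = a1 * raw + a2
```

States visited during training can fall slightly outside the sampled range, which is consistent with the reward being only approximately within $[0, 1]$.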

The trained agents successfully learned the design policy. After approximately 4000 training episodes, the accumulated total reward $R$ improved significantly. The smoothness criterion $\varepsilon_{c}$ and consistency criterion $\varepsilon_{d}$ were also considered during agent training. The details of the training rules are discussed in Section 4.1.

The agents started the design process in the reference state and modified all the design variables simultaneously to improve the pressure ratio. The nondimensionalized design variables are plotted in Figure 16, showing that the modification was noticeable in the first few steps, and then the change slowed. The policy learned by the agents was relatively complex, because the design variables at different spanwise distances showed distinct modifications. For example, $\chi_{out,1}$ increased, while $\chi_{out,2}$ and $\chi_{out,3}$ decreased, and some design variables, such as $dm_{r,1}$ and $\chi_{in,2}$, showed little change.

Figure 17 shows the operating characteristic curves of the rotors after the modification was applied. The pressure ratio improved with the number of steps, and the efficiency improved in the first ten steps, showing that the RL environment constraints worked well. Then, as the agent attempted to further improve the pressure ratio, the efficiency deteriorated. The choke and near-peak efficiency mass flows also increased in the first ten steps and then decreased; thus, the flow-efficiency curves nearly coincide at steps 0 and 15. In addition, the near-stall mass flow of the modified rotors was not worse than that of the original rotor.

The peak performance, choke mass flow rate and pressure ratio improved, as shown in Table 4. For the modified rotor, after 15 steps, $\dot{m}_{W}$ and $η_{W}$ were approximately equal to the reference values, and $π_{W}$ increased by 1.01%. The flow details are analyzed in Section 3.2. The agent could further improve the pressure ratio if a slight decrease in $η_{W}$ was accepted.

The variations in the geometric parameters $m$, $θ$, $χ_{in}$, $χ_{out}$ and $β_{y}$ in the spanwise direction are plotted in Figure 18. The agents introduced a combined forward–back sweep feature, whereby the forward sweep feature was introduced in the middle section in the first five steps, and extended to approximately the whole span. The back sweep feature was introduced in the tip region. The blade lean feature was added with positive $dθ$, and also diminished near the tip. The airfoil segment angle $Δβ$ increased in the middle span and decreased in the tip and hub regions as a result of the changes in $χ_{in}$ and $χ_{out}$. In addition, $β_{y}$ decreased near the tip and increased near the hub.

The trained agents could start the design in different initial states and modified the design very quickly, requiring only a few seconds to determine the action for a given state. As shown in Figure 19, the excessive forward sweep in the second initial geometry and the excessive back sweep in the third initial geometry were removed in the first few steps. Then, after a sufficient number of steps, the geometries approached the same final states.

3.2. Flow Field Analysis

After the series 1 agents modified the design variables, the flow field and model performance changed. An analysis of the flow mechanisms shows that the policy learned by the agents was effective and interpretable.

The deviations of the absolute flow angle $Δα$, local pressure ratio $π$, and local efficiency $η$ are plotted in Figure 20. $Δα$ increased in the middle span after $χ_{in}$ and $χ_{out}$ were modified, and the local pressure ratio $π$ also increased. The extra diffusion increased $π$ but also increased the boundary layer loss, so $η$ decreased at approximately 80% span. Correspondingly, the local efficiency increased near the tip and hub, which compensated for the loss in the middle span; thus, the overall efficiency was maintained at a relatively high level.

Zheng and Li [43] summarized that the forward sweep of rotor blades could improve the efficiency and the stable operating range. The sweep reduced the shock loss and the pressure increase in the rotors by changing the shape of the passage shock. As a result, extra pressure increases could be gained following diffusion, generating additional losses. Denton and Xu [44] affirmed that the shock waves in the tip region of the transonic rotor have a considerable impact on the pressure ratio. However, a shock wave that is too strong can cause excessive losses.

Figure 21 shows the isentropic Mach number along the chord at 70% and 90% span. The peak isentropic Mach number on the suction surface decreased after agent modification. After 15 steps, the isentropic Mach number increased a little, but was still no worse than the reference. Therefore, the angle change and the sweep weakened the shock wave, even with an increased pressure ratio. The isentropic Mach number on the pressure surface near the leading edge decreased because $χ_{in}$ changed more than $β_{y}$. The trained agent learned an excellent balance between the sweep and diffusion, and thus improved the $π$ and $η$ of the rotors.

Denton and Xu [45] noted that the pressure gradient perpendicular to the end wall must be zero. For this reason, the back sweep at the blade’s tip reduces the load at the trailing edge near the shroud. As shown in Figure 22, the load reduction and the change in the airfoil segment angle reduced the separation near the tip. After 15 steps, the back sweep and the change in airfoil segment angle decreased slightly. As a result, the separation increased but remained smaller than that at 0 steps, which preserved the efficiency of the rotors at the tip.

According to Sasaki and Breugelmans [46], because the lean angle between the end wall and the suction surface was obtuse, unloading occurred near the end wall, while overloading occurred near the mid-span. Shang et al. [47] explained the mechanism using the pressure gradient generated by the lean feature. The positive lean increased the static pressure of the suction surface, and the resulting pressure gradient, especially after the throat location, drove the low-energy flow away from the corner. The trained agent learned to increase $dθ$, introducing a similar lean feature into the rotors.

The static pressure distributions in the near-tip region at 80% axial length are compared in Figure 23. The static pressure increased in the tip region after agent modification. The pressure gradient drove the low-energy flow away from the corner, preventing the accumulation of the low-energy flow and reducing the separation. A change in the static pressure distribution occurred due to both the lean feature and the change in $Δβ$ at the rotor tip.

The pressure difference between the pressure and suction surfaces drove the tip leakage flow and formed a tip leakage vortex (TLV). The geometry modifications influenced the TLV by changing the pressure difference between the pressure and suction surfaces.

The static pressures at the near-tip section (90% span) and near the tip clearance region (99.8% span) are plotted in Figure 24. The load at the leading edge near the tip increased because of the sweep and angle design, increasing the pressure difference at the leading edge. Downstream of 0.6 chord length, the pressure difference of the modified rotors decreased. The minimum static pressure on the suction surface increased, and its location moved toward the leading edge, decreasing the adverse pressure gradient, which reduced the separation.

Chima [48] noted that the strength of the TLV depends on the chordwise integral of the pressure difference. Consequently, the TLV of the modified rotors developed more slowly and was weakened, even though it was initially stronger at the leading edge. The axial velocity of the rotors at the tip clearance (99.8% span) is shown in Figure 25. The reference lines indicate that the TLVs in the modified geometries were stronger near the leading edge, but developed more slowly, and therefore the influence of the TLV was reduced.

Suder and Celestina [49] illustrated that the interaction between the TLV and the shock wave caused considerable losses. If the TLV crosses the passage shock, severe diffusion occurs, generating low-energy fluid because of the pressure rise caused by the shock wave. Then, the low-energy fluid mixes with the main flow, increasing the loss. The low-energy fluid also blocks the flow and influences the stall margin.

The static pressure at the 99.8% span is plotted in Figure 26. The pressure rise across the shock wave decreased, as shown in the dashed circles in Figure 26; the corresponding pressure distributions are also plotted in Figure 24b. Therefore, the interaction between the TLV and the shock wave was reduced. Additionally, the initiation and development of the TLVs can be seen in the dashed ellipses. After 15 steps, the pressure rise caused by the shock wave increased slightly, but the overall performance was not worse than the reference because the rotor generated a weaker TLV near the leading edge than after 5 steps.

The meridional velocity $V_{m}$ values at 50% axial length and 80% axial length are shown in Figure 27, illustrating the flow details near the tip region. At 80% axial length, the low-energy fluid was significantly reduced after five steps. Then, after 15 steps, although the low-energy fluid partly returned, it was restricted to the tip region and mixed less with the main flow. The maximum $V_{m}$ near the shroud is marked on the 50% axial length surface, and the reduction in the maximum $V_{m}$ indicated that the TLV was reduced.

In summary, the agents improved the pressure ratio of the rotor by a combined policy that involved modifying all design variables. The additional increase in pressure was mainly produced by the change in $Δβ$. After the sweep and lean features were introduced and $Δβ$ was changed, the shock loss and separation were controlled and the loss near the tip decreased, so the efficiency of the rotor was maintained.

3.3. Off-Design Conditions

3.3.1. Near Stall

The numerical stall point was determined as the last convergence point while increasing the back pressure, as mentioned in Section 2.2. The near-stall features [50] could be identified, which verified that the near stall points were correctly found.

The axial velocity and the static pressure near the tip clearance region (99.8% span) at the last convergence point are shown in Figure 28. Negative axial velocity was observed in the dashed ellipse area, which indicates that flow spillage existed at the leading edge near the tip clearance. The stagnation point moved to the pressure surface in the dashed circles. In addition, a low-momentum region appeared as the area of low axial velocity in the passage. Since these near-stall features appeared, the results confirm that the near-stall point was correctly identified.

The stall margin (SM) was calculated with Equation (15):

$$SM = \frac{\dot{m}_{W}}{\dot{m}_{stall}} \times \frac{π_{stall}}{π_{W}} - 1 \qquad (15)$$

where $\dot{m}_{stall}$ and $π_{stall}$ were the mass flow rate and the pressure ratio at the last convergence point, respectively, which was considered the numerical stall point. The calculated SM of the reference rotor was $SM_{0} = 6.52\%$. After 5, 10, and 15 modification steps, this value changed to $SM_{5} = 7.54\%$, $SM_{10} = 8.15\%$ and $SM_{15} = 6.53\%$, respectively, all of which are higher than that of the reference rotor, although only marginally so after 15 steps.
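Equation (15) can be evaluated directly, as in the minimal sketch below. The operating-point values are hypothetical and chosen only to exercise the formula; they are not the CFD results of this study.

```python
def stall_margin(m_w, pi_w, m_stall, pi_stall):
    """Stall margin per Equation (15).

    m_w, pi_w         : mass flow and pressure ratio at the working point
    m_stall, pi_stall : mass flow and pressure ratio at the numerical stall point
    """
    return (m_w / m_stall) * (pi_stall / pi_w) - 1.0


# Hypothetical operating points (not values from the paper).
sm = stall_margin(m_w=20.0, pi_w=1.65, m_stall=19.0, pi_stall=1.70)
print(f"SM = {100 * sm:.2f}%")
```

A lower stall mass flow or a higher stall pressure ratio, relative to the working point, both increase the margin.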

3.3.2. Off-Design Speed

The operating characteristic curves were calculated at 70% of the design speed in order to consider the off-design speed performance. Figure 29 shows that the modified rotors achieved better performance than the reference rotor over the whole operating characteristic curve.

One of the mechanisms for improving the pressure ratio $π$ was modifying $Δβ$ and $β_{y}$, which was done by the agent and discussed in Section 3.1. This mechanism remained effective at 70% speed, so a 0.59% improvement in $π$ was retained at the peak-efficiency mass flow of the reference rotor.

The flow rate and the rotation speed were lower at 70% speed, so the shock structure was weaker than that at 100% speed. The rotor obtained after 15 modification steps gained part of its pressure ratio increase from the shock wave. As a result, its efficiency was slightly reduced at 100% speed, whereas at 70% speed it remained higher than that of the rotor after 10 modification steps.

4. Discussion

4.1. Parameter Influence on the Training Process

The hyperparameters and convergence strategy significantly influenced whether the training process was successful. Hence, the rules and experience gained for successful agent training are summarized here, based on the different agents that were trained.

Agents with different $r_{raw}$ values were trained with the same artificial neural network hyperparameters and different RL environment parameters, as shown in Table 5. $N_{ep}$ is the episode at which the accumulated reward began to increase. $a_{1}$ and $a_{2}$ are the coefficients in the RL environment that determined $r_{D}$. $r_{min}$ and $r_{max}$ are the minimum and maximum rewards of 300 random sample points, respectively, and the reward range is $r_{D} = r_{max} - r_{min}$. $δr_{0,ave}$ is the average of the $δr_{0,i}$ values for each dimension of $p_{i} \in ℙ$. The ratio of $r_{D}$ to $δr_{0,ave}$ indicates how much $r_{raw}$ influenced the reward $r$. Two training concerns emerged from the analysis of the above training results.

First, reasonable RL environment hyperparameters were needed. The $R$ of the series 3 agents improved after approximately 3000 episodes, while the series 4 agents did not learn anything meaningful, even after 14,000 episodes. The difference between the series 3 and series 4 agents was $r_{D}$. In addition, $N_{ep}$ remained similar for different $r_{raw}$ values if $r_{D}$ was similar. Therefore, $r_{D}$ was set to be slightly larger than 1 and $δr_{0,ave}$ was set to −0.05, according to the training results.

Second, an appropriate convergence strategy must be determined. Checking $ε_{d}$ required a relatively large amount of calculation, so $ε_{d}$ was evaluated only if the $R$ value was higher than the threshold $R_{t}$. The results show that the changes in $R$ and $ε_{d}$ were not synchronized. This phenomenon sometimes resulted in deteriorated agent performance, and was observed in different series of agents.

Figure 30 displays the $R$ and $ε_{d}$ values of 10 random testing points of the series 2 agents. The $R$ value increased after approximately 3500 episodes, as shown in Figure 30a, while $ε_{d}$ first decreased and then increased, as shown in Figure 30b. Therefore, the threshold for $R$ should not be too high, or agents with acceptable $R$ and $ε_{d}$ values will be difficult to obtain. The recommended $R$ threshold was set to 70% of the maximum.
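The gated convergence check described above can be sketched as follows. The function names `select_agent` and `toy_consistency`, the snapshot labels, and all numerical values are hypothetical; "70% of the maximum" is interpreted here as 70% of the maximum attainable accumulated reward, which is an assumption. The costly $ε_{d}$ evaluation is replaced by a lookup table for illustration.

```python
def select_agent(history, r_max, eps_d_limit, eval_consistency, r_fraction=0.7):
    """Pick the agent snapshot with the lowest eps_d among those whose
    accumulated reward R exceeds the threshold R_t = r_fraction * r_max.

    history is an iterable of (episode, accumulated_reward, agent_snapshot).
    eval_consistency is only called for snapshots that pass the R threshold.
    """
    r_t = r_fraction * r_max
    best = None
    for episode, total_reward, agent in history:
        if total_reward <= r_t:
            continue  # skip the expensive consistency check
        eps_d = eval_consistency(agent)
        if eps_d <= eps_d_limit and (best is None or eps_d < best[1]):
            best = (episode, eps_d, agent)
    return best


# Hypothetical consistency values for four agent snapshots.
toy_eps_d = {"a": 0.30, "b": 0.12, "c": 0.08, "d": 0.15}
calls = []


def toy_consistency(agent):
    """Stand-in for the costly eps_d evaluation; records each call."""
    calls.append(agent)
    return toy_eps_d[agent]


history = [(1000, 2.0, "a"), (3500, 8.0, "b"), (4000, 9.0, "c"), (4500, 9.5, "d")]
best = select_agent(history, r_max=10.0, eps_d_limit=0.10,
                    eval_consistency=toy_consistency)
print(best, calls)
```

In this toy history the snapshot with the highest $R$ is not the one with the lowest $ε_{d}$, which mirrors the unsynchronized behavior noted above and shows why an overly high $R$ threshold could exclude the most consistent agent.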

The recommended parameter values are summarized in Table 6 according to the analysis above. $r_{max}$ and $r_{min}$ can be configured by calculating $a_{1}$ and $a_{2}$ after $r_{raw}$ is constructed. Correspondingly, $δr_{0,ave}$, $R_{t}$, and $ε_{d,t}$ can be adjusted directly.

The modifications to the DDPG algorithm were advantageous in reducing $ε_{d}$ in the rotor design environment. As shown in Figure 31, agents were trained on the original and modified DDPG algorithms, and their performances were compared. Ten agents with and without the modifications were trained using the same convergence strategy and threshold, and the best agent was selected as the final result. After the modification was applied, the $ε_{d}$ value decreased from 0.136 to 0.086.

4.2. Policy for Improving the Peak Efficiency

Efficiency is another critical objective in compressor design. The series 2 agents successfully learned to improve the efficiency by setting $r_{raw} = η_{W} - η_{W,0}$ and adjusting other parameters, where $η_{W,0}$ was the $η_{W}$ value of the reference rotor.

The learned policy is illustrated in Figure 32. The tip of the rotor was swept forward in the first few steps and then swept back. The negative lean feature changed to positive from the tip region to the lower span. As for the 2D airfoils, the segment angle decreased mainly because of the variation in $χ_{in}$.

It was mentioned in [51] that the forward and back sweep features both increased the efficiency; however, the stall margin was diminished when using the back sweep feature. The series 2 agent learned to use the back sweep feature and modified the other parameters to maintain the stall margin. Moreover, as mentioned by [43], the efficiency improved when the angle between the end wall and the suction surface was obtuse.

The operating characteristics at different step numbers are shown in Figure 33. The choke mass flow increased, and the pressure ratio remained higher than that of the reference rotor over the whole curve. The stall mass flow remained constant, showing that the penalty terms in the reward worked well. The peak efficiency can be further improved by applying more modification steps, as shown by the 20-step case in Figure 33; however, the pressure ratio deteriorated as the number of steps increased further.

The performance at the 0.98 mass flow rate was significantly improved, with the total pressure ratio of $π = 1.643$ and efficiency of $η = 0.9048$ increasing to $π = 1.681$ and $η = 0.9107$ after 15 steps. The near-peak efficiency points are listed in Table 7. The peak efficiency improved after the geometry was modified.
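For reference, the improvement at the 0.98 mass flow rate can be expressed as a relative change in pressure ratio and an absolute change in efficiency. The arithmetic below uses only the four values quoted above; the subscript 0 for the reference rotor is a notational assumption.

```latex
\frac{\Delta\pi}{\pi_{0}} = \frac{1.681 - 1.643}{1.643} \approx 0.0231 \quad (\approx 2.3\%),
\qquad
\Delta\eta = 0.9107 - 0.9048 = 0.0059 \quad (0.59\ \text{percentage points})
```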

4.3. Cooperation among Different Agents

The agents learned the design policy in the whole design space; hence, the different agents could cooperate before the designers fully understood the mechanism of the policy. This universality is an exciting component of the intellectual design process.

The series 1 and 2 agents were selected for the cooperation investigation. The reference rotor was modified by series 1 agents for ten steps and then by series 2 agents for another ten steps. The performance of the modified rotors was assessed after different steps by the CFD method, and the operating characteristic curves are plotted in Figure 34. The pressure ratio improvement induced by the series 1 agent remained, even after modification by the series 2 agent. Moreover, the peak efficiency improved after 15 steps, rather than decreasing as it did in Figure 17.
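The sequential use of two agents can be sketched as two consecutive rollouts. The sketch assumes that each action is an increment added to the nondimensional state; `toy_policy` is a hypothetical stand-in that moves the state toward a fixed target and does not represent the trained DDPG actors, and the two-variable state is chosen only for brevity.

```python
def rollout(state, policy, n_steps):
    """Apply a policy for n_steps, assuming s_{t+1} = s_t + a_t."""
    trajectory = [list(state)]
    for _ in range(n_steps):
        action = policy(state)
        state = [s + a for s, a in zip(state, action)]
        trajectory.append(list(state))
    return state, trajectory


def toy_policy(target, gain=0.3):
    """Hypothetical policy: step a fixed fraction of the way to a target."""
    def policy(state):
        return [gain * (t - s) for s, t in zip(state, target)]
    return policy


series1 = toy_policy([1.0, 0.0])  # placeholder for the pressure-ratio agent
series2 = toy_policy([1.0, 1.0])  # placeholder for the efficiency agent

s0 = [0.0, 0.0]                           # reference state
s10, traj1 = rollout(s0, series1, 10)     # first ten steps
s20, traj2 = rollout(s10, series2, 10)    # next ten steps, starting from s10
print(s10, s20)
```

In practice, each intermediate state in the trajectories would be converted back to a rotor geometry and assessed by CFD, as was done for the curves in Figure 34.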

The cooperation policy performed better than single policies. After 20 steps in the cooperative modification process, the new rotor showed a higher $η$ than the series 1 agent result (see Table 4), and a higher $π$ than the series 2 agent results (see Table 7), with $π = 1.678$ and $η = 0.9111$ at the near-peak efficiency point. The 0.98 mass flow rate performance was $π = 1.683$ and $η = 0.9105$, which lies between the results of the two single agents.

5. Conclusions

This study constructed a reinforcement learning framework using the modified DDPG algorithm to learn the design policies of transonic rotors. The RL agents learned similarly to humans and generated policies to improve the pressure ratio and efficiency, while maintaining the other performances. The flow field analyses of the improved rotors demonstrated the effectiveness of the learned policy. The cooperation of different agents showed an advantage over using the agents alone. Additionally, different agents were trained, and the rules have been summed up to guide further training. If the case is changed, most of the variables that need tuning have candidate values or have an objective to meet, which benefits the general applicability.

The modifications made the DDPG algorithm better suited to aerodynamic design. The high-order feedback, virtual area, and artificial tip accelerated the training process. The presented error reduction process significantly improved the performance of the kriging models and reduced the number of required sample points. The trained agent modified the geometric parameters to improve the pressure ratio and reduce the loss in the tip region by changing $Δβ$ and introducing sweep and lean features. As a result, the pressure ratio of the modified rotor improved by 1.01% with the same efficiency and flow rate, and can be further improved. Additionally, this work considered the cooperation of different agents, and found that this improved both $π$ and $η$.

In summary, the present study is one of the first attempts to apply RL methods in 3D compressor design. RL methods are universal and more flexible than algorithms that find optimized points. Since the cooperation of agents has shown an advantage, a natural progression of this work is to integrate more prior knowledge and other design methods into the RL framework. The follow-up research could also explore the application of RL methods in more expansive design spaces and multistage cases after the improvements are incorporated.

Engine manufacturers possess plenty of simulation and experimental data, which may help to realize more of the potential of RL methods. The trained agents can assist designers before they fully understand the flow mechanism by improving factors such as pressure ratio and efficiency, or moving the working point, thus accelerating the iteration of products. Once the agents’ abilities are improved further, they may generate compressor designs automatically and outperform human designers, as has happened in other fields.

Author Contributions

Conceptualization, X.X.; methodology, X.X.; software, X.X.; validation, X.X.; formal analysis, X.X. and D.B.; investigation, X.X.; resources, X.H. and M.Z.; data curation, X.X.; writing—original draft preparation, X.X. and D.B.; writing—review and editing, X.X., D.B. and X.H.; visualization, X.X. and D.B.; supervision, M.Z.; project administration, X.H.; funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

$a_{t}$	action of the agent at step t
$s_{t}$	state of the environment at step t
$r_{t}$	reward in the environment at step t
$r_{a}$	radius direction of the rotor
$\hat{r}$	unidimensional radius direction
$χ_{i n}$	inlet camber angle
$χ_{o u t}$	outlet camber angle
$β_{y}$	incidence angle
$c_{0}$	reference chord for normalization
$θ_{0}$	reference interval for normalization
$\dot{m}$	mass flow
$π$	pressure ratio
$η$	efficiency
${\dot{m}}_{W}$ , $π_{W}$ , $η_{W}$	working point mass flow, pressure ratio and efficiency
$η_{t}$	efficiency threshold
${\dot{m}}_{l}, {\dot{m}}_{h}$	lowest and highest acceptable mass flow
$\hat{π}$ , $\hat{η}$	integral pressure ratio and efficiency
$ℙ$	set of performance variables
$p_{i}$	performance variable i
$s^{*}$	best recorded state
$s_{o}$	initial state of the selected model
$ℝ_{r}^{n}$	real design space with n dimensions
${\vec{n}}_{r a n d}$	random direction in the design state
$R_{c o n s t}$	deterministic radius in the design state
$ℝ_{v}^{n}$	virtual area with n dimensions
$s_{v}$ , $s_{r}$	states in the real design space and virtual area
$δ r_{v i r}$	correct reward term in the virtual area
$c_{V}$ , $d x$	strength and width of the virtual area
$δ r_{a r t}$	correct reward term
$d r_{0}$ , $δ_{s}$	strength and threshold of the artificial tip
$S$	whole design space
$r$	reward in the environment
$r_{r a w}$	raw reward determined by the performance
$a_{1}$ , $a_{2}$	constants to scale the reward
$δ r_{p_{i}}$	punishment term of performance constraint i
$δ r_{0, i}$ , $w_{i}$	constants to scale the ith constraint
$p_{c l, i}, p_{c u, i}$	reasonable range of the ith constraint
$ε_{d}$	consistency criterion
$N_{p}$	max number of steps
$ε_{c}$	smoothness criterion
$n_{w}$	number of states used for smoothness criteria
$a v e_{i}$	sum of actions used for smoothness criteria
$s_{e}$ , $p_{c}$	constants to define the demo case
$N_{e p}$	training episode number
$η_{s}$	training speed criteria
$θ_{k}$ , $P_{k}$	parameters of the kriging model
$θ_{m a x}$ , $p_{m i n}$	parameter limitations of the kriging model
$p_{i, m i n}, p_{i, m a x}$	error reduction range of the ith performance variable
$δ \hat{p_{i}}$ , $δ \hat{{p_{i}}^{'}}$	unidimensional exceeding term before and after error reduction
$a_{p}$	strength of the error reduction
$p_{i} ’$	performance after error reduction
$R^{2}$	coefficient of determination
${\hat{p}}_{i}$ , ${\bar{p}}_{i}$	predicted and average performance
$Δ β$	airfoil segment angle
$Δ α$	deviation of the absolute flow angle
$S M$	stall margin
$r_{D}$	reward range in the environment
${\dot{m}}_{s t a l l}$ , $π_{s t a l l}$	mass flow rate and pressure ratio of the near stall point

References

1. Ning, X.; Lovell, M.R. On the sliding friction characteristics of unidirectional continuous FRP composites. J. Tribol. 2002, 124, 5–13.
2. Steffens, K. Advanced compressor technology: Key success factor for competitiveness in modern aero engines. In Proceedings of the 15th International Symposium on Air Breathing Engines (ISABE), Bangalore, India, 2–7 September 2001.
3. Biollo, R.; Benini, E. Recent advances in transonic axial compressor aerodynamics. Prog. Aerosp. Sci. 2013, 56, 1–18.
4. Smith, L.H. Axial compressor aerodesign evolution at General Electric. J. Turbomach. 2002, 124, 321–330.
5. Horlock, J.H.; Denton, J.D. A review of some early design practice using computational fluid dynamics and a current perspective. J. Turbomach. 2005, 127, 5–13.
6. Pinto, R.N.; Afzal, A.; D’Souza, L.V.; Ansari, Z.; Samee, A.D.M. Computational fluid dynamics in turbomachinery: A review of state of the art. Arch. Comput. Methods Eng. 2016, 24, 467–479.
7. Dunham, J. CFD Validation for Propulsion System Components; AGARD: Gatineau, QC, Canada, 1998.
8. Jennions, I.K.; Turner, M.G. Three-dimensional Navier-Stokes computations of transonic fan flow using an explicit flow solver and an implicit κ-ε solver. J. Turbomach. 1993, 115, 261–272.
9. Samad, A. Turbomachinery Design; LAP LAMBERT Academic Publishing GmbH & Co.KG: Saarbrucken, Germany, 2012.
10. Kim, K.-Y.; Samad, A.; Benini, E. Design Optimization of Fluid Machinery: Applying Computational Fluid Dynamics and Numerical Optimization; John Wiley & Sons, Incorporated: Newark, NJ, USA, 2019.
11. Li, Z.; Zheng, X. Review of design optimization methods for turbomachinery aerodynamics. Prog. Aerosp. Sci. 2017, 93, 1–23.
12. Bonaiuti, D.; Zangeneh, M. On the coupling of inverse design and optimization techniques for the multiobjective, multipoint design of turbomachinery blades. J. Turbomach. 2009, 131, 21014–21019.
13. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
14. Ma, S.-B.; Roh, M.-S.; Kim, K.-Y. Optimization of discrete cavities with guide vanes in a centrifugal compressor based on a comparative analysis of optimization techniques. Int. J. Aeronaut. Space Sci. 2021, 22, 514–530.
15. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science (MHS’95), Nagoya, Japan, 4–6 October 1995.
16. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
17. Huang, S.; Yang, C.; Han, G.; Zhao, S.; Lu, X. Multipoint design optimization for a controlled diffusion airfoil compressor cascade. Proc. Inst. Mech. Eng. C J. Mech. Eng. Sci. 2020, 234, 2143–2159.
18. Tang, X.; Luo, J.; Liu, F. Aerodynamic shape optimization of a transonic fan by an adjoint-response surface method. Aerosp. Sci. Technol. 2017, 68, 26–36.
19. Strazisar, A.J.; Wood, J.R.; Hathaway, M.D.; Suder, K.L. Laser Anemometer Measurements in a Transonic Axial-Flow Fan Rotor; NASA: Cleveland, OH, USA, 1989.
20. Du, Q.; Yang, L.; Li, L.; Liu, T.; Zhang, D.; Xie, Y. Aerodynamic design and optimization of blade end wall profile of turbomachinery based on series convolutional neural network. Energy 2022, 244, 122617.
21. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA; London, UK, 2020.
22. Tan, R.K.; Liu, Y.; Xie, L. Reinforcement learning for systems pharmacology-oriented and personalized drug design. Expert Opin. Drug Discov. 2022, 17, 849–863.
23. Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
24. Fawzi, A.; Balog, M.; Huang, A.; Hubert, T.; Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Ruiz, F.J.R.; Schrittwieser, J.; Swirszcz, G.; et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 2022, 610, 47–53.
25. Sigaud, O.; Stulp, F. Policy search in continuous action domains: An overview. Neural Netw. 2019, 113, 28–40.
26. Viquerat, J.; Rabault, J.; Kuhnle, A.; Ghraieb, H.; Larcher, A.; Hachem, E. Direct shape optimization through deep reinforcement learning. J. Comput. Phys. 2021, 428, 110080.
27. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
28. Li, R.; Zhang, Y.; Chen, H. Learning the aerodynamic design of supercritical airfoils through deep reinforcement learning. AIAA J. 2021, 59, 3988–4001.
29. Qin, S.; Wang, S.; Wang, L.; Wang, C.; Sun, G.; Zhong, Y. Multi-objective optimization of cascade blade profile based on reinforcement learning. Appl. Sci. 2020, 11, 106.
30. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2016, arXiv:1509.02971.
31. Li, J.; Zhang, M.; Martins, J.R.R.A.; Shu, C. Efficient aerodynamic shape optimization with deep-learning-based geometric filtering. AIAA J. 2020, 58, 4243–4259.
32. Hammond, J.; Pepper, N.; Montomoli, F.; Michelassi, V. Machine learning methods in CFD for turbomachinery: A review. Int. J. Turbomach. Propuls. Power 2022, 7, 16.
33. Hammond, J.; Pietropaoli, M.; Michelassi, V.; Sandberg, R.D.; Montomoli, F. Machine learning for the development of data-driven turbulence closures in coolant systems. J. Turbomach. 2022, 144, 081003.
34. Wang, Z.Y.; Qu, F.; Wang, Y.H.; Luan, Y.G.; Wang, M. Research on the lean and swept optimization of a single stage axial compressor. Eng. Appl. Comput. Fluid Mech. 2021, 15, 142–163.
35. Duan, Y.; Zheng, Q.; Jiang, B.; Lin, A.; Zhao, W. Implementation of Three-Dimensional Inverse Design and Its Application to Improve the Compressor Performance. Energies 2020, 13, 5378.
36. Ren, X.; Gu, C. Investigation of Compressor Tip Clearance Flow Based on the Discontinuous Galerkin Methods. In Proceedings of the ASME Turbo Expo 2013: Turbine Technical Conference and Exposition, San Antonio, TX, USA, 3–7 June 2013.
37. Jung, Y.-J.; Jeon, H.; Jung, Y.; Lee, K.-J.; Choi, M. Effects of recessed blade tips on stall margin in a transonic axial compressor. Aerosp. Sci. Technol. 2016, 54, 41–48.
38. Chen, H.; Huang, X.; Fu, S. CFD Investigation on stall mechanisms and casing treatment of a transonic compressor. In Proceedings of the 42nd AIAA/ASME/SAE/ASEE Joint Propulsion Conference & Exhibit, American Institute of Aeronautics and Astronautics: Sacramento, CA, USA, 9–12 July 2006.
39. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2015, arXiv:1412.6980.
40. Qie, H.; Shi, D.; Shen, T.; Xu, X.; Li, Y.; Wang, L. Joint optimization of multi-UAV target assignment and path planning based on multi-agent reinforcement learning. IEEE Access 2019, 7, 146264–146272.
41. Forrester, A.I.J. Engineering Design via Surrogate Modelling: A Practical Guide; Wiley: Chichester, UK, 2008.
42. Morris, M.D.; Mitchell, T.J. Exploratory designs for computational experiments. J. Stat. Plan. Inference 1995, 43, 381–402.
43. Zheng, X.; Li, Z. Blade-end treatment to improve the performance of axial compressors: An overview. Prog. Aerosp. Sci. 2017, 88, 1–14.
44. Denton, J.D. The Effects of Lean And Sweep on Transonic Fan Performance: A Computational Study. In Proceedings of the ASME Turbo Expo 2002: Power for Land, Sea, and Air, Amsterdam, The Netherlands, 3–6 June 2002.
45. Denton, J.D.; Xu, L. The exploitation of three-dimensional flow in turbomachinery design. Proc. Inst. Mech. Eng. C J. Mech. Eng. Sci. 1998, 213, 125–137.
46. Sasaki, T.; Breugelmans, F. Comparison of sweep and dihedral effects on compressor cascade performance. J. Turbomach. 1998, 120, 454–463.
47. Shang, E.; Wang, Z.Q.; Su, J.X. The experimental investigations on the compressor cascades with leaned and curved blade. In Proceedings of Volume 1: Aircraft Engine; Marine; Turbomachinery; Microturbines and Small Turbomachinery; American Society of Mechanical Engineers: Cincinnati, OH, USA, 1993; p. V001T003A018.
48. Chima, R.V. Calculation of tip clearance effects in a transonic compressor rotor. J. Turbomach. 1998, 120, 131–140.
49. Suder, K.L.; Celestina, M.L. Experimental and computational investigation of the tip clearance flow in a transonic axial compressor rotor. J. Turbomach. 1996, 118, 218–229.
50. Hah, C.; Bergner, J.; Schiffer, H.-P. Short length-scale rotating stall inception in a transonic axial compressor: Criteria and mechanisms. In Proceedings of Volume 6: Turbomachinery, Parts A and B; ASMEDC: Barcelona, Spain, 2006; pp. 61–70.
51. Neshat, M.A.; Akhlaghi, M.; Fathi, A.; Khaledi, H. Investigating the effect of blade sweep and lean in one stage of an industrial gas turbine’s transonic compressor. Propuls. Power Res. 2015, 4, 221–229.

Figure 1. Overall structure of the intellectual design framework, which is hierarchical. The CFD manager calculates the operating characteristics of the rotors in multiple threads, determines their performance, and manages the data.

Figure 2. Comparison of the original and generated rotor on the $z - r_{a}$ and $m - θ r_{a}$ surfaces. The directions of the sweep and lean are marked.

Figure 3. Definition of the camber angles and incidence.

Figure 4. Grid of the rotor with 1.2 M nodes and O–H topology.

Figure 5. Operating characteristic of NASA rotor 67, with 0.8 M, 1.2 M and 1.6 M grids. (a) The total pressure ratio curve. (b) The efficiency curve.

Figure 6. Variable distributions in the span in the experiment [19] and CFD simulations: (a) pressure distribution in the span; (b) temperature distribution in the span.

Figure 7. Typical operating characteristic curve according to the CFD calculations, including the definitions of 7 selected performance variables.

Figure 8. Distortion at the end of the process with a random initial state. The processes in the high-dimensional design space were projected onto a dimensionless $\hat{β}_{y,2}$–$\hat{χ}_{in,2}$ surface. (a) Agent with mild distortion. (b) Agent with severe distortion.

Figure 9. High-order feedback does not directly affect the interactions between the agent and the environment.

Figure 10. Virtual area and penalty term; $δ r_{vir}$ is zero when the state is in the real design space.

Figure 11. Processes generated by normal and abnormal agents.

Figure 12. Training agents in the demo case. The average point and minimum $ε_{d}$ value of each version were compared.

Figure 13. Schematic of the error reduction process. The overall performance of the model improved after restricting the points with poor performance.

Figure 14. $R^{2}$ of $π_{W}$ and $\hat{η}$ with and without error reduction training for different numbers of sample points. (a) $R^{2}$ of $π_{W}$. (b) $R^{2}$ of $\hat{η}$.

Figure 15. Surrogate models with 7 (i = 1, 2, …, 7) selected performance variables trained in various configurations with different numbers of sample points N.

Figure 16. Modification of the design variables at different step numbers.

Figure 17. Variation in the rotor operating characteristic curves after different steps for the series 1 agent. (a) Flow–pressure ratio. (b) Flow–efficiency.

Figure 18. Variations in the geometric parameter distributions in the spanwise direction after modification by series 1 agents: (a) meridional coordinate; (b) tangential direction; (c) inlet camber angle; (d) outlet camber angle; (e) incidence angle.

Figure 19. Different geometries modified by the agents.

Figure 20. The deviations of the local performance distributions in different spans: (a) pressure ratio; (b) efficiency; (c) absolute flow angle.

Figure 21. Isentropic Mach number distribution through the chord line at 70% and 90% span. (a) 70% span. (b) 90% span.

Figure 22. Separation in the near-tip span region.

Figure 23. The static pressure distributions in the near-tip region.

Figure 24. Static pressure distributions through the chord line near and around the tip span. (a) 90% span. (b) 99.8% span.

Figure 25. Axial velocity distributions at the tip clearance span (99.8%).

Figure 26. Static pressure distributions in the tip clearance span region (99.8%).

Figure 27. Meridional velocity at 50% and 80% axial length.

Figure 28. The near-stall flow details; $V_{z}$ and static pressure are plotted at the tip clearance span (99.8%).

Figure 29. Operating characteristic curves at 70% design speed. (a) Flow–pressure ratio. (b) Flow–efficiency.

Figure 30. Changes in R and $ε_{d}$ when training series 2 agents. (a) Convergence history of series 2. (b) Criteria history of series 2.

Figure 31. The improvement with the modified algorithm.

Figure 32. Variations of the geometric parameter distributions in the spanwise direction after modification by series 2 agents: (a) meridional coordinate; (b) tangential direction; (c) inlet camber angle; (d) outlet camber angle; (e) incidence angle.

Figure 33. Variation in the operating characteristic curves of the rotors modified by series 2 agents after different numbers of steps. (a) Flow–pressure ratio. (b) Flow–efficiency.

Figure 34. Variation in the operating characteristic curves of the rotors after cooperation between series 1 and series 2 agents. (a) Flow–pressure ratio. (b) Flow–efficiency.

Table 1. Definitions and ranges of the design variables.

Variable	Definition	Ref	Min	Max
$dm_{1} / c_{0}$	$dm$ at $\hat{r} = 1/3$	0	−0.07	0.07
$dm_{2} / c_{0}$	$dm$ at $\hat{r} = 2/3$	0	−0.14	0.14
$dm_{3} / c_{0}$	$dm$ at $\hat{r} = 1$	0	−0.21	0.21
$dθ_{1} / θ_{0}$	$dθ$ at $\hat{r} = 1/3$	0	−0.03	0.03
$dθ_{2} / θ_{0}$	$dθ$ at $\hat{r} = 2/3$	0	−0.06	0.06
$dθ_{3} / θ_{0}$	$dθ$ at $\hat{r} = 1$	0	−0.09	0.09
$χ_{in,1}$ / rad	$χ_{in}$ at $\hat{r} = 0$	0.448	0.41	0.48
$χ_{in,2}$ / rad	$χ_{in}$ at $\hat{r} = 1/3$	0.123	0.1	0.15
$χ_{in,3}$ / rad	$χ_{in}$ at $\hat{r} = 2/3$	0.064	−0.04	0.16
$χ_{in,4}$ / rad	$χ_{in}$ at $\hat{r} = 1$	0.050	−0.05	0.15
$χ_{out,1}$ / rad	$χ_{out}$ at $\hat{r} = 0$	−0.640	−0.67	−0.6
$χ_{out,2}$ / rad	$χ_{out}$ at $\hat{r} = 1/3$	−0.052	−0.08	−0.02
$χ_{out,3}$ / rad	$χ_{out}$ at $\hat{r} = 2/3$	−0.050	−0.08	−0.018
$χ_{out,4}$ / rad	$χ_{out}$ at $\hat{r} = 1$	−0.149	−0.18	−0.12
$β_{y,1}$ / rad	$β_{y}$ at $\hat{r} = 0$	0.224	0.18	0.26
$β_{y,2}$ / rad	$β_{y}$ at $\hat{r} = 1/3$	0.656	0.6	0.7
$β_{y,3}$ / rad	$β_{y}$ at $\hat{r} = 2/3$	0.965	0.92	1
$β_{y,4}$ / rad	$β_{y}$ at $\hat{r} = 1$	1.099	1.05	1.15

Table 2. Specification of rotor 67 [19].

Parameter	Value	Parameter	Value
Rotational speed (rpm)	16,043	Relative tip Mach number	1.3
Tip clearance (cm)	0.101	Mass flow rate (kg/s)	33.25
Number of blades	22	Pressure ratio	1.63

Table 3. Performances of different models.

$p_{i}$	$R^{2}$	MAE	$N$	$a_{p}$	$θ_{max}$	$P_{min}$
${\dot{m}}_{l}$	0.886	0.011	377	5	30	1.7
${\dot{m}}_{h}$	0.968	0.005	377	5	10	1.7
$\hat{η}$	0.938	0.001	377	5	30	1.7
$\hat{π}$	0.942	0.004	377	5	10	1.5
${\dot{m}}_{W}$	0.987	0.003	377	0	10	1.5
$π_{W}$	0.965	0.001	377	5	10	1.5
$η_{W}$	0.920	0.005	377	5	30	1.5
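
The $R^{2}$ and MAE columns of Table 3 can be illustrated with a short sketch. It assumes the standard definitions of the coefficient of determination and the mean absolute error; this part of the article does not restate the exact formulas used, and the function names below are hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (standard definition, assumed)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the article.
    cfd_values = [1.60, 1.65, 1.70, 1.75]
    surrogate_values = [1.61, 1.64, 1.71, 1.74]
    print(r_squared(cfd_values, surrogate_values))
    print(mean_absolute_error(cfd_values, surrogate_values))
```

Under these definitions, a surrogate model that reproduces the CFD values exactly gives $R^{2} = 1$ and MAE $= 0$, while a model that always predicts the sample mean gives $R^{2} = 0$.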

Table 4. Performances of modified rotors at different steps.

Step Number	${\dot{m}}_{W}$	$η_{W}$	$π_{W}$	$δ π_{W}$
0	33.21	0.909	1.679	0
5	33.35	0.910	1.683	0.24%
10	33.44	0.910	1.686	0.41%
15	33.21	0.909	1.696	1.01%

Table 5. Parameter influence on agent convergence.

$r_{raw}$	Series	$a_{1}$	$a_{2}$	$r_{min}$	$r_{max}$	$r_{D}$	$δ r_{0,ave}$	$N_{ep}$
$\hat{π} - \hat{π}_{0}$	1	9.57	0.72	−0.31	0.82	1.13	−0.068	~4000
$η_{W} - η_{W,0}$	2	27	0.90	−0.18	1.01	1.19	−0.050	~3500
$\hat{η} (\dot{m}_{h} - \dot{m}_{l}) - \hat{η}_{0} (\dot{m}_{h,0} - \dot{m}_{l,0})$	3	38	0.90	−0.23	0.94	1.17	−0.050	~3000
	4	150	0.24	−2.91	0.95	3.86	−0.050	>14,000
$(\hat{η} - \hat{η}_{0}) (\dot{m}_{h} - \dot{m}_{l})$	5	500	1.00	−0.56	1.04	1.60	−0.050	~3000
$\frac{1.5}{100 \lvert \dot{m}_{W} - \dot{m}_{W,0} \rvert + 1} η_{W}$	6	1.00	0.10	−0.14	1.12	1.26	−0.050	~4000
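
The tabulated values in Table 5 are consistent with $r_{D} = r_{max} - r_{min}$, that is, the span of the observed raw reward. This reading is inferred from the numbers alone and is an assumption, since the definition of $r_{D}$ is not given in this part of the article. A minimal check, with hypothetical names:

```python
# (r_min, r_max, r_D) as printed in Table 5, keyed by series number.
TABLE_5 = {
    1: (-0.31, 0.82, 1.13),
    2: (-0.18, 1.01, 1.19),
    3: (-0.23, 0.94, 1.17),
    4: (-2.91, 0.95, 3.86),
    5: (-0.56, 1.04, 1.60),
    6: (-0.14, 1.12, 1.26),
}


def reward_span(r_min: float, r_max: float) -> float:
    """Span of the observed raw reward (assumed meaning of r_D)."""
    return r_max - r_min


def max_mismatch(table: dict) -> float:
    """Largest absolute gap between r_max - r_min and the printed r_D."""
    return max(abs(reward_span(lo, hi) - r_d) for lo, hi, r_d in table.values())


if __name__ == "__main__":
    # The gap stays below the two-decimal rounding of the table entries.
    print(max_mismatch(TABLE_5))
```

Series 4 stands out with a span roughly three times that of the other series, and it is also the only series listed with more than 14,000 episodes.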

Table 6. Recommended values of parameters.

Parameter	Description	Recommended Value
$r_{max}$	Observed maximum $r_{raw}$	1.0
$r_{min}$	Observed minimum $r_{raw}$	−0.3
$δ r_{0,ave}$	Strength of constraints	−0.05
$R_{t}$	Threshold of R	70
$ε_{d,t}$	Threshold of $ε_{d}$	0.25

Table 7. Performance after modification by series 2 agents at different numbers of steps.

Step Number	${\dot{m}}_{W}$	$η_{W}$	$π_{W}$	$δ η_{W}$
0	33.21	0.9088	1.679	0
5	33.74	0.9087	1.676	−0.00%
10	33.79	0.9106	1.675	0.20%
15	33.78	0.9115	1.674	0.30%
20	33.88	0.9125	1.666	0.41%
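
The $δ η_{W}$ column of Table 7 appears to be the relative change in $η_{W}$ with respect to step 0. The sketch below recomputes it from the printed efficiencies under that assumption; the function name is hypothetical.

```python
def relative_change_percent(value: float, baseline: float) -> float:
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# Efficiency at the working point, as printed in Table 7 (step number -> eta_W).
ETA_W = {0: 0.9088, 5: 0.9087, 10: 0.9106, 15: 0.9115, 20: 0.9125}

if __name__ == "__main__":
    baseline = ETA_W[0]
    for step, eta in ETA_W.items():
        print(step, f"{relative_change_percent(eta, baseline):+.2f}%")
```

With the printed four-digit efficiencies, steps 10, 15 and 20 reproduce the tabulated 0.20%, 0.30% and 0.41%. Step 5 evaluates to about −0.01% rather than the tabulated −0.00%, presumably because the table entries were rounded from more precise values.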

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Xu, X.; Huang, X.; Bi, D.; Zhou, M. An Intellectual Aerodynamic Design Method for Compressors Based on Deep Reinforcement Learning. Aerospace 2023, 10, 171. https://doi.org/10.3390/aerospace10020171