Article

A Hybrid MPI/OpenMP Parallelization Scheme Based on Nested FDTD for Parametric Decay Instability

Linglei He, Jing Chen, Jie Lu, Yubo Yan, Jutao Yang, Guang Yuan, Shuji Hao and Qingliang Li

1 Optics and Optoelectronics Laboratory, Department of Physics, Ocean University of China, Qingdao 266100, China
2 National Key Laboratory of Electromagnetic Environment, China Research Institute of Radiowave Propagation, Qingdao 266107, China
3 College of Marine Geo-Sciences, Ocean University of China, Qingdao 266100, China
* Author to whom correspondence should be addressed.
Atmosphere 2022, 13(3), 472; https://doi.org/10.3390/atmos13030472
Submission received: 8 February 2022 / Revised: 9 March 2022 / Accepted: 11 March 2022 / Published: 14 March 2022
(This article belongs to the Special Issue Radar Sensing Atmosphere: Modelling, Imaging and Prediction)

Abstract: Parametric decay instability (PDI), which develops within milliseconds, is an important physical phenomenon in ionospheric heating. Numerical simulations are usually used to study PDI mechanisms because they can directly follow the generation and development of PDI, which is difficult to achieve in experimental studies. When simulating the PDI phenomenon with the explicit finite-difference time-domain (FDTD) method, the spatial scale spans from kilometers to centimeters, and the time step must satisfy the Courant–Friedrichs–Lewy condition. Simulating PDI is therefore time-consuming and difficult because of the high spatial resolution and the strict restriction on the discrete time step. Although a nested mesh technique can boost the computational efficiency, a parallel strategy is needed to improve it further. In this study, we present a hybrid Message Passing Interface (MPI)/OpenMP parallelization scheme to address these problems. The scheme adaptively computes and automatically allocates MPI tasks and OpenMP threads, which makes it flexible and portable. The PDI phenomenon was simulated under EISCAT background parameters; the resulting wave mode conversion and intense localized turbulence were identical to those of the serial program. Furthermore, a new simulation example was presented, and the effect of the cavity depth on the electrostatic waves and the negative ion density cavity was investigated. With the proposed parallelization scheme, the simulation time is reduced from 70 h for the serial program to 3.6 h.

1. Introduction

Ionosphere heating experiments using ground-based high-power, high-frequency (HF) transmitters can cause ionospheric perturbations, leading to linear and nonlinear interactions between ionospheric plasma and waves [1,2,3]. The parametric decay instability (PDI), excited within milliseconds, is an important physical mechanism of plasma nonlinear instability; it is characterized by the nonlinear conversion of HF electromagnetic (EM) waves into Langmuir waves (on the order of MHz) and ion-acoustic waves (~kHz) at the height of ordinary wave (O-wave) reflection [4,5]. Its excitation further promotes nonlinear development, making it possible to stimulate other instabilities in the heating region [5,6]. However, the excitation and development of PDI cannot be easily monitored with existing diagnostic facilities because of their limited time resolution (~s) and space resolution (~km). Numerical simulations can monitor the excitation process of PDI and further explore and analyze the underlying physical mechanisms, which can guide the design of experiments. Plasma density cavities are commonplace in the F region of the ionosphere; they are frequently observed by incoherent scatter radar, and the maximum cavity depth is about 30% of the background density [7,8,9,10]. Because the background density of the ionospheric plasma is time-varying, a numerical simulation can directly monitor the interaction between EM waves and the ionospheric plasma. Therefore, PDI simulations are of great significance for understanding the mechanism of ionospheric heating.
Numerical simulations of PDI mainly adopt magnetohydrodynamic theory to describe the physical processes [11]. Because the PDI governing equations include Maxwell's equations, the finite-difference time-domain (FDTD) method is preferred in numerical simulations of PDI. The classical FDTD method is a standard and mature technique for numerically solving Maxwell's EM equations. The Yee structure of the FDTD method can separately define the physical properties of each grid cell and describe the number density, geomagnetic field, and other profile characteristics at different positions in the ionosphere [12]. However, the classic Yee structure provides temporal and spatial discretization only for the electric and magnetic fields, which is not sufficient to simulate PDI [13]: it has no discretization scheme for the ionospheric plasma number density and velocity. In the interaction model of plasma and EM waves, Young and Yu arranged the velocity field of the plasma current in the Lorentz equation and the electric field in Maxwell's equations at the same node [14,15]. Gondarenko et al. used the alternating-direction implicit FDTD method to simulate the generation and evolution of density irregularities [16]. Hence, the FDTD method was extended to simulate PDI by expanding the discretization scheme to the physical parameters involved in the interaction between the ionospheric plasma and EM waves.
There are three different length scales in the PDI: ionospheric profiles (10⁵ m), pump wave wavelengths (10¹ m), and the electrostatic waves excited by wave mode conversion (10⁻¹ m) [17,18,19]. When simulating PDI processes, the minimum scale of the discrete grid should be less than 10⁻¹ m, and the time step needs to meet the Courant–Friedrichs–Lewy (CFL) condition [20]. Thus, the spatial grid reaches 10⁶ points and the computation takes up to 600 h when a 1D simulation is performed on a 200 km domain with a grid resolution of 2 decimeters (the discrete scheme is the same as in the coarse mesh region in Figure 1b). Furthermore, to examine the general characteristics of the interaction between EM waves and a suddenly generated density cavity, several simulations with different cavity depths need to be designed. With a serial code, the total simulation time becomes unacceptable.
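As a rough, back-of-the-envelope check of these numbers (an illustrative estimate, not taken from the original text), the grid count and the 1D CFL bound on the time step are

$$ N_\mathrm{grid} = \frac{200\ \mathrm{km}}{0.2\ \mathrm{m}} = 10^{6}, \qquad \Delta t \le \frac{\Delta z}{c} = \frac{0.2\ \mathrm{m}}{3\times10^{8}\ \mathrm{m\,s^{-1}}} \approx 6.7\times10^{-10}\ \mathrm{s}, $$

so resolving a few milliseconds of physical time requires on the order of 10⁶–10⁷ time steps over 10⁶ grid points, i.e., roughly 10¹²–10¹³ point updates for a uniform fine grid.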
Eliasson [17] employed a nested nonuniform mesh to simulate the interaction between plasma and EM waves: the plasma equations of motion were calculated on a fine mesh, and Maxwell's EM equations were solved on a coarse mesh. This calculation could be performed approximately 40 times faster on the nested grid than on a uniformly dense grid [17]. In the explicit FDTD method, the update of the physical state variables depends only on the previous state. This offers fine-grained thread parallelism, since most grid points are updated in the same way at the same time, while coarse-grained parallelism can be applied to offload disk access, since write tasks are independent of one another. Many parallel FDTD implementations have been reported. Cannon et al. used graphics processing unit (GPU) technology to accelerate the FDTD [21]. Chaudhury et al. used hybrid message passing interface (MPI) and OpenMP programming to parallelize algorithms on integrated multicore processors and simulate self-organized plasma pattern formation [22]. Yang et al. adopted an FDTD method with MPI to solve large-scale plasma problems [23]. Nested meshes and parallelization can thus be combined to improve both the computational efficiency and the precision of the simulation.
In this paper, we present a hybrid MPI and OpenMP parallelization scheme based on the nested mesh FDTD for PDI studies. A nested mesh is used to reduce the number of grid points, and OpenMP is employed to improve the efficiency of a single cyclical update. Simultaneously, to meet the sampling frequency requirements of the PDI simulation, MPI is adopted to increase the storage efficiency and achieve asynchronous storage. This architecture can adaptively allocate MPI tasks and OpenMP threads and can be quickly adapted to different hardware platforms. Based on the parallel scheme, several examples with a density cavity added near the reflection point are simulated, and the dependence of the electrostatic wave energy captured by the cavity on the cavity depth is obtained.

2. Mathematical Model

2.1. Governing Equation

Plasma is composed of electrons and more than one type of ion, and it can be considered a conductive fluid. In the two-fluid model [11], Maxwell’s equations can be expressed by the following equations:
$$ -\mu_0 \frac{\partial \mathbf{H}}{\partial t} = \nabla \times \mathbf{E} \qquad (1) $$
$$ \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t} = \nabla \times \mathbf{H} - \sum_{\alpha} N_\alpha q_\alpha \mathbf{U}_\alpha \qquad (2) $$
where $\mu_0$ is the vacuum permeability, $\varepsilon_0$ is the vacuum permittivity, $\mathbf{E}$ is the electric field, $\mathbf{H}$ is the magnetic field, $N_\alpha$ is the number of particles per m³, and $\mathbf{U}_\alpha$ is the time-varying fluid bulk velocity vector. Here, $\alpha$ denotes electrons or oxygen ions. $\mathbf{E}$, $\mathbf{U}_\alpha$, $\mathbf{H}$, and $N_\alpha$ are all functions of the spatial coordinates (x, y, z) and time t.
The continuity equation is expressed as
$$ \frac{\partial N_\alpha}{\partial t} = -\nabla \cdot (N_\alpha \mathbf{U}_\alpha) \qquad (3) $$
The equation of the electronic movement is expressed as
$$ N_e m_e \frac{d\mathbf{U}_e}{dt} = N_e q_e \left[\mathbf{E} + \mathbf{U}_e \times \mathbf{B}_t\right] - N_e m_e \nu_e \mathbf{U}_e - \nabla\left(k_B N_e T_e\right) \qquad (4) $$
where $k_B$ is the Boltzmann constant, $\nu$ is the effective collision frequency, $m$ is the mass, $T$ is the temperature, and $\mathbf{B}_t = \mathbf{B}_0 + \mathbf{B}$ is the sum of the geomagnetic field and the disturbance magnetic field. The subscript e indicates electrons. It should be noted that $d\mathbf{U}_\alpha/dt \equiv \partial\mathbf{U}_\alpha/\partial t + (\mathbf{U}_\alpha \cdot \nabla)\mathbf{U}_\alpha$ is the convective derivative, in which $(\mathbf{U}_\alpha \cdot \nabla)$ is a scalar differential operator; the convective term can be ignored in this physical scenario.
The equation of the ionic movement is expressed as
$$ N_i m_i \frac{d\mathbf{U}_i}{dt} = N_i q_i \left[\mathbf{E} + \mathbf{U}_i \times \mathbf{B}_t\right] - N_i m_i \nu_i \mathbf{U}_i - \nabla\left(k_B N_i T_i\right) - \frac{N_i q_i^2}{4 m_e \omega_0^2}\nabla|\mathbf{E}|^2 \qquad (5) $$
The subscript i indicates ions. The last term is the ponderomotive force, i.e., the low-frequency force exerted on the particles by the HF field; $\omega_0$ is the frequency of the HF EM wave, $|\mathbf{E}|$ is the electric field amplitude of the HF EM wave, and $q_i$ is the ion charge.
To simulate PDI processes on the order of milliseconds, the electron temperature can be regarded as a constant value; the Maxwell Equations (1) and (2), continuity Equation (3), electron motion Equation (4), and ion motion Equation (5) are used to simulate the nonlinear process through the coupling of the ponderomotive force and low-frequency density disturbance. Moreover, the model does not involve temperature update.

2.2. Discretization Scheme

The FDTD method, first proposed by Yee [12], is a numerical method for directly solving Maxwell's differential equations for EM fields in the time domain. The model space created by Equations (1)–(5) is discretized using a Yee cell, with a leapfrog format for the time-domain recursion and an update cycle of $H^{n+\frac{1}{2}} \rightarrow U^{n+\frac{1}{2}} \rightarrow N^{n+\frac{1}{2}} \rightarrow E^{n+1} \rightarrow H^{n+\frac{3}{2}}$.
We assume a vertically stratified ion number density profile $N_{i0}(z)$ and a constant geomagnetic field $\mathbf{B}_0$ oriented obliquely with respect to the density gradient. The EM wave is injected vertically into the ionosphere and has spatial variation only in the z-direction, i.e., $\partial/\partial x = \partial/\partial y = 0$. Figure 1a shows a schematic of the EM waves interacting with the plasma. In Figure 1b, the horizontal coordinate is the spatial axis in the z-direction, and the vertical coordinate is the time axis.
On the coarse mesh region, $H_x$ and $H_y$ are located at $(j+\frac{1}{2}, n+\frac{1}{2})$; $E_x$, $E_y$, and $E_z$ are located at $(j, n)$; and $U_{\alpha x}$, $U_{\alpha y}$, $U_{\alpha z}$, and $N$ are located at $(j, n+\frac{1}{2})$, where $j$ and $n$ are integer indices, $\Delta z$ is the discrete spatial step, and $\Delta t$ is the time step, so that an index $j$ corresponds to the position $j\Delta z$ and an index $n$ to the time $n\Delta t$. On the fine mesh region, the variables are arranged in the same way, but note that the fine mesh region uses its own, smaller spatial and time steps.
On the coarse mesh region, the H update equation is
$$ \begin{Bmatrix} H_x^{n+\frac{1}{2}}\!\left(j+\tfrac{1}{2}\right) \\[2pt] H_y^{n+\frac{1}{2}}\!\left(j+\tfrac{1}{2}\right) \end{Bmatrix} = \begin{Bmatrix} H_x^{n-\frac{1}{2}}\!\left(j+\tfrac{1}{2}\right) \\[2pt] H_y^{n-\frac{1}{2}}\!\left(j+\tfrac{1}{2}\right) \end{Bmatrix} + \frac{\Delta t}{\mu_0 \Delta z} \begin{Bmatrix} E_y^{n}(j+1) - E_y^{n}(j) \\[2pt] E_x^{n}(j) - E_x^{n}(j+1) \end{Bmatrix} \qquad (6) $$
The complete set of updates naturally lends itself to a leapfrog time-stepping scheme following this cyclical update pattern. The necessary processes, such as source injection and boundary conditions, are inserted into the cycle at appropriate points to complete the numerical simulation.
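For concreteness, a minimal C sketch of the coarse-mesh H update in Equation (6) is shown below; the array names, loop bounds, and the helper function are illustrative assumptions, not the authors' code.

/* Minimal 1D sketch of the coarse-mesh H update in Equation (6).
   Hx, Hy live at half-integer nodes j+1/2; Ex, Ey at integer nodes j.
   Array names and sizes are illustrative only. */
void update_H(double *Hx, double *Hy, const double *Ex, const double *Ey,
              double dt, double dz, int nz)
{
    const double mu0 = 1.25663706e-6;        /* vacuum permeability (H/m) */
    const double c1  = dt / (mu0 * dz);
    for (int j = 0; j < nz - 1; j++) {
        Hx[j] += c1 * (Ey[j + 1] - Ey[j]);   /* H advances from n-1/2 to n+1/2 */
        Hy[j] += c1 * (Ex[j] - Ex[j + 1]);
    }
}

In the full cycle, analogous update routines for U, N, and E would be called in the leapfrog order H → U → N → E, with the source injection and boundary conditions applied between them.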

3. Serial Program

3.1. Serial Program Design

The serial program scheme was designed based on the numerical discretization of the governing equations, as shown in Figure 2. The initial values of the physical parameters were obtained after program initialization. The spatial step should be less than 10⁻¹ m, the time step needs to meet the CFL condition, and the total number of updates is determined by the time step. Within one cycle, each physical variable (H, U, N, and E) is updated on every spatial grid point in the data update module (module 1), and the updated results are stored by the data storage module (module 2).
The storage operation was conditionally performed to reduce the wall-clock consumption and hardware resources on the basis of meeting the sampling period necessary for data processing so as not to store the results of the update in each cycle.
Considering the physical decay process of EM waves into Langmuir waves and ion acoustic waves after the nonlinear interaction with the plasma, the spectrum resolution obtained from the Fourier transform of the data must be less than the frequency of the ion acoustic wave (kHz), and the maximum frequency should be higher than the main frequency of the pump wave.
The frequency resolution of the discrete Fourier transform was determined by Equation (7), and the maximum frequency was determined by Equation (8):
$$ d = \frac{1}{N\Delta} \qquad (7) $$
$$ f_{\max} = \frac{1}{2\Delta} \qquad (8) $$
In Equation (7), N represents the number of samples and Δ is the sampling period, so the frequency resolution is determined by the total sampling time NΔ: the longer the total sampling time, the finer the spectral resolution. According to the Nyquist sampling theorem [24], when sampling with period Δ, the highest frequency that can be resolved is only half of the sampling frequency. As a result, the sampling period should be chosen according to Equations (7) and (8) so as to resolve the ion acoustic wave (kHz) and discriminate the main frequency of the pump wave.
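As a worked check (using the values adopted later in Section 5: a time step of 1.67 × 10⁻⁸ s, storage every three steps, and roughly 2.4 ms of simulated time):

$$ \Delta = 3\,\Delta t \approx 5.0\times10^{-8}\ \mathrm{s}, \qquad f_{\max} = \frac{1}{2\Delta} \approx 10\ \mathrm{MHz} > f_0 = 5\ \mathrm{MHz}, $$
$$ N \approx \frac{2.4\times10^{-3}\ \mathrm{s}}{5.0\times10^{-8}\ \mathrm{s}} \approx 4.7\times10^{4}, \qquad d = \frac{1}{N\Delta} \approx 4\times10^{2}\ \mathrm{Hz}, $$

so the spectral resolution lies well below the kHz-scale ion acoustic frequency while the pump frequency remains below the Nyquist limit.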

3.2. Serial Program Performance Analysis

The program contains an initialization module, a data update module, and a data storage module. Each sampling period contains multiple data update operations and one data storage operation. The performance of the serial program was tested on a workstation with an Intel Core i7-9700K CPU, as shown in Table 1. The second and third columns show the wall-clock time of modules 1 and 2 in a single cyclical update for different numbers of grid points, the fourth column shows the sampling period determined by Equations (7) and (8), the fifth column shows the total time consumed in one sampling period, and the last column shows the ratio of the storage time to the total time. The omp_get_wtime() function was used to measure the wall-clock times.
As the grid size increased, the time consumed by a single cyclical update of the computation module, the total number of cyclical updates, and the total time consumption all grew rapidly. As a result, the time consumption was large even though the simulated physical time was only on the order of milliseconds. In addition, as the time step decreased, the sampling period (in time steps) increased accordingly. However, even for long sampling periods, the storage time accounted for a large percentage of the total time consumed.
In the serial scheme, the physical variables on each grid point can only be updated sequentially, even though the calculation at each grid point is unaffected by the other grid points within the same cyclical update. Likewise, the data update of the next sampling period must wait for the data storage of the previous sampling period to finish, even though there is clear task parallelism. To improve execution efficiency and reduce time consumption, it is essential to parallelize the serial program.

4. MPI and OpenMP Parallelization of FDTD

Modules 1 and 2 were executed alternately, as simplified in the first panel of Figure 3 based on the analysis of the serial program performance in the previous section. Module 1 updates the next period according to the results of the previous period, and module 2 stores the results of the update operation that meet the sampling period.
The two modules were parallelized according to their characteristics. Modules 1 and 2 are linked in terms of task sequencing but do not depend on each other's data: after Task 0 carries out an update, it can off-load the data to another MPI task for storage and proceed immediately to the next update step. Therefore, the two modules can be processed in parallel along the time line. Because MPI makes it straightforward to dedicate separate tasks to storage [25,26], an MPI parallel framework was designed on the basis of the serial program to parallelize modules 1 and 2, as shown in the second panel of Figure 3. In this framework, multiple MPI tasks are created. Task 0 is responsible for the data update operation and the data forwarding operation, the latter forwarding the updated results that satisfy the sampling period to the other tasks. Task 1 to Task N are responsible for receiving and storing the data.
For module 1, as in the discrete scheme of Equation (6), the updates of each component at different grid points are spatially independent of one another. Thus, this module can also be parallelized to further improve the running efficiency of the program, even though a single cyclical update takes much less time than a storage operation.
In the FDTD discrete scheme, the updated results of the previous update must be stored in the shared memory and used for the next update. The OpenMP programming model uses shared memory to achieve parallelism by dividing a process task into several threads [27,28].
The OpenMP programming model can only be used within a single compute node, so the effectiveness of threading is limited by the hardware of that node. In this work, the spatial grid size is on the order of 10⁵–10⁶ points, and OpenMP is more suitable for parallelizing this workload than MPI or CUDA, which tend to pay off for much larger spatial grids [29]. OpenMP also avoids some extra parallel overhead, such as the message passing required by MPI or the host–device data transfers required by CUDA. Based on the above discussion, a parallel program framework combining OpenMP and MPI was established for the full model, as shown in the third panel of Figure 3.

4.1. Data Update Module Parallel Scheme

As shown in Figure 4, the update in module 1 (the update of H, U, N, and E), for which Task 0 is responsible, is parallelized across the threads spawned by OpenMP within Task 0. The grid is assigned to the OpenMP threads using a block decomposition, which improves the cache hit rate and increases the efficiency of the program [30].
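A hedged sketch of how such a block-decomposed thread parallelization might look in C with OpenMP is given below, reusing the H update of Equation (6); the schedule(static) clause assigns each thread one contiguous block of the grid, and omp_get_wtime() is the timer used for the measurements in Table 2. The surrounding structure is illustrative, not the authors' code.

#include <omp.h>

/* Thread-parallel version of the Equation (6) update.
   schedule(static) gives each thread one contiguous block of grid points,
   which corresponds to the block decomposition described above. */
void update_H_omp(double *Hx, double *Hy, const double *Ex, const double *Ey,
                  double c1, int nz)
{
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < nz - 1; j++) {
        Hx[j] += c1 * (Ey[j + 1] - Ey[j]);
        Hy[j] += c1 * (Ex[j] - Ex[j + 1]);
    }
}

/* Wall-clock timing of a single cyclical update, as reported in Table 2 */
double time_one_update(double *Hx, double *Hy, const double *Ex, const double *Ey,
                       double c1, int nz)
{
    double t0 = omp_get_wtime();
    update_H_omp(Hx, Hy, Ex, Ey, c1, nz);
    return omp_get_wtime() - t0;
}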
Table 2 lists the single-iteration time of module 1 for grid sizes of 35,000, 350,000, and 2,000,000 with different numbers of threads. Note that the wall-clock time counted here covers only the FDTD code used for variable updates; the storage code is not included. One thread corresponds to the serial program. The time consumed by the numerical computation decreased significantly as the number of threads increased. The exception is the grid size of 35,000 with four threads, where the time consumption increased because the parallel overhead exceeded the time saved by parallelism.

4.2. Data Storage Module Parallel Scheme

The MPI programming model consists of a set of standard interfaces for execution in a heterogeneous (networked) environment. Task 0 contains both the threaded computation of the mathematical model and the operation of periodically off-loading data for storage. In the mathematical model, the calculation of the next period relies on the results of the previous period; if the variables held in shared memory were modified by another operation, the simulation would execute incorrectly. Here, the MPI programming model plays an important role in off-loading the data, as each task has its own address space, which guarantees the correct operation of the program.
As shown in Figure 5, in Task 0, the data that should be stored are assigned to a temporary array and forwarded to the corresponding task. Then, the data computation task of the next sample period is continued after completing the forwarding operation. Tasks 1–N sequentially receive the data from Task 0. There is a unique buffer for each of the storage tasks to execute the storage operation. The offloading Tasks 1–N are used in a round-robin procedure. These tasks will continue receiving data forwarded by Task 0 after completing the storage operation until the loop ends.
To avoid task blocking and ensure the smooth execution of Task 0, the total number of tasks should be allocated reasonably. For example, if the first three sampling periods forward the updated results to Tasks 1, 2, and 3, respectively, then the results from the fourth sampling period should be forwarded to Task 1 only after Task 1 has finished its previous storage operation. Otherwise, the program opens an additional task (Task 4) and forwards the data there, so that Task 0 is not blocked waiting for Task 1 to finish storing.
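The following C sketch illustrates one way the round-robin forwarding in Figure 5 could be written with non-blocking MPI sends; the buffer handling, the helper names, and the hypothetical write_to_disk() routine are assumptions for illustration, not the authors' implementation.

#include <stdlib.h>
#include <mpi.h>

/* Task 0: copy the fields into 'snapshot' (one buffer per in-flight send) and
   post a non-blocking send so the next update step can start immediately.
   The caller must MPI_Wait on the returned request before reusing the buffer. */
MPI_Request forward_snapshot(const double *snapshot, int count,
                             int n_store, int sample_index)
{
    MPI_Request req;
    int dest = 1 + sample_index % n_store;   /* round-robin over ranks 1..n_store */
    MPI_Isend(snapshot, count, MPI_DOUBLE, dest, sample_index,
              MPI_COMM_WORLD, &req);
    return req;
}

/* Storage task (rank >= 1): receive every n_store-th snapshot and write it out. */
void storage_loop(int count, int my_rank, int n_store, int total_samples)
{
    double *buf = malloc(count * sizeof(double));
    for (int s = my_rank - 1; s < total_samples; s += n_store) {
        MPI_Recv(buf, count, MPI_DOUBLE, 0, s, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* write_to_disk(buf, count, s);  -- hypothetical I/O routine */
    }
    free(buf);
}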
In the MPI parallel model, the data storage operation of the serial program is replaced by the data forwarding operation of Task 0. For grid sizes of 35,000, 350,000, and 2,000,000, Table 3 lists the average time consumed by the data forwarding operations over 1000 sampling periods for different numbers of MPI tasks, using the sampling periods given in Table 1. The second column (T1) is the average time spent on storage operations over 1000 sampling periods in the serial program. The last column (T2) is the time spent on data forwarding in an ideal state without task blocking.
As shown in Table 3, the time consumed by the data forwarding operations decreased significantly as the number of MPI tasks increased, because the additional tasks avoided blocking. In addition, only a small number of tasks was needed to bring the forwarding time close to the ideal value for grid sizes of 350,000 and 2,000,000. This is because the sampling period grows with the number of grid points, so the ratio of the storage time to the total time decreases accordingly, as shown in Table 1; hence, fewer MPI tasks are required to perform the storage operations.

4.3. Adaptive Allocation of the Number of Threads and Tasks

The parallelization scheme in this study is based on hybrid OpenMP and MPI programming, and the two techniques require their own threads and tasks. When allocating the numbers of OpenMP threads and MPI tasks, the overall situation should be considered so that the computer hardware is used effectively, rather than optimizing each module in isolation. Automatically allocating the numbers of threads and tasks through adaptive computing allows the program to be quickly adapted to different hardware devices and greatly reduces the debugging time.
The adaptive calculation includes the following steps: first, the MPI timing function is used to count the time of each module in the serial program, and sampling periods are repeated a thousand times to obtain the average time. The time of each module includes the computation time (U_T) that must be performed sequentially, computation time (P_T) that can be performed in parallel, and storage time consumption (S_T). Second, the memory (U_M) occupied by each variable of module 1 and memory (S_M) occupied by module 2 are counted. Then, the parallel overhead time (C_T) used in communication operations is counted. Finally, a hardware query operation is performed to obtain the hardware information of the current computer platform: memory size (G_M) and CPU number (N). According to the following principles, the number of tasks (P) opened by MPI and the number of threads (T) opened by OpenMP are allocated adaptively:
T/P > ((U_T + C_T) × T + P_T) × 1.2/S_T
T + P < N × 2
U_M + S_M × P < G_M
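A minimal C sketch of how these principles could be turned into an adaptive search is shown below; the variable names mirror the text (U_T, P_T, S_T, C_T, U_M, S_M, G_M), the hardware query uses standard OpenMP, and the tie-breaking rule (prefer more threads for module 1) is an assumption, not the authors' exact procedure.

#include <omp.h>

typedef struct { int threads; int tasks; } Alloc;

/* Adaptive choice of OpenMP threads (T) and MPI tasks (P) following the three
   principles above. Times are measured averages in seconds; memories in bytes. */
Alloc choose_alloc(double U_T, double P_T, double S_T, double C_T,
                   double U_M, double S_M, double G_M)
{
    int N = omp_get_num_procs();                  /* logical CPUs on this node */
    Alloc best = { 1, 1 };
    for (int T = 1; T <= N; T++) {
        for (int P = 1; T + P < 2 * N; P++) {
            if (U_M + S_M * P >= G_M) continue;   /* memory constraint */
            if ((double)T / P <= ((U_T + C_T) * T + P_T) * 1.2 / S_T)
                continue;                         /* storage tasks must keep up */
            if (T > best.threads || (T == best.threads && P < best.tasks))
                best = (Alloc){ T, P };           /* assumed preference rule */
        }
    }
    return best;
}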
In Table 4, code (1) uses the adaptive assignment of tasks and threads, and code (2) uses manual assignment. For a grid size of 100,000, the speedup obtained by code (1) and the best speedup obtained by code (2) were measured on an Intel Core i7-9700K and an Intel Xeon Gold 6254, respectively. The speedup of code (1) is very close to that of code (2). The adaptive allocation of tasks and threads therefore ensures a good speedup while also improving the portability of the program.

5. Results

As a numerical example, the simulation domain extends from 190 to 340 km in altitude. The coarse and fine grid resolutions in the nested grid were set to 10 and 0.2 m, respectively. The fine grid region covered the range of 265.9 to 267.7 km, which contains the O-wave reflection region. We assumed that the transmitter was switched on at 0 ms. The simulation covered t = 0.633 ms to t = 3 ms, where 0.633 ms is the time required for the EM wave to reach the altitude of the wave source, which was set to 190 km. According to the CFL condition, the numerical time step was 1.67 × 10⁻⁸ s. The sampling period was set to three, meaning that the solution was stored every three time steps.
The geomagnetic field was set to 4.8 × 10⁻⁵ T and tilted θ = 12° from the vertical (z) axis, so $\mathbf{B}_0 = 4.8\times10^{-5}\,[\hat{x}\sin\theta - \hat{z}\cos\theta]$ T, where $\hat{x}$ and $\hat{z}$ are the unit vectors in the x- and z-directions; this is in accordance with the European Incoherent SCATter Scientific Association (EISCAT) background. The electron and ion temperatures were 1500 K, and the electron and ion collision frequencies were 2500 and 2000 s⁻¹, respectively. The ion number density was given by $N_{i0}(z) = N_{\max}\exp[-(z-3\times10^5)^2/(1.6\times10^9)]$ (z in meters) with $N_{\max} = 6\times10^{11}\ \mathrm{m^{-3}}$. The pump wave frequency was $f_0 = 5$ MHz, with the source modulated as $E_x(t) = 1.5\sin(2\pi f_0 t)$; since this frequency lies below the peak plasma frequency (about 7 MHz), the heating simulation is overdense and the wave is reflected.
Here, different grid resolutions were used to test the accuracy of the solution, with the other background conditions unchanged. Uniform fine and uniform coarse grids were compared with the nested grid; the coarse grid resolution was 10 m and the fine grid resolution was 0.2 m. The absolute value of the vertical electric field $|E_z|$ in the altitude range of 266.85 to 267 km at 1.5194 ms was extracted for each case, as shown in Figure 6. The values on the nested grid match those on the uniform fine grid very well, whereas the electrostatic waves excited by wave mode conversion are completely invisible at the resolution of the uniform coarse grid.
Figure 7 shows the variation in the vertical electric field $E_z$ with time within approximately 1 km of the O-wave reflection point. After 1.853 ms, a strong perturbation was generated: the earlier standing wave structure gradually disappeared, and intense localized turbulence occurred. The perturbed region was filled with the small-scale electrostatic fields of Langmuir waves, whose amplitude reached 50 V/m, greater than the maximum of 20 V/m of the vertical electric field when the standing wave structure was established. Nonlinear effects are responsible for this phenomenon: the high-frequency plasma waves excited by PDI exert a ponderomotive force on the electrons, which redistributes the plasma density; the ions travel with the electrons to remain electrically neutral, and the low-frequency density disturbance of the ions and electrons forms a positive feedback loop with the nonlinear collapse of the electric field. In addition, as time goes by, the spatial extent of the disturbance gradually expands, and the region at lower altitudes also begins to show strong disturbances.
The matching condition of the frequency and wave numbers in PDI is given by
$$ \omega_0 = \omega_L + \omega_{I\text{-}A}, \qquad \mathbf{k}_0 = \mathbf{k}_L + \mathbf{k}_{I\text{-}A} \qquad (9) $$
where ω 0 represents the angular frequency of the pump wave, ω L represents the angular frequency of Langmuir waves, and ω I - A represents the angular frequency of ion acoustic waves. Similarly, k represents the wave number vector of each wave, and the subscript of k has a similar meaning to ω [31].
The dispersion relation of the Langmuir wave in magnetized plasma is
$$ \omega_L^2 = \omega_{pe}^2 + 3 v_{Te}^2 k_L^2 + \frac{\omega_L^2\,\omega_{ce}^2 \sin^2\theta}{\omega_L^2 - \omega_{ce}^2 \cos^2\theta} \qquad (10) $$
where $v_{Te} = \sqrt{k_B T_e/m_e}$ is the electron thermal velocity, $\omega_{pe} = \sqrt{n_0 q^2/(\varepsilon_0 m_e)}$ is the plasma frequency, and $\omega_{ce} = q B_0/m_e$ is the electron cyclotron frequency [20].
The dispersion relation of ion acoustic waves is represented as
$$ \omega_{I\text{-}A}^2 = k_{I\text{-}A}^2\,\frac{k_B T_e + 3 k_B T_i}{m_i} \qquad (11) $$
The frequency of the ion acoustic wave $\omega_{I\text{-}A}$ (kHz) is far smaller than that of the Langmuir wave $\omega_L$ (MHz). According to the frequency matching condition, the pump wave and Langmuir wave frequencies can be treated as approximately equal in the calculations: $\omega_0 \approx \omega_L \gg \omega_{I\text{-}A}$.
The Langmuir wave frequency is therefore taken as the O-wave frequency, $\omega_L/2\pi \approx \omega_0/2\pi \approx 5$ MHz. Substituting this frequency into the Langmuir dispersion relation gives a Langmuir wave number of approximately 3–4 rad m⁻¹, consistent with the value extracted at 266.9032 km shown in Figure 8a.
In ionospheric plasma, the propagation velocity of the radio wave is much higher than that of the plasma electrostatic wave, and the wave number of the radio wave is much smaller than that of the electrostatic wave, i.e., $|\mathbf{k}_L| \gg |\mathbf{k}_0| = n\omega_0/c_0$ (the refractive index $n$ becomes very small as the O-wave approaches the reflection point). According to the wave-number matching condition, $|\mathbf{k}_{I\text{-}A}| \approx |\mathbf{k}_L|$. Substituting this wave number into the dispersion relation of the ion acoustic waves gives an ion acoustic frequency of approximately 5–7 kHz. In this study, the $E_z$ value at 266.9032 km was extracted and a Fourier transform was performed, as shown in Figure 8b. Three peaks exist near 5 MHz: the middle peak, at 5 MHz, corresponds to the pump wave, and the left and right peaks, offset by 7 and 4 kHz, respectively, correspond to ion acoustic waves. The small gap relative to the theoretical values reflects the systematic error introduced by the model algorithm.
Figure 9 shows the absolute value of the vertical electric field $|E_z|$ and the ion density perturbation $\Delta N_i$ in the altitude range of 266.85 to 267 km at 1.5194 ms. Here, $\Delta N_i = N_{i\_\mathrm{now}} - N_{i\_\mathrm{origin}}$, where $N_{i\_\mathrm{now}}$ is the instantaneous ion density at the current moment and $N_{i\_\mathrm{origin}}$ is the original ion density. The vertical electric field presents a small-scale Langmuir electrostatic wave structure generated via the PDI rather than a standing wave structure. Each negative ion density cavity corresponds to a small-scale electrostatic field, which means that the ion density cavities capture the Langmuir wave turbulence. The results shown in Figure 7, Figure 8, and Figure 9 confirm the correctness of the numerical simulation.
The omp_get_wtime() function was used to measure the execution times of the serial and parallel programs for the above example, as shown in Figure 10, where the vertical axis is the speedup of the program. With the proposed parallelization scheme, the simulation time is reduced from 70 h with the single-threaded C program to 3.6 h when using 25 MPI tasks and 6 OpenMP threads.
Based on the EISCAT background conditions mentioned in this section, an ion density cavity was added near the reflection point: $N_i(z) = N_{i0}(z) + N_\mathrm{cavity}(z)$ (z in meters), where $N_\mathrm{cavity}(z) = -A \times N_{\max}\exp[-(z-266{,}900)^2/8100]$, $A$ is the cavity depth, and 266,900 m is the reflection point. The cavity characteristic scale, $\sqrt{8100} = 90$ m, can be applied to investigate the coupling between electrostatic waves and EM waves on a small scale.
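A short C sketch of these density profiles (the background profile from the beginning of this section plus the cavity term) is given below; the function names and the way the depth A is passed are illustrative assumptions.

#include <math.h>

#define NMAX 6.0e11                              /* peak ion density, m^-3 */

/* Background ion density profile N_i0(z), z in meters (see the parameters above). */
double Ni_background(double z)
{
    return NMAX * exp(-(z - 3.0e5) * (z - 3.0e5) / 1.6e9);
}

/* Profile with an added cavity of relative depth A (e.g., A = 0.15)
   centered on the reflection point at 266,900 m. */
double Ni_with_cavity(double z, double A)
{
    double cavity = -A * NMAX * exp(-(z - 266900.0) * (z - 266900.0) / 8100.0);
    return Ni_background(z) + cavity;
}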
As shown in Figure 11a, the ion density profile has a cavity with a depth of 15% near the reflection point. The absolute value of the vertical electric field $|E_z|$ and the ion density perturbation $\Delta N_i$ in the altitude range of 266.85 to 267 km at 1.5194 ms are shown in Figure 11b,d, respectively. For comparison, the case without an ion cavity, i.e., $N_\mathrm{cavity}(z) = 0$, is shown in Figure 11c,e. The depth of the negative ion density cavity is related to the peak of the electrostatic field: when a cavity is added near the reflection point, both the depth of the negative ion density cavity and the peak of the electrostatic waves are greater than in the case without a cavity.
The spatial Fourier transform of $|E_z|$ near the reflection point was performed at 1.5194 ms, and Figure 12a shows the resulting power spectral density. The black line shows the spatial power spectral density for a cavity depth of 0, and the orange line for a cavity depth of 0.15. Clearly, the power is concentrated around a certain wave number, and the peak values differ for different cavity depths.
The cavity structure was varied by adjusting the characteristic scale and the cavity depth, and the influence of the structure on the spatial power spectral density was studied using the control variable method. With the other background conditions unchanged, the cavity depth is the most important factor affecting the spatial power spectral density, whereas changing the characteristic scale has less effect. As shown in Figure 12b, the spatial power spectral density peaks were recorded at different times and cavity depths. As the cavity depth increases, the spatial power spectral density peak also increases significantly. This is because, under the influence of resonance, the electrostatic wave converted from the EM wave is captured by the cavity; the deeper the cavity, the more energy is captured.

6. Conclusions

In this paper, we presented a hybrid MPI and OpenMP parallelization scheme based on the nested mesh FDTD to study the PDI. By examining the governing equations of the PDI, the physical parameters were discretized, and a numerical simulation method based on the nested grid FDTD was established. Exploiting the natural parallelism of the explicit FDTD method, we designed a hybrid MPI and OpenMP parallel architecture: MPI provides asynchronous storage to improve the storage efficiency, and OpenMP accelerates the data update module. Through adaptive computing, MPI tasks and OpenMP threads are allocated automatically so that the code adapts to a variety of hardware environments. We used the EISCAT background parameters as an example to demonstrate the validity of the scheme. Clear electrostatic wave filamentation was produced near the reflection height, and the cavity turbulence of localized strong Langmuir waves was observed. The ion acoustic and pump waves generated by the nonlinear PDI effect can be distinguished in the spectrogram. The results match those of the serial program well, with a speedup of up to 20 times. In addition, we monitored the interaction between electrostatic waves and a suddenly generated cavity in the ionosphere and the influence of the cavity depth on the electrostatic wave energy captured by the cavity. Moreover, the parallel architecture can be readily extended to two-dimensional (2D) simulations, in which the impact of the incident angle of a pump wave beam can be studied and the period and size of ionospheric irregularities generated by heating can be determined. The parallel architecture will be even more beneficial in 2D, as the number of grid points increases quadratically. Notably, the disk-write time is treated here as a fixed amount of wall-clock time because a 1D simulation produces relatively little data; in a 2D simulation the data volume grows sharply and the disk-write times vary, which can no longer be ignored. Handling these variable disk-write times well can further improve efficiency when applying this parallel architecture to 2D problems.

Author Contributions

Conceptualization, L.H., J.C. and J.L.; software, L.H., J.C. and J.L.; writing—original draft preparation, L.H., J.C. and J.L.; writing—review and editing, G.Y., S.H., Y.Y., J.Y. and Q.L.; supervision, G.Y., Q.L. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China Postdoctoral Science Foundation (Grant No. 2020M672307), the Foundation of the National Key Laboratory of the Electromagnetic Environment of China Electronics Technology Group Corporation (Grant No. 201803001), the Foundation of the National Key Laboratory of the Electromagnetic Environment of China Electronics Technology Group Corporation (Grant No. 202003010), the Foundation of the National Key Laboratory of the Electromagnetic Environment of China Electronics Technology Group Corporation (Grant No. 202003011), the National Natural Science Foundation of China (Grant No. 42004055), the foundation of National Key Laboratory of Electromagnetic Environment (Grant No. 6142403200310).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No data were used to support this study.

Acknowledgments

Discussions with associate Ying Liu are gratefully appreciated.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pedersen, T.; Gustavsson, B.; Mishin, E.; Kendall, E.; Mills, T.; Carlson, H.C.; Snyder, A.L. Creation of artificial ionospheric layers using high-power HF waves. Geophys. Res. Lett. 2010, 37, L02106.
  2. Rietveld, M.T.; Kohl, H.; Kopka, H.; Stubbe, P. Introduction to ionospheric heating at Tromsø—I. Experimental overview. J. Atmos. Terr. Phys. 1993, 55, 577–599.
  3. Rietveld, M.T.; Senior, A.; Markkanen, J.; Westman, A. New capabilities of the upgraded EISCAT high-power HF facility. Radio Sci. 2016, 51, 1533–1546.
  4. Streltsov, A.V.; Berthelier, J.J.; Chernyshov, A.A.; Frolov, V.L.; Honary, F.; Kosch, M.J.; McCoy, R.P.; Mishin, E.V.; Rietveld, M.T. Past, present and future of active radio frequency experiments in space. Space Sci. Rev. 2018, 214, 118.
  5. Zhou, C.; Wang, X.; Liu, M.; Ni, B.; Zhao, Z. Nonlinear processes in ionosphere: Report on the mechanisms of ionospheric heating by the powerful radio waves. Chin. J. Geophys. 2018, 61, 4323–4336.
  6. Hocke, K.; Liu, H.X.; Pedatella, N.; Ma, G.Y. Global sounding of F region irregularities by COSMIC during a geomagnetic storm. Ann. Geophys. 2019, 37, 235–242.
  7. Gurevich, A.V. Nonlinear effects in the ionosphere. Phys. Usp. 2007, 50, 1091.
  8. Doe, R.A.; Mendillo, M.; Vickrey, J.F.; Zanetti, L.J.; Eastes, R.W. Observations of nightside auroral cavities. J. Geophys. Res. Space Phys. 1993, 98, 293–310.
  9. Streltsov, A.V.; Lotko, W. Coupling between density structures, electromagnetic waves and ionospheric feedback in the auroral zone. J. Geophys. Res. Space Phys. 2008, 113, A05212.
  10. Zettergren, M.; Lynch, K.; Hampton, D.; Nicolls, M.; Wright, B.; Conde, M.; Moen, J.; Lessard, M.; Miceli, R.; Powell, S. Auroral ionospheric F region density cavity formation and evolution: MICA campaign results. J. Geophys. Res. Space Phys. 2014, 119, 3162–3178.
  11. Robinson, T.R. The heating of the high latitude ionosphere by high power radio waves. Phys. Rep. 1989, 179, 79–209.
  12. Yee, K. Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media. IEEE Trans. Antennas Propag. 1966, 14, 302–307.
  13. Simpson, J.J. Current and future applications of 3-D global Earth-Ionosphere models based on the full-vector Maxwell's equations FDTD method. Surv. Geophys. 2009, 30, 105–130.
  14. Young, J.L. A full finite difference time domain implementation for radio wave propagation in a plasma. Radio Sci. 1994, 29, 1513–1522.
  15. Yu, Y.; Simpson, J.J. An E-J collocated 3-D FDTD model of electromagnetic wave propagation in magnetized cold plasma. IEEE Trans. Antennas Propag. 2010, 58, 469–478.
  16. Gondarenko, N.A.; Ossakow, S.L.; Milikh, G.M. Generation and evolution of density irregularities due to self-focusing in ionospheric modifications. J. Geophys. Res. Space Phys. 2005, 110, A09304.
  17. Eliasson, B. Full-scale simulation study of the generation of topside ionospheric turbulence using a generalized Zakharov model. Geophys. Res. Lett. 2008, 35, L11104.
  18. Eliasson, B. A nonuniform nested grid method for simulations of RF induced ionospheric turbulence. Comput. Phys. Commun. 2008, 178, 8–14.
  19. Eliasson, B. Full-scale simulations of ionospheric Langmuir turbulence. Mod. Phys. Lett. B 2013, 27, 1330005.
  20. Huang, J.; Zhou, C.; Liu, M.-R.; Wang, X.; Zhang, Y.-N.; Zhao, Z.-Y. Study of parametric decay instability in ionospheric heating of powerful waves (I): Numerical simulation. Chin. J. Geophys. 2017, 60, 3693–3706.
  21. Cannon, P.D.; Honary, F. A GPU-accelerated finite-difference time-domain scheme for electromagnetic wave interaction with plasma. IEEE Trans. Antennas Propag. 2015, 63, 3042–3054.
  22. Chaudhury, B.; Gupta, A.; Shah, H.; Bhadani, S. Accelerated simulation of microwave breakdown in gases on Xeon Phi based cluster-application to self-organized plasma pattern formation. Comput. Phys. Commun. 2018, 229, 20–35.
  23. Yang, Q.; Wei, B.; Li, L.; Ge, D. Analysis of the calculation of a plasma sheath using the parallel SO-DGTD method. Int. J. Antennas Propag. 2019, 2019, 7160913.
  24. Sharma, K.K.; Joshi, S.D.; Sharma, S. Advances in Shannon sampling theory. Def. Sci. J. 2013, 63, 41–45.
  25. Gabriel, E.; Fagg, G.E.; Bosilca, G.; Angskun, T.; Dongarra, J.J.; Squyres, J.M.; Sahay, V.; Kambadur, P.; Barrett, B.; Lumsdaine, A.; et al. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings of the 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, 19–22 September 2004; Volume 3241, pp. 97–104.
  26. Gropp, W.; Lusk, E.; Doss, N.; Skjellum, A. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 1996, 22, 789–828.
  27. Dagum, L.; Menon, R. OpenMP: An industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 1998, 5, 46–55.
  28. Rabenseifner, R.; Hager, G.; Jost, G. Hybrid MPI/OpenMP parallel programming on clusters of multi-core SMP nodes. In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Weimar, Germany, 18–20 February 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 427–436.
  29. Miri Rostami, S.R.; Ghaffari-Miab, M. Finite difference generated transient potentials of open-layered media by parallel computing using OpenMP, MPI, OpenACC, and CUDA. IEEE Trans. Antennas Propag. 2019, 67, 6541–6550.
  30. Quinn, M.J. Parallel Programming in C with MPI and OpenMP; McGraw-Hill: New York, NY, USA, 2000; ISBN 0-07-282256-2.
  31. Chen, F.F. Introduction to Plasma Physics and Controlled Fusion; Plenum Press: New York, NY, USA, 1984; pp. 82–94.
Figure 1. (a) Schematic showing the simulation model of the EM wave propagation in the ionosphere: B0 is the geomagnetic field; kEM is the propagation direction of the injected EM wave; Ex and Hy are the initial polarization directions of the electric and magnetic fields of the injected EM wave, respectively; the dotted line represents the approximate location of the cavity. (b) Spatial–temporal discretization scheme of the physical parameters, with the positions of the field nodes indicated.
Figure 2. Serial program flow chart.
Figure 3. Parallel program design.
Figure 4. Update module calculation strategy.
Figure 5. Storage module strategy.
Figure 6. Vertical electric field | E z | in the altitude range of 266.85–267 km at 1.5194 ms with different resolutions.
Figure 7. Variation in the electric field | E z | with time.
Figure 8. (a) The height distribution of the Langmuir wave number. (b) Spectrogram analysis for $E_z$ at an altitude of 266.9032 km.
Figure 9. (1) Vertical electric field | E z | (top); (2) ion density perturbation Δ N i in the altitude range of 266.85−267 km at 1.5194 ms (bottom).
Figure 10. Speedup from different numbers of MPI tasks and OpenMP threads.
Figure 11. (a) Ion density profile with a cavity depth of 15%; (b,d) absolute value of the vertical electric field E z and ion density perturbation Δ N i with a cavity depth of 15% in the altitude range of 266.85–267 km at 1.5194 ms; (c,e) values without cavity.
Figure 12. (a) Power spectral density on the height range of 266.85−267 km at 1.5194 ms with different cavity depths; (b) peak of the power spectral density as a function of the cavity depth on the height range of 266.85−267 km at different times.
Table 1. Serial program performance analysis table.
Grid Cells | Single Cyclical Update Time T1 (s) | Storage Time T2 (s) | Sampling Period (n = Δ/Δt) | Time Consumed for a Sampling Period T3 = T1 × n + T2 (s) | Ratio of Storage to Total Time (T2/T3)
35,000 | 0.0089 | 1.576 | 6 | 1.656 | 96.72%
350,000 | 0.0936 | 11.94 | 90 | 20.364 | 58.63%
2,000,000 | 1.04 | 76.35 | 500 | 596.4 | 12.80%
Table 2. Performance of the FDTD code running with different numbers of OpenMP threads.
Grid Cells | 1 Thread | 4 Threads | 6 Threads | 12 Threads | 24 Threads | 32 Threads (time in seconds)
35,000 | 0.008921 | 0.009782 | 0.008573 | 0.008273 | 0.006159 | 0.005948
350,000 | 0.09364 | 0.07574 | 0.073465 | 0.071703 | 0.047205 | 0.039629
2,000,000 | 1.04 | 0.49 | 0.417 | 0.36 | 0.258 | 0.174
Table 3. Performance of the storage module code running with different numbers of MPI tasks. T1 represents the single storage time, and T2 is the ideal data forwarding time. The wall-clock consumption for point-to-point non-blocking communication is defined as the ideal data forwarding time, and MPI_Wtime() is used to compute it.
Grid Cells | T1 (s) | 2 Tasks | 4 Tasks | 8 Tasks | 16 Tasks | 32 Tasks | T2 (s) (forwarding times in seconds)
35,000 | 1.576 | 0.8302 | 0.4418 | 0.1843 | 0.1217 | 0.0872 | 0.080
350,000 | 11.93 | 0.2343 | 0.0401 | 0.0142 | 0.0124 | 0.0115 | 0.0095
2,000,000 | 76.35 | 0.0674 | 0.0658 | 0.0637 | 0.0648 | 0.0657 | 0.064
Table 4. Speedup comparison of the adaptive allocation of tasks and threads with manual allocations of threads and tasks on different CPUs.
Code Num | i7-9700K (8 cores): Thread Num / Task Num / Speed-Up | Xeon Gold 6254 (18 cores): Thread Num / Task Num / Speed-Up
(1) | 4 / 12 / 14.84 | 17 / 19 / 22.45
(2) | 2 / 14 / 15.35 | 14 / 22 / 23.86
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
