Because different metasurfaces can achieve the same EM response, which metasurface the framework should return for a given target response is a non-unique (one-to-many) problem. In practice, although different metasurfaces can produce similar responses, it is almost impossible for them to produce identical responses at every sampling point. Therefore, in the global design space there must be a metasurface whose response is closest to the target, and the framework is expected to find this optimal solution. The overall framework flow chart is shown in
Figure 1. The task of the RL module is to translate the target EM response into an RDV through the representation learning network. Any target EM response corresponds to a unique RDV, while one RDV might correspond to more than one metasurface. To solve this many-to-one problem, the VAE-PSO module generates a latent design space, in which the PSO algorithm searches globally for the optimal latent vector. The VAE decoder is then used to decode the latent vector into the optimal metasurface.
2.1. Metasurface Design
Figure 2 illustrates the detailed structure of a meta-atom. The top pattern layer (Layer 1), with a side length L1, is evenly divided into 16 × 16 discrete cells, where '1' denotes copper and '0' denotes vacuum in the digital coding. Thus, the digital metasurface can be modeled as a 16 × 16 binary design matrix which, owing to the four-fold symmetry described below, admits 2^(8×8) possible patterns. Each discrete copper piece is a square patch with a thickness of h1 and a side length of L'1. The second layer is a dielectric layer (Layer 2) with a dielectric constant of εr = 2.65 and a loss tangent of tan δ = 0.001. The back layer (Layer 3) is a backing copper sheet. The thicknesses and side lengths of Layer 2 and Layer 3 are h2, h3, L2, and L3, respectively. To reduce the influence of polarization, the object of our study is an isotropic metasurface in which an 8 × 8 coding sequence serves as a subblock and is expanded into the 16 × 16 matrix through four-fold symmetry [9].
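The four-fold expansion from an 8 × 8 coding subblock to a 16 × 16 design matrix can be sketched in a few lines of Python (the function name and the random example are ours, not from the paper):

```python
import random

def expand_fourfold(subblock):
    """Expand an 8 x 8 coding subblock into a 16 x 16 design matrix
    with four-fold (left-right and up-down mirror) symmetry."""
    assert len(subblock) == 8 and all(len(row) == 8 for row in subblock)
    # Mirror each row to the right: 8 x 8 -> 8 x 16.
    top = [row + row[::-1] for row in subblock]
    # Mirror the top half downward: 8 x 16 -> 16 x 16.
    return top + top[::-1]

# Example: a random binary subblock ('1' = copper, '0' = vacuum).
random.seed(0)
sub = [[random.randint(0, 1) for _ in range(8)] for _ in range(8)]
matrix = expand_fourfold(sub)
assert len(matrix) == 16 and all(len(r) == 16 for r in matrix)
# Left-right and up-down mirror symmetry both hold.
assert all(row == row[::-1] for row in matrix)
assert matrix == matrix[::-1]
```

Because only the 8 × 8 subblock is free, the design space contains 2^(8×8) distinct patterns rather than 2^(16×16).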
In the simulation, boundary conditions and excitations are added to the metasurface model. The solver computes Maxwell's equations on a mesh, so the calculation time surges as model complexity increases. The design matrices of the meta-atoms are selected as the input features and the S-parameters as the EM responses, and data sets are randomly collected to train the deep learning models. MATLAB scripts controlling CST STUDIO are used to generate the digital metasurfaces, calculate the S-parameters, and save the data sets automatically.
2.2. Representation Learning Module
The key to our framework is to transform the design space and the response space into low-dimensional spaces through the RL module, solving the non-uniqueness problem and reducing network complexity [
10]. First, the different spaces and their corresponding vectors need to be clearly defined. The original space includes the original design space (ODS) and the original response space (ORS), whose vectors are the original design vector (ODV) and original response vector (ORV). The reduced space includes the reduced design space (RDS) and reduced response space (RRS), whose vectors are the reduced design vector (RDV) and reduced response vector (RRV). Since multiple meta-atoms can produce the same EM response, they should be mapped to the same vector in the reduced space, so that RDS and RRS form a one-to-one mapping controlled by a nonlinear, invertible function. In this way, the many-to-one relationship between ODS and ORS is transferred to a many-to-one relationship between ODS and RDS, which can be solved by the VAE-PSO module.
Figure 3 illustrates the mapping relationship between different spaces, with the red line representing one-to-one mapping and the blue line representing many-to-one mapping.
Representation learning is a machine learning technique that extracts effective features from raw data. In essence it is a dimensionality reduction technique, which can be realized by random forests, principal component analysis, autoencoders (AEs), and so on. In this article, the RL module uses an AE, composed of an encoder and a decoder that can be used separately. The back-propagation algorithm is used to train the AE so as to minimize the cost function. Equations (1)–(3) give the features h, the reconstructed data x̂, and the cost function L:

h = σ(W_e x + b_e),  (1)
x̂ = σ(W_d h + b_d),  (2)
L = (1/N) Σ_{i=1}^{N} ||x_i − x̂_i||²,  (3)

where x is the input data; W_e, b_e and W_d, b_d are the transformation matrices and biases of the encoder and the decoder, respectively; N is the amount of training data; and σ is the activation function. The dimension of h should be less than that of x in order to achieve feature dimension reduction.
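Equations (1)–(3) can be illustrated with a minimal NumPy sketch (the sigmoid activation, the 20-dimensional code size, and all weights here are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda a: 1.0 / (1.0 + np.exp(-a))  # logistic activation (assumed)

# Illustrative sizes: a 1000-dim ORV compressed to a 20-dim code.
d_in, d_h, N = 1000, 20, 8
We, be = rng.normal(0, 0.05, (d_h, d_in)), np.zeros(d_h)
Wd, bd = rng.normal(0, 0.05, (d_in, d_h)), np.zeros(d_in)

x = rng.normal(0, 1, (N, d_in))                # N training samples
h = sigma(x @ We.T + be)                       # Eq. (1): features
x_hat = sigma(h @ Wd.T + bd)                   # Eq. (2): reconstruction
L = np.mean(np.sum((x - x_hat) ** 2, axis=1))  # Eq. (3): cost
```

A real training loop would then back-propagate the gradient of L through both layers; only the forward pass is shown here.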
Figure 4 illustrates the design process of the AEs. The dimension of the ORS is reduced by training the autoencoder shown in Figure 4a. S11 over 0–20 GHz is selected as the EM response of this study, sampled every 0.02 GHz to give a 1000-dimensional vector. Subsequently, the pseudo-autoencoder (an autoencoder whose input space differs from its output space) in Figure 4b is used to train the mappings from ODS to RDS and from RDS to RRS together. The former mapping is many-to-one and the latter is one-to-one. The one-to-one relationship between RDS and RRS is trained by a multi-layer perceptron (MLP). The decoder reuses the "decoder1" trained in Figure 4a.
Once the two reduced spaces are obtained by the pseudo-autoencoder, an effective forward prediction model can be formed. It maps the different design matrices that achieve the same EM response to the same RDV, maps that RDV one-to-one to the RRS through the MLP, and finally decodes the result into a 1000-dimensional ORV. To build the inverse design model, the inverse of the MLP in Figure 4b must be found. As demonstrated in Figure 4c, an inverse design network composed of encoder1 from Figure 4a and the inverse structure of the MLP from Figure 4b is established. It generates a unique RDV from an input 1000-dimensional ORV.
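The wiring of the forward prediction and inverse design paths can be sketched with linear stand-ins for the trained sub-networks (all dimensions are illustrative, and the matrix inverse here is a toy replacement for the paper's trained inverse-MLP structure):

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear stubs standing in for the trained sub-networks; the
# dimensions (256-dim flattened design matrix, 20-dim RDV/RRV,
# 1000-dim ORV) are assumptions for illustration only.
A_enc2 = rng.normal(0, 0.1, (20, 256))    # encoder2: ODV -> RDV
A_mlp = rng.normal(0, 0.1, (20, 20))      # MLP: RDV -> RRV
A_dec1 = rng.normal(0, 0.1, (1000, 20))   # decoder1: RRV -> ORV
A_enc1 = rng.normal(0, 0.1, (20, 1000))   # encoder1: ORV -> RRV
A_inv = np.linalg.inv(A_mlp)              # toy "inverse MLP": RRV -> RDV

def forward_predict(design_vec):
    """Flattened design matrix -> predicted 1000-dim EM response."""
    return A_dec1 @ (A_mlp @ (A_enc2 @ design_vec))

def inverse_design(orv):
    """Target 1000-dim response -> unique RDV."""
    return A_inv @ (A_enc1 @ orv)

orv = forward_predict(rng.integers(0, 2, 256).astype(float))
rdv = inverse_design(orv)
assert orv.shape == (1000,) and rdv.shape == (20,)
```

The point of the sketch is the composition order: forward prediction chains encoder2 → MLP → decoder1, while inverse design chains encoder1 → inverse MLP, exactly as in Figure 4c.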
The internal structures of the encoders and decoders vary. A convolutional neural network (CNN) and fully connected layers (FCLs) are utilized in this work. Network 1 in Figure 4a is the autoencoder that reduces the 1000-dimensional ORV. If FCLs were used throughout, the network scale and number of parameters would increase greatly, making training difficult. Thus, a hybrid network of one-dimensional convolution layers (Conv1D), pooling layers, and FCLs is adopted in encoder1, while FCLs are used in decoder1 in Figure 5a. Network 2 in Figure 4b is the pseudo-autoencoder, which transforms the design matrix into a reduced design vector and establishes a one-to-one mapping with the reduced response vector using the MLP. The aim of network 2 is to train an encoder that can effectively compress an original design matrix into the RDV, together with an MLP that connects the two reduced spaces. The well-trained network 2 serves as the forward prediction network. CNNs show good feature extraction performance on 2D raster data [11], so our encoder2 uses two-dimensional convolution layers (Conv2D) as the main network. The specific network structure is shown in Figure 5b. In this article, we collected 30,480 samples, of which 21,366 were used for the training set, 6096 for the validation set, and 3018 for the test set. The training of these networks must strictly follow the order shown in Figure 5.
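The stated sample counts can be reproduced with a simple shuffled partition (a sketch; the paper does not describe its exact shuffling procedure, and the seed is arbitrary):

```python
import random

# Split 30,480 samples into 21,366 train / 6,096 validation / 3,018 test.
random.seed(42)  # arbitrary seed for reproducibility
indices = list(range(30480))
random.shuffle(indices)
train_idx = indices[:21366]
val_idx = indices[21366:21366 + 6096]
test_idx = indices[21366 + 6096:]
assert (len(train_idx), len(val_idx), len(test_idx)) == (21366, 6096, 3018)
```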
2.3. Variational Autoencoder-Particle Swarm Optimization Module
Although the final inverse design network can generate the required RDV, we still cannot recover all the design matrices from an RDV because of the one-to-many relationship. Theoretically, one RDV may correspond to multiple design matrices due to the non-uniqueness problem, and no suitable deep learning model can carry out such one-to-many mapping directly. In recent years, optimization algorithms have been widely used in metasurface design, but their running speed is slow due to the huge computational cost in the original space [12]. Since the RL module has mapped the inverse design problem into a reduced space, Particle Swarm Optimization (PSO) is applied in this low-dimensional space to improve the speed of the algorithm.
First, a global vector space of the 16 × 16 meta-atom space is generated so that the target vector can be identified effectively and expeditiously in a global manner. A generative model can complete this task well [13], and a VAE is chosen to generate the global vector space. A VAE converts high-dimensional discrete data into a low-dimensional continuous space known as the latent space, in which any sampling point can be decoded into a meaningful output. During training, the VAE continuously learns the conditional probability distribution of the inputs given the latent variables. When the VAE is trained with the available binary matrices of 30,000 different isotropic metasurfaces, a continuous low-dimensional design space of dimension K is obtained [14]. Figure 6a illustrates the architecture of the VAE. Since the design matrix of the metasurface is a two-dimensional grid, the V-encoder and V-decoder of the VAE are implemented by convolutional neural networks, as illustrated in Figure 6b,c, respectively.
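The VAE interface can be sketched with linear stubs replacing the convolutional V-encoder and V-decoder (the latent dimension K, all weights, and the sigmoid output are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10                                 # latent dimension (illustrative)
W_mu = rng.normal(0, 0.05, (K, 256))
W_sig = rng.normal(0, 0.05, (K, 256))
W_dec = rng.normal(0, 0.5, (256, K))

x = rng.integers(0, 2, (16, 16)).astype(float)  # a binary design matrix
flat = x.reshape(256)

# Linear stubs for the V-encoder: mean and standard deviation vectors.
mu = W_mu @ flat
sigma_z = np.exp(W_sig @ flat)         # exp keeps standard deviations positive

# Reparameterization trick: Z = mu + sigma * eps with eps ~ N(0, I),
# so Z is a sample from N(mu, sigma^2) while gradients reach mu, sigma.
eps = rng.standard_normal(K)
Z = mu + sigma_z * eps

# Stub V-decoder: latent vector -> reconstructed 16 x 16 matrix.
x_hat = (1.0 / (1.0 + np.exp(-(W_dec @ Z)))).reshape(16, 16)

# KL term of the VAE loss: KL(N(mu, sigma^2) || N(0, I)).
kl = 0.5 * np.sum(mu**2 + sigma_z**2 - np.log(sigma_z**2) - 1.0)
assert Z.shape == (K,) and x_hat.shape == (16, 16) and kl >= 0.0
```

A trained decoder would then round x_hat back to a binary matrix; the stub only shows the data flow.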
The encoder first converts the input binary matrix into a mean vector μ and a standard deviation vector σ. During training, the latent vector Z is sampled from the Gaussian distribution N(μ, σ²). After that, the decoder G reconstructs Z into the design matrix. In this process, pattern topologies of meta-atoms with comparable features are mapped to the same region of the latent space, and similar decoded meta-atoms are obtained by perturbing the latent vectors. A well-trained decoder can recover any latent vector sampled in the Z space into a binary matrix resembling the training set, which accomplishes the purpose of generating new samples. Equations (4)–(6) give the loss function of the VAE, denoted by L_VAE:

L_VAE = L_recon + L_KL,  (4)
L_recon = ||x − x̂||²,  (5)
L_KL = KL(N(μ, σ²) ‖ N(0, I)) = (1/2) Σ_{k=1}^{K} (μ_k² + σ_k² − ln σ_k² − 1),  (6)

where x and x̂ represent the input data and the reconstructed data.
KL refers to the Kullback–Leibler divergence. After training the VAE, a new mapping network can be trained to find the many-to-one mapping between the continuous design space V and the RDV D in Figure 7a. As shown in Figure 7b, this many-to-one relation can be converted into a multi-solution problem in functions. This can be expressed in Equation (7):

D = f(v_1) = f(v_2) = ⋯ = f(v_n),  (7)

where f denotes the mapping from the latent design space to the RDS and v_1, …, v_n are the latent vectors that map to the same RDV.
Therefore, if the objective RDV and the function expression of the inverse network in Figure 5a were known, the non-uniqueness problem could be solved directly. In practice, however, the function fitted by a neural network cannot be written in closed form. At the same time, it is difficult for different meta-atoms to achieve the same response curve at every sample point. It is therefore necessary to introduce an optimization algorithm to find the optimal latent vectors with the smallest distance from the target RDV D_tar. Searching for the optimal solution in this reduced space, rather than in the huge ODS, saves a great deal of verification work and greatly improves the inverse design efficiency of the metasurface. In this work, PSO was chosen as the optimization algorithm and is implemented in Python using the PySwarms toolkit [15].
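For illustration, a minimal pure-Python global-best PSO on a toy quadratic fitness is sketched below; the actual implementation uses PySwarms, and the hyperparameters and toy target here are generic placeholders, not the paper's settings:

```python
import random

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO: each particle tracks its personal best,
    and all particles are attracted toward the swarm's global best."""
    rnd = random.Random(0)
    pos = [[rnd.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Toy stand-in for the RDV-distance fitness: distance to a known target.
target = [0.3, -0.5, 0.8]
fit = lambda v: sum((a - b) ** 2 for a, b in zip(v, target))
best, best_f = pso(fit, dim=3)
assert best_f < 1e-3
```

In the real framework the fitness is the distance between the RDV mapped from a latent vector and the target RDV, and the search runs over the K-dimensional latent space rather than this 3-dimensional toy.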
Figure 8 illustrates the optimization process. The continuous latent vector space is mapped to the RDS through the mapping network. The fitness function is obtained from the distance between the mapped vector and the target vector D_tar in the RDS, which can be written as Equation (8):

Fitness = ||f(v) − D_tar||.  (8)

The fitness function is the objective function of the optimization. The iteration terminates when Fitness < ε, where ε is an arbitrarily small value chosen based on the training data. After optimization, the optimal latent design vector is returned and the binary matrix is generated by the V-decoder. To evaluate the performance of our framework quantitatively [16], the accuracy for each target EM response is defined in Equation (9).
Accuracy = 1 − ∫_{f1}^{f2} |R_tar − R_gen| df / ∫_{f1}^{f2} |R_tar| df,  (9)

where f1 and f2 are the frequency bounds of the input spectra, R_tar is the target EM response, and R_gen is the generated response calculated from the design matrix produced by the VAE-PSO framework, which can be obtained directly using the trained network 2 in Figure 4b. The accuracy measures the degree of matching between R_tar and R_gen. Applying Equation (9) to the whole test set and averaging gives the average accuracy of the entire framework. By transforming the non-uniqueness problem into a global optimal-solution problem in a lower-dimensional space, our framework achieves a high average accuracy of 94% on the test set.
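A discrete version of the accuracy metric can be sketched as follows (the normalization shown is one common choice and the toy S11 curves are ours; the paper's exact integrand may differ):

```python
import math

def accuracy(r_tar, r_gen):
    """Discrete accuracy: 1 minus the summed absolute error between
    target and generated responses, normalized by the summed magnitude
    of the target over the same frequency grid."""
    num = sum(abs(t - g) for t, g in zip(r_tar, r_gen))
    den = sum(abs(t) for t in r_tar)
    return 1.0 - num / den

# Toy 1000-point S11 curves (in dB) on the 0-20 GHz grid; the
# generated curve deviates from the target by a small offset.
f = [i * 0.02 for i in range(1000)]
r_tar = [-10.0 - 5.0 * math.sin(0.5 * x) for x in f]
r_gen = [v + 0.2 for v in r_tar]
acc = accuracy(r_tar, r_gen)
assert 0.95 < acc < 1.0
```

Averaging this quantity over every target in the test set gives the framework-level accuracy reported above.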