Machine Learning-Based Soft-Error-Rate Evaluation for Large-Scale Integrated Circuits

Song, Ruiqiang; Shao, Jinjin; Chi, Yaqing; Liang, Bin; Chen, Jianjun; Wu, Zhenyu

doi:10.3390/electronics12244978

by Ruiqiang Song ¹,²,*, Jinjin Shao ¹, Yaqing Chi ¹,²,*, Bin Liang ¹,², Jianjun Chen ¹,² and Zhenyu Wu ¹,²

¹ College of Computer, National University of Defense Technology, Changsha 410073, China

² Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha 410073, China

* Authors to whom correspondence should be addressed.

Electronics 2023, 12(24), 4978; https://doi.org/10.3390/electronics12244978

Submission received: 6 November 2023 / Revised: 9 December 2023 / Accepted: 11 December 2023 / Published: 12 December 2023

(This article belongs to the Special Issue Radiation Effects of Advanced Electronic Devices and Circuits)

Abstract

Transient pulses generated by high-energy particles can cause soft errors in circuits, leading to spacecraft malfunctions and seriously threatening normal spacecraft operation. For integrated circuits used in space applications, it is therefore necessary to evaluate the soft errors caused by transient pulses. Conventional soft-error-rate evaluation tools simulate the generation of transient pulses using many accurate models, while the propagation of transient pulses is primarily simulated by circuit-level simulation tools. Because of the limitations of these simulation tools, conventional evaluation approaches are restricted in the circuit scale they can handle: the simulation runtime becomes prohibitive for large-scale integrated circuits. This paper presents an approach for evaluating the soft error rate using machine learning. A back propagation neural network is implemented in the proposed approach to determine the probability of transient pulse propagation. Compared with the conventional soft-error-rate evaluation results, the proposed approach demonstrates a strong correlation in both trend and magnitude. The average difference between the results obtained using the proposed evaluation method and the experimental results is 23.5%, which is 7.5% higher than that between the results obtained using the conventional evaluation method and the experimental results. Compared to the conventional evaluation method, the proposed approach improves the runtime by an order of magnitude. The proposed approach also helps locate highly sensitive circuit nodes in large-scale integrated circuits, which is useful for both circuit design and radiation hardening.

Keywords:

machine learning; single event transient; soft error rate; transient pulse propagation

1. Introduction

When a high-energy particle passes through an integrated circuit in the space radiation environment, it loses energy along its path [1,2]. The lost energy is transferred to the semiconductor material, ionizing electrons of silicon atoms [3]. These ionized electron-hole pairs are subject to both drift and diffusion. They move throughout the entire semiconductor material and are collected by transistors [4,5,6]. The collected electron-hole pairs produce unexpected transient pulses in circuit nodes [7]. These transient pulses propagate along the circuit path and cause soft errors [8,9,10]. A soft error is a significant threat to integrated circuits. It alters the logic function and can potentially lead to catastrophic consequences for an entire chip, system, or even a spacecraft.

To mitigate soft errors in integrated circuits for space applications, it is crucial to evaluate the soft error rate (SER) during the circuit design phase. In previous works, several circuit-level evaluation approaches have been proposed to investigate the SER of integrated circuits [11,12,13,14,15,16,17,18,19,20,21]. These works have proposed many accurate models to generate transient pulses in circuit nodes. Then, they utilize simulation tools, such as the Simulation Program with Integrated Circuit Emphasis (SPICE) and Technology Computer-Aided Design (TCAD), to simulate transient pulse propagation and capture. Based on the simulated results, conventional evaluation approaches determine soft errors and calculate the SER of integrated circuits. Because of the limitations of these simulation tools, conventional evaluation approaches are restricted in the circuit scale they can handle: the simulation runtime becomes prohibitive for large-scale integrated circuits [22].

This paper presents a novel approach for evaluating SER in order to reduce the simulation runtime. A back propagation neural network (BPNN) is implemented to determine the probability of transient pulse propagation, and the SER is then determined from this probability. The proposed approach does not need actual circuit-level simulation to determine the probability that a transient pulse propagates to flip-flops. Instead, it takes the pulse propagation probability of each instance in the data path as the input value. This input is fed into a machine learning model, and the propagation probability is obtained from the calculation of the machine learning model. A chip with three test circuits was designed using commercial CMOS technology to investigate the accuracy of the proposed approach. The proposed approach achieves good consistency in both trend and order of magnitude.

2. SER Evaluation Overview

In previous works, several approaches for evaluating the soft error rate (SER) have been proposed. These approaches are used to evaluate key circuits, including combinational circuits, flip-flops, and SRAM. In general, the existing soft error evaluation approaches fall into three categories: SPICE-level evaluation approaches, TCAD-level evaluation approaches, and Monte Carlo-based evaluation approaches.

The SPICE-level evaluation approach is widely used. Based on the SPICE device model and the netlist of the evaluated circuit, a separate current source is introduced directly at the sensitive node of the circuit to simulate the transient current caused by incident particles [23,24,25]. Then, it simulates the corresponding circuit response to obtain soft errors. Correas et al. simulated the evaluation of a 90 nm SRAM circuit using the SPICE circuit-level soft error evaluation tool. The evaluation results obtained are in good agreement with the experimental results [12]. Shambhulingaiah et al. utilized the same tool to simulate sequential instances, such as flip-flops, and identified the sensitive nodes of the flip-flop [13]. Wang and Du et al. utilized the SPICE circuit-level simulation tool to simulate the propagation process of single-event transients in large-scale combinational circuits. Their objective was to assess the impact of single-event transient pulses on soft errors in these circuits [14,15]. Li et al. simulated and analyzed the reliability of integrated circuits using the SPICE tool and proposed a corresponding evaluation process [16].

The TCAD-level evaluation approach differs from the SPICE-level evaluation approach. It first constructs the TCAD model based on the layout structure and manufacturing process parameters of the circuit instance. Then, the TCAD model simulates the ionization of electron-hole pairs by the incident particles using a specific numerical distribution, such as an exponential or Gaussian distribution. The transport process of electron-hole pairs in the TCAD model is calculated using the carrier drift-diffusion and other models embedded in the TCAD simulation tool. It simulates the charge collection of the instance and the instantaneous response of the circuit node, and determines whether the circuit instance produces soft errors. Yoni et al. constructed a comprehensive 3D TCAD model using the layout structure of D flip-flops. They subsequently simulated the circuit’s response when the D flip-flop cell was exposed to terrestrial neutrons [17]. Xu et al. utilized TCAD simulation tools to investigate the mechanism of soft errors in standard instances. They also employed a combination of TCAD simulation tools and SPICE circuit-level simulation tools [18].

Recently, the Monte Carlo-based evaluation approach has become an important evaluation approach. It utilizes Monte Carlo tools, such as Geant4 and SRIM, to simulate and calculate the interaction between the incident particles and the semiconductor material. Then, it converts the charge deposited by the incident particles in the material into the charge and transient current collected by the device through charge transport and charge collection mechanisms. Finally, it simulates the transient response of the circuit using additional simulation tools and determines whether a soft error occurs. Many Monte Carlo-based evaluation approaches have been proposed to evaluate the SER of circuits, such as MRED [11], MUSCA SEP3 [22], PHITS-HyENEXSS [19], and IRT [20].

3. The SER Evaluation Using Machine Learning Models

3.1. The Transient Pulse Propagation Probability

A conventional register-to-register circuit path in integrated circuits is shown in Figure 1. It is used to explain the evaluation of transient pulse propagation using machine learning models. When a high-energy particle strikes this circuit, some logic instances (such as C0) collect the ionized electron-hole pairs and produce a transient pulse at circuit nodes. Then, the transient pulse propagates to flip-flops along circuit paths. When the transient pulse arrives at the input pin of instance C1, it propagates directly, and the probability of transient pulse propagation $P_{C1}$ is equal to 1. However, when the transient pulse arrives at the input pin of instances C2 and C3, it may not propagate due to logic masking. For instance, C2 is an OR-gate instance. The transient pulse can only propagate when the value of the other input pin is 0. Similarly, C3 is an AND-gate instance. The transient pulse is able to propagate only when the value of the other input pin is 1. Therefore, the transient pulse propagation probabilities $P_{C2}$ and $P_{C3}$ depend on the instance type and input pin values. The values are determined using the following equations:

$$P_{C2} = 1 - P_{\mathrm{otherpin},1} \tag{1}$$

$$P_{C3} = P_{\mathrm{otherpin},1} \tag{2}$$

where $P_{\mathrm{otherpin},1}$ is the probability when the value of the other pin is 1. If the input vectors are random, $P_{\mathrm{otherpin},1}$ is equal to 0.5. The transient pulse that can propagate to flip-flops is determined by the propagation probabilities along the circuit path. For instance, the transient pulse that can propagate to flip-flop 1 (FF1) is determined by $P_{C1}$, $P_{C2}$, and $P_{C3}$. If a relationship between $P_{FF_1}$ and $P_{C1}$–$P_{C3}$ can be determined, the transient pulse propagation can be easily evaluated.
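
The per-instance probabilities of Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function name, the gate-type strings, and the choice of a buffer ("BUF") for C1 are assumptions, since the text only states that the pulse propagates directly through C1.

```python
def propagation_probability(gate_type, p_other_pin_1=0.5):
    """Per-instance transient pulse propagation probability.

    gate_type strings are illustrative labels, not names from a real library.
    p_other_pin_1 is the probability that the other input pin is 1.
    """
    if gate_type == "BUF":          # assumed single-input instance such as C1
        return 1.0
    if gate_type == "OR":           # Equation (1): other pin must be 0
        return 1.0 - p_other_pin_1
    if gate_type == "AND":          # Equation (2): other pin must be 1
        return p_other_pin_1
    raise ValueError(f"unsupported gate type: {gate_type}")


# Data path C1 -> C2 -> C3 of Figure 1 with random input vectors.
path = ["BUF", "OR", "AND"]
probabilities = [propagation_probability(g) for g in path]
print(probabilities)  # [1.0, 0.5, 0.5]
```

These three values are the kind of input vector that the machine learning model described next is meant to map to a flip-flop propagation probability.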

Unfortunately, determining the relationship between flip-flops and logic cells along the circuit path is challenging due to the complexity of circuit structures. A simple fitting equation may not be suitable for all circuit structures. Recently, machine learning has been widely used in integrated circuit design [18,26]. Some machine learning models are used to analyze circuit structures in order to identify inherent connections between circuit cells. In this paper, a machine learning model (BPNN) is used to determine the relationship between flip-flops and logic instances along the circuit path. The BPNN was first described in [27], and its basic structure consists of neurons, interconnection layers, and connection weight values.

In this paper, the BPNN model consists of an input layer, a hidden layer, and an output layer. The basic structure is shown in Figure 2. The input layer consists of 20 neurons, a number determined by the maximum number of stages in the data paths. The number of hidden layers and the number of neurons per hidden layer are adjustable parameters that affect the prediction accuracy of the BPNN model. When the hidden layer has few neurons (5 to 10), the transient pulse propagation probability calculated by the BPNN model differs significantly from the results of SPICE-level simulations, and the prediction accuracy is less than 0.6. As the number of neurons in the hidden layer increases, the prediction accuracy improves significantly. When the number of neurons exceeds 15, the accuracy of the BPNN model can approach 0.9. However, the prediction accuracy does not improve any further when the number of neurons exceeds 20; instead, the training time and prediction time increase significantly. In particular, when the number of neurons in the hidden layer reaches 23, the prediction accuracy drops by 2% to 5% because of overfitting during training. Therefore, the hidden layer consists of 15 neurons, which keeps the calculations accurate while limiting the training and prediction time. Furthermore, the prediction accuracy does not improve significantly as the number of hidden layers increases, whereas the additional connection weight values that must be adjusted substantially increase the training time. Therefore, only a single hidden layer is used to construct the BPNN model. The output layer consists of 10 neurons, a number determined by the range of propagation probability. The input vectors of the BPNN are the probabilities of transient pulse propagation along the circuit path. The output value of the BPNN is the probability that a transient pulse can propagate to a flip-flop. The basic neuron is activated by the sigmoid function:

$$f(x_i) = \frac{1}{1 + e^{-x_i}} \tag{3}$$

Based on Equation (3), the neurons in the input layer can be calculated using the following equation:

$$f(P_{C_i}) = \frac{1}{1 + e^{-P_{C_i}}} \tag{4}$$

The input values of neurons in the hidden layer are determined by the output values of neurons in the input layer. The neurons in the hidden layer can be calculated by the following equation:

$$N_{j,\mathrm{hidden}} = \sum_{i=0}^{M-1} w_{ij} f(P_{C_i}) \tag{5}$$

where $w_{ij}$ represents the connection weight value between the input layer and the hidden layer. $N_{j,\mathrm{hidden}}$ represents the output value of the hidden layer neuron. Similarly, the neurons in the output layer can be calculated using Equation (5), and the transient pulse propagation probability to flip-flops can be determined:

$$P_{FF_k} = \sum_{j=0}^{N-1} w_{jk} N_{j,\mathrm{hidden}} \tag{6}$$
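
Read literally, Equations (4)–(6) describe a forward pass that can be sketched in plain Python as follows. The layer sizes (20, 15, 10) are taken from the text; the random weights, the example input vector, and all function and variable names are placeholders, and the sketch does not attempt to show how the 10 output values are mapped to a single probability, which the text does not specify.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 20, 15, 10  # layer sizes stated in the text


def sigmoid(x):
    """Equation (3)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(p_c, w_ij, w_jk):
    """Return (input-layer outputs, hidden values, output values)."""
    f = [sigmoid(p) for p in p_c]                            # Equation (4)
    hidden = [sum(w_ij[i][j] * f[i] for i in range(len(f)))
              for j in range(len(w_jk))]                     # Equation (5)
    out = [sum(w_jk[j][k] * hidden[j] for j in range(len(hidden)))
           for k in range(len(w_jk[0]))]                     # Equation (6)
    return f, hidden, out


rng = random.Random(0)
# Placeholder weights; in the paper they are obtained by training.
w_ij = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_INPUT)]
w_jk = [[rng.uniform(-0.1, 0.1) for _ in range(N_OUTPUT)] for _ in range(N_HIDDEN)]
p_c = [rng.random() for _ in range(N_INPUT)]  # hypothetical path probabilities
_, hidden, out = forward(p_c, w_ij, w_jk)
print(len(hidden), len(out))  # 15 10
```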

The most important aspect of the BPNN is training the connection weight values $w_{ij}$ and $w_{jk}$. These connection weight values significantly affect the accuracy of evaluating transient pulse propagation. In this paper, several benchmark circuits from the ISCAS ’85 suite are selected. The ISCAS ’85 benchmark circuits are ten combinational circuits provided to authors at the 1985 International Symposium on Circuits and Systems. They have subsequently been used by many researchers as a basis for comparing results in test generation. The selected circuits are used to create a training set, which is used to calibrate the connection weight values. A benchmark circuit structure is shown in Figure 1. It is used to illustrate how the training set is generated. Firstly, one circuit instance is randomly selected, such as C0. Starting from C0, circuit instances C1, C2, and C3 are extracted because they are part of the data path. The transient pulse propagation probabilities $P_{C1}$, $P_{C2}$, and $P_{C3}$ are determined. Secondly, a SPICE-level simulation tool is used to simulate the propagation of transient pulses in the benchmark circuit. The input value of the circuit changes randomly with each clock cycle, allowing the value in each circuit instance to be altered. A dual exponential current source is then injected into C0. The equation of the dual exponential current source is given in our previous work [28]. The SPICE-level simulation tool is used to determine whether the transient pulses can propagate to flip-flop 1. This process is repeated multiple times (such as 1000 times) to count the number of pulses that successfully propagate to flip-flop 1. The propagation probability $P_{FF_1,\mathrm{simulation}}$ is obtained by dividing this count by the total number of injected transient pulses. Finally, $P_{C1}$, $P_{C2}$, $P_{C3}$, and $P_{FF_1,\mathrm{simulation}}$ constitute one datum in the training set. Then, another circuit instance is selected, such as C4, C5, or C6, and the above steps are repeated to obtain more data for the training set. During BPNN training, $P_{C1}$, $P_{C2}$, and $P_{C3}$ are used as input data. The BPNN calculates the transient pulse propagation probability $P_{FF_1,\mathrm{prediction}}$ through Equations (4)–(6). This value will differ from $P_{FF_1,\mathrm{simulation}}$. Based on the prediction results $P_{FF_1,\mathrm{prediction}}$ and the simulation results $P_{FF_1,\mathrm{simulation}}$, the connection weight values are calibrated using the following equations:

$$E(w) = \frac{1}{2} \sum_{k=0}^{N-1} \left(P_{FF_k,\mathrm{prediction}} - P_{FF_k,\mathrm{simulation}}\right)^2 \tag{7}$$

$$w_{ij,\mathrm{new}} = w_{ij,\mathrm{old}} - \eta_1 \frac{\partial E(w)}{\partial w_{ij,\mathrm{old}}} \tag{8}$$

$$w_{jk,\mathrm{new}} = w_{jk,\mathrm{old}} - \eta_1 \frac{\partial E(w)}{\partial w_{jk,\mathrm{old}}} \tag{9}$$
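
As a rough illustration of Equations (7)–(9), the sketch below performs plain gradient-descent updates on a tiny network with the structure of Equations (4)–(6). It assumes that $\eta_1$ is a learning rate and derives the partial derivatives by the chain rule for that structure; the network size, initial weights, target value, and the names `train_step` and `eta` are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(p_c, target, w_ij, w_jk, eta=0.1):
    """One gradient-descent update; returns E(w) computed before the update."""
    f = [sigmoid(p) for p in p_c]
    n_hidden, n_out = len(w_jk), len(w_jk[0])
    hidden = [sum(w_ij[i][j] * f[i] for i in range(len(f))) for j in range(n_hidden)]
    pred = [sum(w_jk[j][k] * hidden[j] for j in range(n_hidden)) for k in range(n_out)]
    err = [pred[k] - target[k] for k in range(n_out)]
    # Error propagated back to each hidden neuron, using the old w_jk values.
    back = [sum(err[k] * w_jk[j][k] for k in range(n_out)) for j in range(n_hidden)]
    for j in range(n_hidden):
        for k in range(n_out):
            w_jk[j][k] -= eta * err[k] * hidden[j]   # Equation (9)
    for i in range(len(f)):
        for j in range(n_hidden):
            w_ij[i][j] -= eta * back[j] * f[i]       # Equation (8)
    return 0.5 * sum(e * e for e in err)             # Equation (7)


# Toy example: 3 inputs, 2 hidden neurons, 1 output, one training datum.
w_ij = [[0.1, 0.1] for _ in range(3)]
w_jk = [[0.1], [0.1]]
losses = [train_step([1.0, 0.5, 0.5], [0.5], w_ij, w_jk) for _ in range(200)]
print(losses[0] > losses[-1])  # True
```

In practice the update would loop over the whole training set rather than a single datum; the single-sample loop here only shows that the error of Equation (7) decreases under the updates.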

In this paper, parallel simulation is used to accelerate the generation of training data. Generating the training data takes no more than 9 h for each test circuit. Approximately 12,000 training samples were generated through the simulations described above. In total, 60% of the data were used to train the BPNN model, and the remainder was used for model validation. The prediction accuracy of the model is calculated using Equation (10), where $P$ represents the precision value of the prediction, $TP$ represents the number of positive predictions that are correct, and $FP$ represents the number of positive predictions that are incorrect. The calculation results are detailed in Table 1.

$$P = \frac{TP}{TP + FP} \tag{10}$$
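
A minimal sketch of the 60%/40% split and of Equation (10) is given below. The text does not say whether the split is random or sequential, so a simple sequential split is assumed here; the counts passed to `precision` are made-up numbers for illustration, not results from the paper.

```python
def precision(tp, fp):
    """Equation (10): fraction of positive predictions that are correct."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


def split_60_40(samples):
    """Sequential 60%/40% split (the split strategy is an assumption)."""
    n_train = len(samples) * 60 // 100
    return samples[:n_train], samples[n_train:]


train, validation = split_60_40(list(range(12000)))
print(len(train), len(validation))  # 7200 4800
print(precision(tp=90, fp=10))      # 0.9
```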

3.2. The Transient Pulse Capture Evaluation

Another important aspect of SER evaluation is the capture of transient pulses. When a transient pulse arrives at the input pin of a flip-flop, such as FF1 in Figure 1, it needs to meet a certain signal–clock relationship to be captured. If the transient pulse is not captured by the flip-flop, it will not alter the stored value and will not result in a soft error. Figure 3 shows the relationship between the transient pulse and the clock waveform. The capture of transient pulses depends on both the width of the pulse and the period of the clock. The probability of capturing transient pulses in flip-flops can be calculated using the following equation:

$$P_{\mathrm{capture},FF_k} = \frac{T_{\mathrm{width}}}{T_{\mathrm{period}}} P_{FF_k,\mathrm{prediction}} \tag{11}$$

where $P_{FF_k,\mathrm{prediction}}$ is calculated by the BPNN, $T_{\mathrm{period}}$ represents the clock period, and $T_{\mathrm{width}}$ denotes the transient pulse width. Since the incident time of high-energy particles is random, whether the transient pulse and the clock satisfy the signal–clock relationship is also random. Therefore, a random function in the range of 0 to 1 is used to determine whether the transient pulse is captured by flip-flops. For each flip-flop affected by the transient pulse, the random function generates a value. If the random value is lower than the transient pulse capture probability $P_{\mathrm{capture},FF_k}$, the transient pulse is captured by the flip-flop. Conversely, the transient pulse is not captured when the random value exceeds the capture probability.
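
The capture decision described above can be sketched as follows. The 1000 ps clock period and the propagation probability of 0.5 are assumed example values, and the function names are illustrative.

```python
import random


def capture_probability(t_width, t_period, p_ff_prediction):
    """Equation (11); t_width and t_period must use the same time unit."""
    return (t_width / t_period) * p_ff_prediction


def is_captured(p_capture, rng):
    """Captured when a uniform random value in [0, 1) is below p_capture."""
    return rng.random() < p_capture


rng = random.Random(42)
# 200 ps pulse, assumed 1000 ps clock period, assumed BPNN output of 0.5.
p_capture = capture_probability(200.0, 1000.0, 0.5)
trials = 100_000
captured = sum(is_captured(p_capture, rng) for _ in range(trials))
print(round(p_capture, 3), captured / trials)  # second value is close to 0.1
```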

4. The SER Evaluation Approach

Based on the above SER evaluation principles, a machine learning-based SER evaluation approach is proposed. The basic flow of the proposed approach is shown in Figure 4.

The gate-level netlist of the circuit serves as an input file for the proposed evaluation approach. Before starting SER evaluation, the number of transient pulses that need to be injected is determined based on both the incident particle flux and the area of the circuit (the layout area or the sum of all instance areas). For instance, if the flux of the incident particles is 1 × 10⁷ ions/cm² and the circuit’s layout area is 1 mm², then 1 × 10⁵ ions will strike the circuit. Therefore, the proposed approach needs to evaluate the soft errors that occur in the circuit after 1 × 10⁵ transient pulse injections. In addition, since the location of the incident particle is random, the circuit instance it affects is also random. Therefore, for each transient pulse injection, the proposed approach first randomly selects a circuit instance from the circuit netlist. This instance is treated as the one struck by the incident particle and producing a transient pulse. Secondly, for each data path, all connected logic instance types from the selected instance to flip-flops are extracted. Each logic instance type is then converted into a probability of transient pulse propagation. Note that since the selected logic instance may affect multiple flip-flops, the second step often generates multiple input data for the machine learning model. Thirdly, the data are fed into the calibrated BPNN model, and the propagation probability of the transient pulse to a flip-flop is calculated using Equations (4)–(6). Fourthly, Equation (11) is used to determine whether the transient pulse will be captured by the flip-flop based on the calculated propagation probability. If the random value generated by the random function is less than the result calculated in Equation (11), the transient pulse is considered to be captured by the flip-flop, and the number of soft errors is increased by one. Fifthly, once all affected flip-flops have been traversed, the number of soft errors in the circuit caused by one transient pulse can be determined. After all transient pulse injections have been processed, the total number of soft errors under a specific incident flux condition is known, and the soft error rate of the circuit can be calculated.
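
The five steps above can be sketched as a short Monte Carlo loop. Everything concrete in this sketch is an assumption made for illustration: the dictionary `paths_by_instance` stands in for the data paths extracted from a gate-level netlist, `product_stub` is a placeholder that simply multiplies the per-instance probabilities and is not the calibrated BPNN, and the pulse width and clock period are example values.

```python
import random


def injection_count(flux_per_cm2, area_cm2):
    """Number of ions striking the circuit: flux multiplied by area."""
    return round(flux_per_cm2 * area_cm2)


def count_soft_errors(paths_by_instance, predict, t_width, t_period,
                      n_injections, rng):
    instances = list(paths_by_instance)
    soft_errors = 0
    for _ in range(n_injections):
        struck = rng.choice(instances)                  # step 1: random instance
        for path_probs in paths_by_instance[struck]:    # step 2: one entry per flip-flop
            p_ff = predict(path_probs)                  # step 3: model prediction
            p_capture = (t_width / t_period) * p_ff     # step 4: Equation (11)
            if rng.random() < p_capture:
                soft_errors += 1
    return soft_errors                                  # step 5: total count


def product_stub(path_probs):
    """Placeholder predictor (NOT the BPNN): product of the path probabilities."""
    p = 1.0
    for x in path_probs:
        p *= x
    return p


# Hypothetical extracted data paths: instance -> list of per-path probabilities.
paths_by_instance = {
    "C0": [[1.0, 0.5, 0.5]],
    "C4": [[0.5], [1.0, 0.5]],
}
n = injection_count(1e7, 0.01)   # 1 mm^2 = 0.01 cm^2
print(n)  # 100000
errors = count_soft_errors(paths_by_instance, product_stub,
                           200.0, 1000.0, n, random.Random(1))
print(errors)
```

Swapping `product_stub` for a trained model's prediction function would leave the rest of the loop unchanged, which is the main point of the flow: no circuit-level simulation is run per injection.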

A comparison between the proposed approach and other conventional evaluation approaches is shown in Table 2. The proposed approach only uses the BPNN model to obtain SER evaluation results. It does not simulate the transient pulse propagation and capture with circuit-level simulation tools. The proposed approach can therefore reduce the runtime of SER evaluation and is not limited by the size of the circuit. It is worth noting that the proposed approach not only obtains the SER of the evaluated circuit but is also useful for locating highly sensitive circuit nodes in it. For instance, specific circuit nodes are selected in the proposed approach, and the SER flow is run to obtain the SER of each of these nodes. By comparing the calculated results, the highly sensitive circuit nodes are located.

5. The SER Evaluation Approach Validation

5.1. Test Chip Design and Experimental Setup

A SER test structure was designed using commercial CMOS technology to investigate the accuracy of the proposed approach. The schematic of the SER test structure is shown in Figure 5. It consists of one random vector generator, two test circuits, and one SER detection circuit. The random vector generator (Linear Feedback Shift Register, LFSR) is used to create the input vectors. Each test circuit consists of combinational logic instances and flip-flops. Note that the two test circuits have the same topology and layout structures. However, they are spaced widely apart to ensure that an incident particle only impacts one test circuit, as shown in Figure 6. The input pins of the two test circuits are connected to the random vector generator, which ensures that the test circuits have the same input vectors. The SER detection circuit consists of several XOR-gate instances and OR-gate instances. The XOR-gate instances compare the output values of the two test circuits. If one test circuit is irradiated by incident particles, its output values change, whereas the other test circuit is not affected and still produces the correct output values. The XOR-gate instance then produces a 1 because the output values differ. The soft error induced by the incident particle is propagated to the SER counter circuit and counted.
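
The compare-and-count idea of the test structure can be mimicked in software as a rough behavioural sketch. The LFSR width and feedback taps (a common 16-bit polynomial), the pass-through `toy_circuit`, and the upset schedule are all assumptions chosen for illustration; the text does not give these details.

```python
def lfsr16_step(state):
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (an assumed polynomial)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def ser_detect(outputs_a, outputs_b):
    """OR of the bitwise XORs: 1 if the two test-circuit outputs differ."""
    return int(any(a ^ b for a, b in zip(outputs_a, outputs_b)))


def toy_circuit(vector, upset_bit=None):
    """Hypothetical stand-in for a test circuit: passes the input bits through
    and optionally flips one output bit to mimic a soft error."""
    out = list(vector)
    if upset_bit is not None:
        out[upset_bit] ^= 1
    return out


state = 0xACE1   # arbitrary non-zero seed
errors = 0
for cycle in range(100):
    state = lfsr16_step(state)
    vector = [(state >> b) & 1 for b in range(16)]    # same vector for both copies
    upset = 3 if cycle % 25 == 0 else None            # upset one copy every 25th cycle
    errors += ser_detect(toy_circuit(vector, upset), toy_circuit(vector))
print(errors)  # 4
```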

A test chip with three SER test structures was fabricated using commercial 65 nm CMOS technology. The detailed test chip layout is shown in Figure 6. The test chip was irradiated with heavy ions. Four heavy ions with different parameters were chosen, as shown in Table 3. The heavy-ion dose rate was 1 × 10⁴ ions/cm²/s, and the flux was 1 × 10⁷ ions/cm². Before the radiation experiment, the cover plate from the test chip was removed. The front side of the chip was positioned within the range of the ion beam’s influence. During radiation experiments, the ion beams randomly strike their respective targets. Some ions strike the test chips and generate transient pulses. The test system consisted of a test chip and other necessary chips, such as field-programmable gate arrays (FPGAs) and serial communication chips [29,30]. FPGAs connected all signal ports (input, output, and clock) of the test chip to provide input and clock signals. They were also used to capture output signals when the test chip was irradiated. After conducting heavy-ion experiments, the error counts were exported to the computer using the serial communication interface.

5.2. SER Evaluation Setup

The proposed evaluation approach and the SPICE-level evaluation approach were used to investigate the SER of the test circuits. The gate-level netlists of the three test circuits serve as input files for the evaluation approaches. The number of transient pulse injections per test circuit is determined by multiplying the particle flux by the layout area of the test circuit. The position and moment of transient pulse injection are random, because the location and arrival time of the incident particles are random during radiation experiments. The width of the transient pulses injected into the test circuit is set to 100 ps, 200 ps, 350 ps, and 500 ps, respectively. These values represent the pulse widths generated by the different particles and were determined through our previous transient pulse measurements [31,32,33]. For each test circuit, the number of soft errors is obtained when the evaluation tool completes the transient pulse injection simulation. From the number of soft errors, the soft error cross-section can be calculated using the following equation:

$$\mathrm{SER} = \frac{N_{\mathrm{error}}}{f_{\mathrm{ions}} \times N_{\mathrm{instances}}} \tag{12}$$

where $N_{\mathrm{error}}$ represents the number of soft errors calculated by the evaluation approaches, $f_{\mathrm{ions}}$ represents the flux of ions, and $N_{\mathrm{instances}}$ represents the total number of instances in the test circuits.
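
Equation (12) is a one-line calculation; the sketch below evaluates it for made-up numbers. The error count of 50 is purely hypothetical and is not a result from the paper; the flux and instance count are the example values that appear elsewhere in the text.

```python
def soft_error_cross_section(n_error, f_ions, n_instances):
    """Equation (12): soft errors per unit flux per instance (cm^2/instance
    when f_ions is given in ions/cm^2)."""
    return n_error / (f_ions * n_instances)


# Hypothetical example: 50 soft errors, 1e7 ions/cm^2, 10,000 instances.
sigma = soft_error_cross_section(n_error=50, f_ions=1e7, n_instances=10_000)
print(f"{sigma:.1e}")  # 5.0e-10
```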

5.3. SER Evaluation Results Comparison

The evaluation results of the proposed approach and the conventional SER evaluation approach are compared first. The connection weight values were saved after training the BPNN model. They are imported again into the BPNN model during the evaluation of test circuits. The BPNN model is used to calculate the pulse propagation probabilities in circuits A, B, and C. The calculated results are compared with the transient pulse propagation probabilities simulated by the circuit-level simulation tool. The prediction accuracy is calculated using Equation (10) and the results are shown in Table 4. The average prediction accuracy of the three test circuits is approximately 0.8. Although the three circuits have different circuit structures, the trained BPNN model can still accurately calculate the probability of transient pulse propagation.

Figure 7 shows the comparison between the evaluation results and experimental results. The evaluation results obtained by the proposed approach show good consistency in both trend and order of magnitude. The difference between the results obtained using the proposed approach and the experimental results is calculated. The average value is 23.5%, which is 7.5% higher than that between the conventional evaluation approach and the experimental results. When the LET value is 15.2 MeV·cm²/mg, there is a greater discrepancy between the simulation results and the experimental results. As the LET value increases, the simulation results come into good agreement with the experimental results. Several factors may cause this difference. The first is that the proposed approach does not consider transient pulse reshaping and reconvergence. Transient pulse reshaping or reconvergence changes the width of the transient pulse, which in turn changes the value of $T_{\mathrm{width}}$ in Equation (11). Circuit-level simulations were used to investigate the difference in circuit A at low LET values. When a particle strikes most instances, it generates only one transient pulse that propagates to the input of the flip-flop. However, when a particle strikes a specific instance with a large fanout, although it produces only one transient pulse, more than one transient pulse is propagated to the input of the flip-flop due to pulse reconvergence, as shown in Figure 8. The proposed approach still evaluates such cases as pulses that propagate independently on different data paths. As a result, the evaluation results may not accurately reflect soft errors and may differ from the experimental results. When the LET value increases, the transient pulse width generated by the ions also increases, which reduces the change in the width and number of transient pulses caused by pulse reconvergence. The evaluation results obtained using the proposed evaluation method are then closer to the experimental results.

The other reason for this difference is the evaluation accuracy of the transient pulse propagation. The transient pulse propagation is evaluated using the BPNN, and the training accuracy of the connection weight values is the key factor that affects the evaluation accuracy. In our previous works, we observed a significant decrease in the prediction precision value when the probability of transient pulse propagation exceeded 0.7. This indicates that the propagation probabilities obtained by the BPNN are significantly different from the conventional transient pulse simulation results [34]. Because few combinational logic cells with a large number of circuit stages have a high probability of transient pulse propagation, the training set does not include enough such data, and the connection weight values are not effectively trained for this situation. Enlarging the training set can improve the training accuracy of the connection weight values and may be an effective way to reduce this difference.

Figure 9 shows the average simulation runtime of the proposed approach and the conventional SER evaluation approach. For the conventional evaluation approach, the circuit-level simulation tool is used to simulate the propagation of transient pulses and to determine whether the flip-flops capture them. Although the size of the test circuit is only 10,000 instances, a single circuit-level simulation already incurs a significant time cost, which greatly reduces the performance of the conventional evaluation approach. For the proposed approach, a machine learning model (Equations (4)–(6)) is utilized to calculate the probability of transient pulse propagation. Subsequently, the capture of the transient pulse is evaluated using the capture probability equation (Equation (11)). The proposed method can determine the soft errors of the circuit solely through equation calculations. Therefore, the proposed method can significantly reduce the time required and enhance the performance of soft error evaluation.

In addition, Circuit C has only 5000 more instances than Circuit B. However, the simulation time for the conventional evaluation method is nearly doubled. When the circuit size increases further, the evaluation time of the conventional method becomes unacceptable, which limits the size of the circuit that it can support. For the proposed approach, the evaluation time shows only a slight increase as the circuit size grows. The proposed approach can therefore support larger-scale circuits, which improves the performance of the soft error evaluation method.

6. Conclusions

This paper has presented an approach to evaluate the SER of integrated circuits. A machine learning model (BPNN) is implemented in the proposed approach. It helps to evaluate the transient pulse propagation and capture. Several commercial integrated circuits were designed and fabricated to validate the capability of the proposed approach. Compared with the experimental data and the conventional SER evaluation results, the proposed approach demonstrates a strong correlation in terms of trend and magnitude while improving the simulation runtime. The proposed evaluation tool has been used to evaluate the SER of circuits with more than 10,000 gates, which demonstrates that it can be applied to logic circuits of this scale. It is useful for circuit design and radiation hardening.

Author Contributions

Conceptualization, test chip design, writing—original draft preparation, R.S.; writing—review and editing, J.S.; heavy-ion experiment, Y.C. and J.C.; experimental results analysis and validation, B.L. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62174180) and the National University of Defense Technology research project (Grant No. ZK20-11).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

We wish to acknowledge the HI-13 team and the HIRFL team for heavy-ion experiment support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BPNN	back propagation neural network
SER	soft error rate
SPICE	Simulation Program with Integrated Circuit Emphasis
LFSR	Linear Feedback Shift Register
TCAD	Technology Computer-Aided Design

References

1. Zhang, Z.; Arehart, A.; Cinkilic, E.; Chen, J.; Zhang, E.; Fleetwood, D.; Schrimpf, R.; McSkimming, B.; Speck, J.S.; Ringel, S.A. Quasimonotonicity, regularity and duality for nonlinear systems of partial differential equations. Appl. Phys. Lett. 2013, 103, 042102.
2. Fleetwood, D.; Shaneyfelt, M.; Schwank, J. Estimating oxide-trap, interface-trap, and border-trap charge densities in metal-oxide-semiconductor transistors. Appl. Phys. Lett. 1994, 64, 1965.
3. Jun, B.; White, Y.; Schrimpf, R.; Fleetwood, M.; Brunier, F.; Bresson, N.; Cristoloveanu, S. Characterization of multiple Si/SiO₂ interfaces in silicon-on-insulator materials via second-harmonic generation. Appl. Phys. Lett. 2004, 85, 3095.
4. Sun, Q.; Guo, Y.; Liang, B.; Tao, M.; Chi, Y.; Huang, P.; Wu, Z.; Luo, D.; Chen, J. Higher NMOS Single Event Transient Susceptibility Compared to PMOS in Sub-20 nm Bulk FinFET. IEEE Electron Device Lett. 2023, 44, 1712–1715.
5. Pieper, N.J.; Xiong, Y.; Feeley, A.; Pasternak, J.; Dodds, N.; Ball, D.R.; Bhuva, B.L. Study of Multicell Upsets in SRAM at a 5-nm Bulk FinFET Node. IEEE Trans. Nucl. Sci. 2023, 70, 401–409.
6. He, X.; Yue, D.; Huang, P.; Zhao, Z. Experimental Investigation of Charge Sharing Induced SET Depending on Transistors in Abutted Rows in 65 nm Bulk CMOS Technology. IEEE Access 2022, 10, 57362–57368.
7. Ferlet-Cavrois, V.; Massengill, L.W.; Gouker, P. Single Event Transients in Digital CMOS—A Review. IEEE Trans. Nucl. Sci. 2013, 60, 1767–1790.
8. Uemura, T.; Lee, S.; Min, D.; Moon, I.; Lee, S.; Pae, S. SEIFF: Soft Error Immune Flip-Flop for Mitigating Single Event Upset and Single Event Transient in 10 nm FinFET. In Proceedings of the 2019 IEEE International Reliability Physics Symposium (IRPS), Monterey, CA, USA, 31 March–4 April 2019; pp. 1–6.
9. Mochizuki, A.; Onizawa, N.; Tamakoshi, A.; Hanyu, T. Multiple-event-transient soft-error gate-level simulator for harsh radiation environments. In Proceedings of the TENCON 2015—2015 IEEE Region 10 Conference, Macao, China, 1–4 November 2015; pp. 1–6.
10. Azimi, S.; De Sio, C.; Portaluri, A.; Rizzieri, D.; Vacca, E.; Sterpone, L.; Merodio Codinachs, D. Exploring the Impact of Soft Errors on the Reliability of Real-Time Embedded Operating Systems. Electronics 2023, 12, 169.
11. Black, J.D.; Dame, J.A.; Black, D.A.; Dodd, P.E.; Shaneyfelt, M.R.; Teifel, J.; Salas, J.G.; Steinbach, R.; Davis, M.; Reed, R.A.; et al. Using MRED to Screen Multiple-Node Charge-Collection Mitigated SOI Layouts. IEEE Trans. Nucl. Sci. 2019, 66, 233–239.
12. Correas, V.; Saigne, F.; Sagnes, B.; Wrobel, F.; Boch, J.; Gasiot, G.; Roche, P. Prediction of Multiple Cell Upset Induced by Heavy Ions in a 90 nm Bulk SRAM. IEEE Trans. Nucl. Sci. 2009, 56, 2050–2055.
13. Shambhulingaiah, S.; Lieb, C.; Clark, L.T. Circuit Simulation Based Validation of Flip-Flop Robustness to Multiple Node Charge Collection. IEEE Trans. Nucl. Sci. 2015, 62, 1577–1588.
14. Wang, F.; Xie, Y.; Rajaraman, R.; Vaidyanathan, B. Soft Error Rate Analysis for Combinational Logic Using An Accurate Electrical Masking Model. In Proceedings of the 20th International Conference on VLSI Design Held jointly with 6th International Conference on Embedded Systems (VLSID’07), Bangalore, India, 6–10 January 2007; pp. 165–170.
15. Chen, S.; Du, Y.; Liu, B.; Qin, J. Calculating the Soft Error Vulnerabilities of Combinational Circuits by Re-Considering the Sensitive Area. IEEE Trans. Nucl. Sci. 2014, 61, 646–653.
16. Li, X.; Qin, J.; Huang, B.; Zhang, X.; Bernstein, J. A new SPICE reliability simulation method for deep submicrometer CMOS VLSI circuits. IEEE Trans. Device Mater. Reliab. 2006, 6, 247–257.
17. Xiong, Y.; Chiang, Y.; Pieper, N.J.; Ball, D.R.; Bhuva, B.L. Soft Error Rate Predictions for Terrestrial Neutrons at the 3-nm Bulk FinFET Technology. In Proceedings of the 2023 IEEE International Reliability Physics Symposium (IRPS), Monterey, CA, USA, 26–30 March 2023; pp. 1–6.
18. Xu, C.; Liu, Y.; Liao, X.; Cheng, J.; Yang, Y. Machine Learning Regression-Based Single-Event Transient Modeling Method for Circuit-Level Simulation. IEEE Trans. Electron Devices 2021, 68, 5758–5764.
19. Abe, S.I.; Watanabe, Y.; Shibano, N.; Sano, N.; Furuta, H.; Tsutsui, M.; Uemura, T.; Arakawa, T. Multi-Scale Monte Carlo Simulation of Soft Errors Using PHITS-HyENEXSS Code System. IEEE Trans. Nucl. Sci. 2012, 59, 965–970.
20. Foley, K.; Seifert, N.; Velamala, J.B.; Bennett, W.G.; Gupta, S. IRT: A modeling system for single event upset analysis that captures charge sharing effects. In Proceedings of the 2014 IEEE International Reliability Physics Symposium, Waikoloa, HI, USA, 1–5 June 2014; pp. 5F.1.1–5F.1.9.
21. Xiong, X.; Du, X.; Zheng, B.; Chen, Z.; Jiang, W.; He, S.; Zhu, Y. Soft Error Sensitivity Analysis Based on 40 nm SRAM-Based FPGA. Electronics 2022, 11, 3844.
22. Artola, L.; Gaillardin, M.; Hubert, G.; Raine, M.; Paillet, P. Modeling Single Event Transients in Advanced Devices and ICs. IEEE Trans. Nucl. Sci. 2015, 62, 1528–1539.
23. Aneesh, Y.M.; Sriram, S.R.; Pasupathy, K.R.; Bindu, B. An Analytical Model of Single-Event Transients in Double-Gate MOSFET for Circuit Simulation. IEEE Trans. Electron Devices 2019, 66, 3710–3717.
24. Yi, B.; Lee, B.J.; Oh, J.H.; Kim, J.S.; Kim, J.H.; Yang, J.W. Physics-Based Compact Model of Parasitic Bipolar Transistor for Single-Event Transients in FinFETs. IEEE Trans. Nucl. Sci. 2018, 65, 866–870.
25. Sheng, L.; He, W.; Zhang, Z.; He, L.; Cao, J.; Wu, Q. Investigation of double peak voltage in pulse quenching effect on the single-event transient. In Proceedings of the 2016 IEEE International Nanoelectronics Conference (INEC), Chengdu, China, 9–11 May 2016; pp. 1–2.
26. Vibhu, V.; Mittal, S.; Kumar, V. Machine Learning-based model for Single Event Upset Current Prediction in 14 nm FinFETs. In Proceedings of the 2023 36th International Conference on VLSI Design and 2023 22nd International Conference on Embedded Systems (VLSID), Hyderabad, India, 8–12 January 2023; pp. 1–6.
27. Rumelhart, D.; Hinton, G.; Williams, R. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
28. Song, R.; Chen, S.; Du, Y.; Huang, P.; Chen, J.; Chen, Y. PABAM: A Physics-Based Analytical Model to Estimate Bipolar Amplification Effect Induced Collected Charge at Circuit Level. IEEE Trans. Device Mater. Reliab. 2015, 15, 595–603.
29. Shao, J.; Song, R.; Chi, Y.; Liang, B.; Wu, Z. TAISAM: A Transistor Array-Based Test Method for Characterizing Heavy Ion-Induced Sensitive Areas in Semiconductor Materials. Electronics 2022, 11, 2043.
30. Chi, Y.; Cai, C.; He, Z.; Wu, Z.; Fang, Y.; Chen, J.; Liang, B. SEU Tolerance Efficiency of Multiple Layout-Hardened 28 nm DICE D Flip-Flops. Electronics 2022, 11, 972.
31. Yibai, H.; Shuming, C. Simulation Study of the Selectively Implanted Deep-N-Well for PMOS SET Mitigation. IEEE Trans. Device Mater. Reliab. 2014, 14, 99–103.
32. He, Y.; Chen, S.; Chen, J.; Chen, Y.; Liang, B.; Liu, B.; Qin, J.; Du, Y.; Huang, P. Impact of Circuit Placement on Single Event Transients in 65 nm Bulk CMOS Technology. IEEE Trans. Nucl. Sci. 2012, 59, 2772–2777.
33. Chen, J.; Chen, S.; He, Y.; Qin, J.; Liang, B.; Liu, B.; Huang, P. Novel Layout Technique for Single-Event Transient Mitigation Using Dummy Transistor. IEEE Trans. Device Mater. Reliab. 2013, 13, 177–184.
34. Song, R.; Shi, J.; Shao, J.; Zhang, X. Machine Learning based SET Propagation Prediction for Large Scale Integrated Circuits. In Proceedings of the 2021 IEEE 14th International Conference on ASIC (ASICON), Kunming, China, 26–29 October 2021; pp. 1–4.

Figure 1. A conventional registers-to-registers circuit path.

Figure 2. The BPNN structure used in the paper.

Figure 3. The signal–clock relationship to capture a transient pulse.

Figure 4. The basic flow of the proposed approach.

Figure 5. The schematic of the SER test structure.

Figure 6. The basic SER test structure layout and the detailed test chip layout.

Figure 7. The simulated SER results with different evaluation approaches.

Figure 8. The width and number variation of transient pulses due to pulse reconvergence.

Figure 9. The average simulation runtime of two SER evaluation approaches.

Table 1. The results of the training.

Transient Pulse Propagation Probability Range	Prediction Precision Value
0.1	0.914
0.2	0.922
0.3	0.904
0.4	0.893
0.5	0.877
0.6	0.911
0.7	0.864
0.8	0.844
0.9	0.857
1.0	0.821
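The trend in Table 1 can be summarized with simple arithmetic. The sketch below copies the precision values from the table and compares the mean precision for ranges up to 0.7 with the mean for ranges above 0.7. Grouping the rows at 0.7 is an assumption made here for illustration, as is treating each range label as a single numeric key.

```python
# Prediction precision values copied from Table 1,
# keyed by the transient pulse propagation probability range.
table1 = {
    0.1: 0.914, 0.2: 0.922, 0.3: 0.904, 0.4: 0.893, 0.5: 0.877,
    0.6: 0.911, 0.7: 0.864, 0.8: 0.844, 0.9: 0.857, 1.0: 0.821,
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Split point of 0.7 is an illustrative choice, not defined by the table.
low = mean(v for k, v in table1.items() if k <= 0.7)
high = mean(v for k, v in table1.items() if k > 0.7)

print(f"mean precision, range <= 0.7: {low:.3f}")
print(f"mean precision, range >  0.7: {high:.3f}")
print(f"difference: {low - high:.3f}")
```

Under this grouping, the mean precision is about 0.898 for ranges up to 0.7 and about 0.841 above it.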

Table 2. Comparison of the proposed approach with other state-of-the-art approaches.

SER Evaluation Approach	Transient Pulse Generation Evaluation	Transient Pulse Propagation Evaluation
SPICE-level simulation approach	Dual exponential current source, etc.	Circuit-level simulation tools
TCAD-level simulation approach	Ionization charge distribution model, carrier transport equation, etc.	Circuit-level simulation tools
Monte Carlo-based simulation approach	Nested sensitive volumes model, drift diffusion equation, etc.	Circuit-level simulation tools
The proposed approach	Pulse width data that vary with LET	Machine learning model

Table 3. Heavy ions used in the experiment.

Ion	Energy at the Silicon Surface (MeV)	Effective LET (MeV·cm²/mg)	Range (µm)
Cl	165	15.2	51.8
Ti	185	21.2	37.9
Ge	205	37.6	35.5
Kr	835.5	99.8	41.2

Table 4. The prediction accuracy of three test circuits.

Transient Pulse Propagation Probability Range	Circuit A Prediction Precision Value	Circuit B Prediction Precision Value	Circuit C Prediction Precision Value
0.1	0.832	0.807	0.821
0.2	0.829	0.811	0.825
0.3	0.813	0.824	0.817
0.4	0.824	0.803	0.813
0.5	0.809	0.795	0.808
0.6	0.805	0.792	0.811
0.7	0.789	0.801	0.804
0.8	0.775	0.789	0.796
0.9	0.764	0.773	0.792
1.0	0.765	0.768	0.788
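As a rough cross-check of Table 4, the following sketch computes, for each test circuit, the mean prediction precision over all ranges and the drop in precision between the lowest and highest probability range. The values are copied from the table; the variable names and the choice of these two summary statistics are illustrative assumptions.

```python
# Prediction precision values copied from Table 4, listed in order of
# probability range 0.1, 0.2, ..., 1.0 for each test circuit.
table4 = {
    "A": [0.832, 0.829, 0.813, 0.824, 0.809, 0.805, 0.789, 0.775, 0.764, 0.765],
    "B": [0.807, 0.811, 0.824, 0.803, 0.795, 0.792, 0.801, 0.789, 0.773, 0.768],
    "C": [0.821, 0.825, 0.817, 0.813, 0.808, 0.811, 0.804, 0.796, 0.792, 0.788],
}

# Mean precision per circuit across all ten ranges.
means = {name: sum(vals) / len(vals) for name, vals in table4.items()}

# Precision lost between the first range (0.1) and the last range (1.0).
drops = {name: vals[0] - vals[-1] for name, vals in table4.items()}

for name in table4:
    print(f"Circuit {name}: mean = {means[name]:.4f}, "
          f"drop from 0.1 to 1.0 = {drops[name]:.3f}")
```

All three circuits show lower precision at the high end of the probability range, which matches the training results in Table 1.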

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

