# Principal Component Analysis (PCA) for Powder Diffraction Data: Towards Unblinded Applications

## Abstract

## 1. Introduction

## 2. Theory

## 3. Simulated Data and PCA Analysis

#### 3.1. Li${}_{x}$CoO${}_{2}$

#### 3.1.1. Variation of Occupancy

#### 3.1.2. Variation of Lattice Dimension

- The first score ${S}_{1}$ follows the time dependence of the unit cell dimension;
- The second score ${S}_{2}$ is the square of the first score;
- The first loading ${L}_{1}$ is a diffractogram in which every line has the shape of the first derivative of the profile function; for the symmetric profile considered here, this gives an anti-symmetric line shape;
- The second loading ${L}_{2}$ is a sum of lines whose shape is given by the second derivative of the initial profile; for the symmetric profile considered here, this gives a symmetric line shape.
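The four observations above can be checked on a toy example. The sketch below is not the simulation used in this work: it assumes a single hypothetical Gaussian line (unit width) whose position drifts linearly in time by a small fraction of its width, and it obtains the PCA from an SVD of the mean-centred data. All names (`shift`, `gaussian`, `corr_L1`, and so on) are illustrative.

```python
import numpy as np

# Hypothetical model: one Gaussian line whose position drifts linearly with time.
d = np.linspace(-5.0, 5.0, 1001)   # abscissa, arbitrary units
t = np.linspace(-1.0, 1.0, 41)     # time, centred so that the mean shift is zero
shift = 0.05 * t                   # line shift, small compared with the unit line width


def gaussian(x):
    """Symmetric profile function with unit width."""
    return np.exp(-0.5 * x**2)


# One pattern per row, then remove the mean pattern before the decomposition.
data = np.array([gaussian(d - s) for s in shift])
centred = data - data.mean(axis=0)

# PCA via SVD: scores are the columns of U scaled by the singular values,
# loadings are the rows of Vt.
U, sv, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U * sv
S1, S2 = scores[:, 0], scores[:, 1]
L1, L2 = Vt[0], Vt[1]

# Numerical first and second derivatives of the profile for comparison.
G1 = np.gradient(gaussian(d), d)
G2 = np.gradient(G1, d)

# Absolute values are used because the sign of each component is arbitrary.
corr_S1 = abs(np.corrcoef(S1, shift)[0, 1])   # S1 against the imposed shift
corr_L1 = abs(np.corrcoef(L1, G1)[0, 1])      # L1 against the first derivative
corr_L2 = abs(np.corrcoef(L2, G2)[0, 1])      # L2 against the second derivative

# Quality of a second-order polynomial fit of S2 as a function of S1.
coeffs = np.polyfit(S1, S2, 2)
resid = S2 - np.polyval(coeffs, S1)
r2 = 1.0 - resid.var() / S2.var()

print(corr_S1, corr_L1, corr_L2, r2)
```

For such a small shift all four printed values should be close to 1. For shifts comparable to the line width, higher-order derivative terms are expected to mix into the first two components and the agreement should degrade.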

#### 3.1.3. Variation of Lattice Dimension and Occupancy

#### 3.2. Kr Uptake by $\gamma $–Mg(BH${}_{4}$)${}_{2}$

#### Variation of Occupancy

- ${L}_{1}$ may contain both positive and negative peaks, while ${L}_{2}$ might be positive.
- ${S}_{2}$ might be proportional to the square of ${S}_{1}$.
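A short worked expansion may help to see where these two expectations come from. It assumes that Equation (7) is the sum of the products ${S}_{n}\left(t\right){I}_{n}\left(d\right)$ listed in Table 1, that the lattice is fixed (${S}_{d}\left(t\right)=0$), and that $\eta \left(\mathbf{H}\right)$ is real, so that only components 2 and 5 survive. The symbol $\Delta I$ for the time-dependent part of the intensity is introduced here for illustration only:

$$
\Delta I(d,t) \approx S_{f}(t)\, 2\sum_{\mathbf{H}} G\left(d-\overline{d_{\mathbf{H}}}\right)\overline{F(\mathbf{H})}\,\eta(\mathbf{H}) + S_{f}^{2}(t) \sum_{\mathbf{H}} G\left(d-\overline{d_{\mathbf{H}}}\right)\eta^{2}(\mathbf{H})
$$

The pattern multiplying ${S}_{f}$ contains the product $\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right)$, which can take either sign from one reflection to the next, whereas the pattern multiplying ${S}_{f}^{2}$ contains ${\eta}^{2}\left(\mathbf{H}\right)$ and is non-negative for a non-negative profile $G$. PCA returns orthogonalised combinations of these two terms, so the correspondence with ${L}_{1}$, ${L}_{2}$ and with ${S}_{1}$, ${S}_{2}$ is expected to be approximate rather than exact.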

## 4. PCA Analysis of Real Data for Mg(BH${}_{\mathbf{4}}$)${}_{\mathbf{2}}$ + Kr${}_{\mathbf{x}}$

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A

**Figure A1.** Profile functions for diffraction components from Table 1: $G(d-\overline{{d}_{\mathbf{H}}})$ (**a**), ${G}^{\prime}(d-\overline{{d}_{\mathbf{H}}})$ (**b**), and ${G}^{\prime\prime}(d-\overline{{d}_{\mathbf{H}}})$ (**c**).

## Appendix B

- ${\tilde{S}}_{2}$ as a function of ${\tilde{S}}_{1}$ is fitted by a second-order polynomial with the best R-factor;
- ${\tilde{L}}_{1}$ gives a pattern of positive or negative symmetric peaks, at least in the low-angle part of the pattern;
- ${\tilde{L}}_{1}$ and ${\tilde{S}}_{1}$ for different data subsets are as close to each other as possible.

Subset | $\mathit{\alpha}$ |
---|---|
A | 0.02 |
B | 0.095 |
C | 0.14 |
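The first criterion can be illustrated with a minimal sketch. It assumes that the correction is a plane rotation of the first two components by an angle `alpha`, that the scores are taken as unit-norm left singular vectors (with the singular values absorbed into the loadings), and that the angle is found by a simple grid scan. The synthetic data, the uptake law $x(t)=1-{e}^{-kt}$ with $k=0.05$, the peak positions and amplitudes, and the helper names `rotate_pair` and `quadratic_misfit` are all hypothetical. Because the normalisation used here is an assumption, the angle found is not meant to be compared with the values of $\alpha$ tabulated above.

```python
import numpy as np


def rotate_pair(scores, loadings, alpha):
    """Rotate two scores (columns) and two loadings (rows) by alpha (radians).

    The product scores @ loadings is unchanged by the rotation.
    """
    c, s = np.cos(alpha), np.sin(alpha)
    rot = np.array([[c, -s], [s, c]])
    return scores @ rot, rot.T @ loadings


def quadratic_misfit(s1, s2):
    """Normalised residual of a second-order polynomial fit of s2 against s1."""
    coeffs = np.polyfit(s1, s2, 2)
    resid = s2 - np.polyval(coeffs, s1)
    return float(np.sum(resid**2) / np.sum((s2 - s2.mean()) ** 2))


# Hypothetical occupancy-driven data: intensity = x * (linear pattern) + x**2 * (quadratic pattern).
t = np.linspace(0.0, 100.0, 200)
x = 1.0 - np.exp(-0.05 * t)                      # assumed uptake kinetics
d = np.linspace(0.0, 8.0, 801)
centres = [2.0, 3.5, 5.0, 6.5]
peaks = np.array([np.exp(-0.5 * ((d - c) / 0.05) ** 2) for c in centres])
lin_pattern = np.array([1.0, -0.6, 0.8, -0.3]) @ peaks   # mixed signs
sq_pattern = np.array([0.5, 0.2, 0.4, 0.1]) @ peaks      # positive only
data = np.outer(x, lin_pattern) + np.outer(x**2, sq_pattern)

# PCA via SVD of the mean-centred data; keep the two leading components.
centred = data - data.mean(axis=0)
U, sv, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U[:, :2]                      # unit-norm scores (assumed convention)
loadings = sv[:2, None] * Vt[:2]       # loadings carry the singular values

# Scan the rotation angle and keep the one with the best quadratic fit.
alphas = np.linspace(-np.pi / 2, np.pi / 2, 3601)
misfits = np.array(
    [quadratic_misfit(*rotate_pair(scores, loadings, a)[0].T) for a in alphas]
)
best_alpha = alphas[np.argmin(misfits)]
best_misfit = misfits.min()
rot_scores, rot_loadings = rotate_pair(scores, loadings, best_alpha)

print(best_alpha, best_misfit)
```

In this synthetic case the rotated first score should follow the imposed occupancy and the rotated second loading should reproduce the positive-only pattern up to sign. The second and third criteria (peak shape of ${\tilde{L}}_{1}$ and consistency between subsets) are not implemented in this sketch.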

**Figure 1.** Loadings and scores for the two main components for the model case of variation of the occupancy in Li${}_{x}$CoO${}_{2}$: (**a**) loadings ${L}_{1}$ and ${L}_{2}$, (**b**) first score ${S}_{1}$ and (**c**) correlation plot for the scores, ${S}_{2}\left({S}_{1}\right)$, together with the polynomial fit.

**Figure 2.** Loadings and scores for the two main components for the model case of variation of lattice dimension in Li${}_{x}$CoO${}_{2}$: (**a**) loadings ${L}_{1}$ and ${L}_{2}$, (**b**) first score ${S}_{1}$ and (**c**) correlation plot for the scores, ${S}_{2}\left({S}_{1}\right)$, together with the polynomial fit.

**Figure 3.** Loadings (**a**) and scores (**b**,**c**) for the two main components for the model case of Kr uptake by the porous $\gamma $–Mg(BH${}_{4}$)${}_{2}$. (**b**) shows the first score with the fit for the expected kinetics; note the difference between the refined rate ($k=0.040\left(3\right)$) and the model value ($k=0.05$). (**c**) shows the correlation between the scores ${S}_{1}$ and ${S}_{2}$, where the line is the best fit of a second-order polynomial function.

**Figure 4.** Loadings (**a**) and scores (**b**,**c**) for the two main components for the model case of Kr uptake by the porous $\gamma $–Mg(BH${}_{4}$)${}_{2}$ after rotation corrections. The first score ${S}_{1}$ is shown together with a fit (**b**); note the perfect agreement between the fitted and the model rates ($k=0.05$). The correlation between the scores ${S}_{1}$ and ${S}_{2}$, together with the best fit with a second-order polynomial function, is shown in (**c**).

**Figure 5.** Powder diffraction data collected as a function of time during Kr uptake by $\gamma $–Mg(BH${}_{4}$)${}_{2}$ at 170 K. Note the Kr fluorescence background, which gives an additional measure of Kr in the irradiated volume.

**Figure 6.** First loadings ${L}_{1}$ (**a**) and scores ${S}_{1}$ (**b**) after PCA of the experimental data for Kr uptake by the porous magnesium borohydride $\gamma $–Mg(BH${}_{4}$)${}_{2}$ for subsets A (first 1000 powder patterns), B (1000 patterns, taking every second pattern) and C (1000 patterns, taking every fourth pattern).

**Figure 7.** PCA of the experimental data, subset A, for Kr uptake by $\gamma $–Mg(BH${}_{4}$)${}_{2}$ after the rotational correction (see text). (**a**) Loadings, corrected for the rotation; the bottom panel shows an overlay of the second loading (black circles) with the diffraction pattern from the Kr sublattice alone (red line). (**b**) shows the first score together with the Kr occupancy. (**c**) shows the correlation between the scores, with a line for the best fit with a second-order polynomial function.

**Figure 8.** First loadings ${L}_{1}$ (**a**) and scores ${S}_{1}$ (**b**), corrected for the rotation, for the experimental data for Kr uptake by the porous magnesium borohydride $\gamma $–Mg(BH${}_{4}$)${}_{2}$ for subsets A (first 1000 powder patterns), B (1000 patterns, taking every second pattern) and C (1000 patterns, taking every fourth pattern).

**Table 1.** Time evolution functions and intensity distributions for components in Equation (7).

n | ${\mathit{S}}_{\mathit{n}}\left(\mathit{t}\right)$ | ${\mathit{I}}_{\mathit{n}}\left(\mathit{d}\right)$ |
---|---|---|
1 | ${S}_{d}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\overline{F\left(\mathbf{H}\right)}}^{2}\chi \left(\mathbf{H}\right)$ |
2 | ${S}_{f}\left(t\right)$ | $2{\sum}_{\mathbf{H}}G\left(d-\overline{{d}_{\mathbf{H}}}\right)\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right)$ |
3 | ${S}_{f}\left(t\right){S}_{d}\left(t\right)$ | $2{\sum}_{\mathbf{H}}{G}^{\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right)\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right)\chi \left(\mathbf{H}\right)$ |
4 | ${S}_{d}^{2}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\overline{F\left(\mathbf{H}\right)}}^{2}{\chi}^{2}\left(\mathbf{H}\right)$ |
5 | ${S}_{f}^{2}\left(t\right)$ | ${\sum}_{\mathbf{H}}G\left(d-\overline{{d}_{\mathbf{H}}}\right){\eta}^{2}\left(\mathbf{H}\right)$ |
6 | ${S}_{f}^{2}\left(t\right){S}_{d}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\eta}^{2}\left(\mathbf{H}\right)\chi \left(\mathbf{H}\right)$ |
7 | ${S}_{f}\left(t\right){S}_{d}^{2}\left(t\right)$ | $2{\sum}_{\mathbf{H}}{G}^{\prime\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right)\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right){\chi}^{2}\left(\mathbf{H}\right)$ |
8 | ${S}_{f}^{2}\left(t\right){S}_{d}^{2}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\eta}^{2}\left(\mathbf{H}\right){\chi}^{2}\left(\mathbf{H}\right)$ |

