# Principal Component Analysis (PCA) for Powder Diffraction Data: Towards Unblinded Applications


## Abstract


## 1. Introduction

## 2. Theory

## 3. Simulated Data and PCA Analysis

#### 3.1. Li${}_{x}$CoO${}_{2}$

#### 3.1.1. Variation of Occupancy

#### 3.1.2. Variation of Lattice Dimension

- The first score ${S}_{1}$ follows the time dependence of the unit cell dimension;
- The second score ${S}_{2}$ is the square of the first score;
- The first loading ${L}_{1}$ is a diffractogram in which every line is the first derivative of the profile function; for the symmetric profile considered here, this gives an anti-symmetric line shape;
- The second loading ${L}_{2}$ is a sum of lines whose shape is the second derivative of the initial profile; for the symmetric profile considered here, this gives a symmetric line shape.
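The four statements above can be checked numerically on a toy model. The following is a minimal sketch, not the simulation used in the paper: it assumes a single Gaussian line, a hypothetical position, width and shift range, and a line shift that is linear in time and symmetric about zero (so that the linear and quadratic time functions are orthogonal and PCA does not mix them). PCA is done here with a plain NumPy SVD of the mean-centred data.

```python
import numpy as np

# Hypothetical single-line model: a Gaussian profile whose centre drifts in time.
d0, width = 2.0, 0.02                       # line position and width (assumed values)
d = np.linspace(d0 - 0.2, d0 + 0.2, 401)    # d-spacing grid, symmetric about d0
shift = np.linspace(-0.002, 0.002, 101)     # line shift vs. time (assumed, small vs. width)


def gauss(x):
    """Symmetric (Gaussian) profile function G(x)."""
    return np.exp(-0.5 * (x / width) ** 2)


# Data matrix: rows are time frames, columns are points of the pattern.
data = gauss(d[None, :] - d0 - shift[:, None])

# PCA via SVD of the data centred over time.
centred = data - data.mean(axis=0)
u, sv, vt = np.linalg.svd(centred, full_matrices=False)
scores = u[:, :2] * sv[:2]      # S1, S2 as columns
loadings = vt[:2]               # L1, L2 as rows

# Analytic first and second derivatives of the Gaussian profile.
x = d - d0
g1 = -x / width**2 * gauss(x)
g2 = (x**2 / width**4 - 1.0 / width**2) * gauss(x)


def corr(a, b):
    """Absolute Pearson correlation (PCA components have arbitrary sign)."""
    return abs(np.corrcoef(a, b)[0, 1])


print("S1 vs shift :", corr(scores[:, 0], shift))
print("S2 vs S1^2  :", corr(scores[:, 1], scores[:, 0] ** 2))
print("L1 vs G'    :", corr(loadings[0], g1))
print("L2 vs G''   :", corr(loadings[1], g2))
```

For a shift that is small compared with the line width, all four correlations should come out close to 1. Larger shifts bring in higher derivatives of the profile, and the agreement is expected to degrade.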

#### 3.1.3. Variation of Lattice Dimension and Occupancy

#### 3.2. Kr Uptake by $\gamma $–Mg(BH${}_{4}$)${}_{2}$

#### Variation of Occupancy

- ${L}_{1}$ may consist of both positive and negative peaks, while ${L}_{2}$ might be positive.
- ${S}_{2}$ might be proportional to the square of ${S}_{1}$.
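A short numerical illustration of why these two properties are plausible is given below. It is only a sketch under stated assumptions: the line intensity is taken as $(F + s(t)\,\eta)^2$, in line with the occupancy-related components 2 and 5 of Table 1, and the line positions, the $F$ and $\eta$ values, the line width and the first-order uptake curve $s(t)$ are all hypothetical.

```python
import numpy as np

d = np.linspace(1.0, 5.0, 800)            # d-spacing grid (assumed)
t = np.arange(200.0)                      # time frames (assumed)
occ = 1.0 - np.exp(-0.05 * t)             # assumed first-order uptake curve s(t)

# Hypothetical lines: position, average structure factor F, occupancy response eta.
d_h = np.array([1.5, 2.4, 3.3, 4.2])
F = np.array([3.0, 2.0, 0.5, 4.0])
eta = np.array([1.0, -1.5, 2.0, 0.5])

# One Gaussian profile per line, shape (n_lines, n_points).
profiles = np.exp(-0.5 * ((d[None, :] - d_h[:, None]) / 0.03) ** 2)

# Intensity of each line is (F + s(t) * eta)^2; sum over lines gives the pattern.
data = ((F + occ[:, None] * eta) ** 2) @ profiles

# Analytic patterns multiplying s(t) and s(t)^2.
linear_part = (2.0 * F * eta) @ profiles      # mixed signs, follows the sign of F * eta
quadratic_part = (eta ** 2) @ profiles        # non-negative everywhere

# The centred data contain only these two patterns.
centred = data - data.mean(axis=0)
model = (np.outer(occ - occ.mean(), linear_part)
         + np.outer(occ**2 - (occ**2).mean(), quadratic_part))
sv = np.linalg.svd(centred, compute_uv=False)

print("two-term model reproduces data:", np.allclose(centred, model))
print("third / first singular value  :", sv[2] / sv[0])
```

In this toy case the centred data have only two components. The one linear in occupancy has peaks of both signs and the quadratic one is non-negative. Because PCA additionally forces the components to be orthogonal, the raw scores and loadings are in general a rotated mixture of these two terms.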

## 4. PCA Analysis of Real Data for Mg(BH${}_{4}$)${}_{2}$ + Kr${}_{x}$

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A

**Figure A1.** Profile functions for diffraction components from Table 1: $G(d-\overline{{d}_{\mathbf{H}}})$ (**a**), ${G}^{\prime}(d-\overline{{d}_{\mathbf{H}}})$ (**b**), and ${G}^{\prime\prime}(d-\overline{{d}_{\mathbf{H}}})$ (**c**).

## Appendix B

- ${\tilde{S}}_{2}$ as a function of ${\tilde{S}}_{1}$ is fitted by a second-order polynomial with the best R-factor;
- ${\tilde{L}}_{1}$ gives a pattern of positive or negative symmetric peaks, at least in the low-angle part of the pattern;
- ${\tilde{L}}_{1}$ and ${\tilde{S}}_{1}$ for different data subsets are as close to each other as possible.

Subset | $\alpha$ |
---|---|
A | 0.02 |
B | 0.095 |
C | 0.14 |
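The first criterion of this appendix can be turned into a simple search over the rotation angle. The code below is a minimal sketch, not the authors' implementation. It assumes a noise-free toy data set with intensities $(F + s(t)\,\eta)^2$ and hypothetical line parameters, a plane rotation of the first two components by an angle `alpha` in radians, a grid search over `alpha`, and an R-factor defined as the relative misfit of a second-order polynomial $\tilde{S}_2(\tilde{S}_1)$. The helper names `pca`, `rotate`, `r_factor` and `best_alpha` are introduced here for illustration only.

```python
import numpy as np


def pca(data, n_components=2):
    """Centre each channel over time; return (scores, loadings) from an SVD."""
    centred = data - data.mean(axis=0)
    u, sv, vt = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * sv[:n_components], vt[:n_components]


def rotate(scores, loadings, alpha):
    """Rotate two components by alpha (radians); scores @ loadings is unchanged."""
    c, s = np.cos(alpha), np.sin(alpha)
    rot = np.array([[c, -s], [s, c]])
    return scores @ rot, rot.T @ loadings


def r_factor(scores):
    """Relative misfit of a second-order polynomial S2(S1) (assumed definition)."""
    s1, s2 = scores[:, 0], scores[:, 1]
    fit = np.polyval(np.polyfit(s1, s2, 2), s1)
    return np.sqrt(np.sum((s2 - fit) ** 2) / np.sum(s2 ** 2))


def best_alpha(scores, loadings, n_grid=3601):
    """Grid search for the rotation angle that minimises the R-factor."""
    alphas = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    misfit = [r_factor(rotate(scores, loadings, a)[0]) for a in alphas]
    return alphas[int(np.argmin(misfit))]


# Toy data (all values hypothetical): four Gaussian lines with occupancy-driven intensity.
d = np.linspace(1.0, 5.0, 800)
t = np.arange(200.0)
occ = 1.0 - np.exp(-0.05 * t)                 # assumed first-order uptake curve
d_h = np.array([1.5, 2.4, 3.3, 4.2])
F = np.array([3.0, 2.0, 0.5, 4.0])
eta = np.array([1.0, -1.5, 2.0, 0.5])
profiles = np.exp(-0.5 * ((d[None, :] - d_h[:, None]) / 0.03) ** 2)
data = ((F + occ[:, None] * eta) ** 2) @ profiles

scores, loadings = pca(data)
alpha = best_alpha(scores, loadings)
scores_rot, loadings_rot = rotate(scores, loadings, alpha)

r_before, r_after = r_factor(scores), r_factor(scores_rot)
occ_corr = abs(np.corrcoef(scores_rot[:, 0], occ)[0, 1])
print("alpha (rad):", alpha)
print("R-factor before / after rotation:", r_before, r_after)
print("|corr| of rotated S1 with occupancy:", occ_corr)
```

In this noise-free case the rotated first score should follow the occupancy curve almost exactly. With experimental data the minimum is expected to be shallower, which is where the other two criteria (peak shape of $\tilde{L}_1$ and consistency between subsets) become useful.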


**Figure 1.** Loadings and scores for the two main components for the model case of variation of the occupancy in Li${}_{x}$CoO${}_{2}$: (**a**) loadings ${L}_{1}$ and ${L}_{2}$, (**b**) first score ${S}_{1}$ and (**c**) correlation plot for the scores, ${S}_{2}\left({S}_{1}\right)$, together with the polynomial fit.

**Figure 2.** Loadings and scores for the two main components for the model case of variation of lattice dimension in Li${}_{x}$CoO${}_{2}$: (**a**) loadings ${L}_{1}$ and ${L}_{2}$, (**b**) first score ${S}_{1}$ and (**c**) correlation plot for the scores, ${S}_{2}\left({S}_{1}\right)$, together with the polynomial fit.

**Figure 3.** Loadings (**a**) and scores (**b**,**c**) for the two main components for the model case of Kr uptake by the porous $\gamma$–Mg(BH${}_{4}$)${}_{2}$. (**b**) shows the first score with the fit for the expected kinetics; note the difference between the refined rate ($k=0.040\left(3\right)$) and the model value ($k=0.05$). (**c**) shows the correlation between the scores ${S}_{1}$ and ${S}_{2}$, where the line represents the best fit of a second-order polynomial function.

**Figure 4.** Loadings (**a**) and scores (**b**,**c**) for the two main components for the model case of Kr uptake by the porous $\gamma$–Mg(BH${}_{4}$)${}_{2}$ after rotation corrections. The first score ${S}_{1}$ is shown together with a fit (**b**); note the perfect agreement between the fitted and the model rates ($k=0.05$). The correlation between the scores ${S}_{1}$ and ${S}_{2}$, together with the best fit of a second-order polynomial function, is shown in (**c**).

**Figure 5.** Powder diffraction data collected as a function of time during Kr uptake by $\gamma$–Mg(BH${}_{4}$)${}_{2}$ at 170 K. Note the Kr fluorescence background, which gives an additional measure of Kr in the irradiated volume.

**Figure 6.** First loadings ${L}_{1}$ (**a**) and scores ${S}_{1}$ (**b**) after PCA of the experimental data for Kr uptake by the porous magnesium borohydride $\gamma$–Mg(BH${}_{4}$)${}_{2}$ for subsets A (first 1000 powder patterns), B (1000 patterns, taking every second pattern), and C (1000 patterns, taking every fourth pattern).

**Figure 7.** PCA of the experimental data, subset A, for Kr uptake by $\gamma$–Mg(BH${}_{4}$)${}_{2}$ after the rotational correction (see text). (**a**) Loadings, corrected for the rotation; the bottom panel shows an overlay of the second loading (black circles) with the diffraction pattern from the Kr sublattice alone (red line). (**b**) shows the first score together with the Kr occupancy. (**c**) shows the correlation between the scores, with a line for the best fit of a second-order polynomial function.

**Figure 8.** First loadings ${L}_{1}$ (**a**) and scores ${S}_{1}$ (**b**), corrected for the rotation, for the experimental data for Kr uptake by the porous magnesium borohydride $\gamma$–Mg(BH${}_{4}$)${}_{2}$ for subsets A (first 1000 powder patterns), B (1000 patterns, taking every second pattern), and C (1000 patterns, taking every fourth pattern).

**Table 1.** Time evolution functions and intensity distributions for components in Equation (7).

n | ${S}_{n}\left(t\right)$ | ${I}_{n}\left(d\right)$ |
---|---|---|
1 | ${S}_{d}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\overline{F\left(\mathbf{H}\right)}}^{2}\chi \left(\mathbf{H}\right)$ |
2 | ${S}_{f}\left(t\right)$ | $2{\sum}_{\mathbf{H}}G\left(d-\overline{{d}_{\mathbf{H}}}\right)\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right)$ |
3 | ${S}_{f}\left(t\right){S}_{d}\left(t\right)$ | $2{\sum}_{\mathbf{H}}{G}^{\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right)\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right)\chi \left(\mathbf{H}\right)$ |
4 | ${S}_{d}^{2}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\overline{F\left(\mathbf{H}\right)}}^{2}{\chi}^{2}\left(\mathbf{H}\right)$ |
5 | ${S}_{f}^{2}\left(t\right)$ | ${\sum}_{\mathbf{H}}G\left(d-\overline{{d}_{\mathbf{H}}}\right){\eta}^{2}\left(\mathbf{H}\right)$ |
6 | ${S}_{f}^{2}\left(t\right){S}_{d}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\eta}^{2}\left(\mathbf{H}\right)\chi \left(\mathbf{H}\right)$ |
7 | ${S}_{f}\left(t\right){S}_{d}^{2}\left(t\right)$ | $2{\sum}_{\mathbf{H}}{G}^{\prime\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right)\overline{F\left(\mathbf{H}\right)}\eta \left(\mathbf{H}\right){\chi}^{2}\left(\mathbf{H}\right)$ |
8 | ${S}_{f}^{2}\left(t\right){S}_{d}^{2}\left(t\right)$ | ${\sum}_{\mathbf{H}}{G}^{\prime\prime}\left(d-\overline{{d}_{\mathbf{H}}}\right){\eta}^{2}\left(\mathbf{H}\right){\chi}^{2}\left(\mathbf{H}\right)$ |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chernyshov, D.; Dovgaliuk, I.; Dyadkin, V.; van Beek, W.
Principal Component Analysis (PCA) for Powder Diffraction Data: Towards Unblinded Applications. *Crystals* **2020**, *10*, 581.
https://doi.org/10.3390/cryst10070581
