# Improvements in the Robustness of Mid-Infrared Spectroscopy Models against Chemical Interferences: Application to Monitoring of Anaerobic Digestion Processes


## Abstract


## 1. Introduction

… (CO$_{2}$) and its most reduced form (CH$_{4}$) [16]. The methane produced is a versatile energy carrier that can be used to generate power and heat, be injected into the gas grid, or serve as a vehicle fuel. In Europe, incentive policies have led to the emergence of a biogas industry, mainly based on agricultural feedstocks (agricultural residues, energy crops, and catch crops), which are the largest available deposits of organic matter [17,18]. However, AD processes are very sensitive and can face a large number of disturbances due to the fluctuating characteristics and quantities of the waste they handle. Monitoring of AD processes is therefore essential to ensure optimized operation and to prevent failures with serious consequences for the running of plants [19].

## 2. Materials and Methods

### 2.1. Experimental Procedure

#### 2.1.1. Process

…$^{3}$ [22]. The reactor was highly instrumented, with the following measurements available online every two minutes: input and recirculation liquid flow rates, pH of the reactor and of the input wastewater, temperature, biogas output flow rate, CO$_{2}$, CH$_{4}$, and H$_{2}$ composition in the gas phase, and total organic carbon (TOC) in the reactor. Every half hour, a titrimetric sensor and a MIR spectrometer [23] were used to measure total volatile fatty acids (VFA), soluble chemical oxygen demand (COD), bicarbonate concentration, and total and partial alkalinity in the liquid phase. Since VFA are key intermediates in the AD reaction scheme and can inhibit the overall process, this study focused on their measurement and on the robustness of this measurement to ammonia addition.

#### 2.1.2. Addition of Ammonia in the Reactor

#### 2.1.3. MIR Spectra Collected on the Bioreactor

…cm$^{-1}$. The fingerprint range (1572–1000 cm$^{-1}$) was retained, corresponding to 150 wavenumbers. A total of 616 spectra and corresponding VFA concentrations (mg/L) were measured online at equally spaced instants over the course of 15 days.

### 2.2. Chemometrics

#### 2.2.1. Notations

Capital bold characters are used for matrices, e.g., **X**; small bold characters for column vectors, e.g., **x**$_{j}$ denotes the jth column of **X**; row vectors are denoted by the transpose notation, e.g., $x_{i}^{T}$ denotes the ith row of **X**; non-bold characters are used for scalars, e.g., indexes such as i. When needed for clarity, matrix dimensions are indicated as $X_{(n,p)}$, where n is the number of rows and p the number of columns.

- A calibration set $(X_{0}, y_{0})$ made up of 100 spectra and VFA concentrations obtained during a sequence in which the process was in a standard mode, i.e., between approx. days 9 and 13 (see Table 1 and Figure 1). This period corresponded to the restart of the reactor, so that the **y**$_{0}$ values covered a wide range of VFA concentrations.
- A test set $(X_{1}, y_{1})$ made up of all the acquired samples, i.e., 616 pairs of spectra and VFA concentrations. This test set included various operating states, including normal states and ammonia-addition events.

PLSR models **y** by a linear combination of the columns of **X**: $y = Xb + r$, where $r$ is the vector of residuals. Since the columns of **X** are highly correlated, ordinary least-squares regression is ill-conditioned and cannot be used. Instead, PLSR first calculates $a$ latent variables $Xu_{1}, \dots, Xu_{a}$, i.e., linear combinations of the columns of **X** defined by the weight vectors $\{u_{1}, u_{2}, \dots, u_{a}\}$ with $\|u_{i}\| = 1$, which maximize $\mathrm{cov}^{2}(Xu_{i}, y)$ under the constraint $Xu_{i} \perp Xu_{j}$ for $i \neq j$. Next, a linear regression of $y$ on the latent variables $Xu_{i}$ is calculated. Finally, the vector **b** is obtained by combining the weight matrix $U = [u_{1}, \dots, u_{a}]$ with the coefficients $c$ of this regression: $b = Uc$. To set the number of latent variables $a$, a cross-validation with two random blocks was performed and repeated 10 times, and $a$ was set to the value yielding the lowest RMSECV. The raw PLSR model was thus built on 5 latent variables and yielded the predictions reported in Figure 1.
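The calibration procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: PLSR is fitted here with the NIPALS algorithm on mean-centred data (centring is not specified in the text), the maximum number of latent variables tested (`max_lv`) is arbitrary, and the helper names `pls1_fit`, `pls1_predict`, and `select_n_components` are hypothetical. The matrix `U` maps the centred spectra onto the latent variables, so that **b** = **U**c as in the text.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS (hypothetical helper).

    Returns (b, x_mean, y_mean) so that y_hat = (X_new - x_mean) @ b + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean                      # centred spectra, deflated in the loop
    yk = y - y_mean
    p = X.shape[1]
    W = np.zeros((p, n_components))      # weights on the deflated spectra
    Px = np.zeros((p, n_components))     # spectral loadings
    q = np.zeros(n_components)           # response loadings
    for a in range(n_components):
        w = Xk.T @ yk                    # direction maximising cov(Xk w, yk)
        w /= np.linalg.norm(w)
        t = Xk @ w                       # latent variable (score vector)
        tt = t @ t
        Px[:, a] = Xk.T @ t / tt
        q[a] = (yk @ t) / tt
        W[:, a] = w
        Xk = Xk - np.outer(t, Px[:, a])  # deflation keeps the scores orthogonal
        yk = yk - q[a] * t
    # U maps the centred, undeflated spectra onto the scores (T = Xc U);
    # b combines U with the regression coefficients q.
    U = W @ np.linalg.inv(Px.T @ W)
    return U @ q, x_mean, y_mean


def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean


def select_n_components(X, y, max_lv=10, n_repeats=10, seed=0):
    """Repeated random 2-block cross-validation; returns (best a, RMSECV per a)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    sq_err = np.zeros(max_lv)
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        blocks = (idx[: n // 2], idx[n // 2:])
        for k in range(2):
            test, train = blocks[k], blocks[1 - k]
            for a in range(1, max_lv + 1):
                model = pls1_fit(X[train], y[train], a)
                residual = pls1_predict(model, X[test]) - y[test]
                sq_err[a - 1] += residual @ residual
    # each repetition predicts every sample once for every value of a
    rmsecv = np.sqrt(sq_err / (n_repeats * n))
    return int(np.argmin(rmsecv)) + 1, rmsecv
```

Calling `select_n_components(X0, y0)` and then `pls1_fit(X0, y0, a)` would mirror the procedure described above; the text reports that this selection gave 5 latent variables on the calibration set of this study.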

#### 2.2.2. Dynamic Orthogonal Projection

When a disturbance $\delta x$ is added to a spectrum **x**, an error $\delta\hat{y} = \delta x^{T} b$ is added to the estimated response $\hat{y}$. Orthogonal projection (OP) methods aim to reduce this error by constraining the PLSR to build a vector **b** orthogonal to $\delta x$, thus lowering $\delta\hat{y}$. OP methods start by identifying the subspace spanned by the $\delta x$ (the detrimental subspace). An orthonormal basis $P_{(p,k)}$ of this subspace is calculated, and $X_{(n,p)}$ is then projected onto the subspace orthogonal to **P**, resulting in the corrected spectra:

$$X^{*} = X\left(I_{p} - PP^{T}\right)$$

where $I_{p}$ is the $p \times p$ identity matrix.

This projection acts in the variable (wavenumber) space of **X**, i.e., it removes a spectral subspace from the space spanned by the rows of **X**. The estimation of **P** is the key to orthogonal correction. This estimation commonly relies on a set of dedicated experiments incorporating the systematic variations. Let $D_{(m,p)}$ be a matrix of $m$ spectra generated by the detrimental variations only. A PCA is carried out on **D**, and its first $k$ loadings are inserted in **P**. Different methods exist to build the **D** matrix. In the IIR method [28], **D** is built with samples that do not contain the component of interest but include the variations due to the influencing factors. Its main disadvantage is that it requires two matrices to correct the spectra, which are often not available. In the EPO method [12], **D** is built using a small set of appropriate samples measured at different levels of the influencing factor. This method does not require **y** reference measurements. However, only known influencing factors (temperature, scale, instruments, etc.) can be considered, not unknown ones.
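The construction of **P** and the claim that the projection cancels $\delta\hat{y}$ can be checked numerically. The following NumPy sketch assumes that the PCA of **D** is computed as an SVD without mean-centring and runs on synthetic data; `detrimental_basis` and `orthogonal_projection` are hypothetical names, and the sizes (p = 150 wavenumbers, m = 6 detrimental spectra, k = 2) are illustrative only.

```python
import numpy as np


def detrimental_basis(D, k):
    """Orthonormal basis P (p x k) of the detrimental subspace.

    The first k PCA loadings of D are computed here as the leading right
    singular vectors of D without mean-centring (an assumption)."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T


def orthogonal_projection(X, P):
    """Corrected spectra X* = X (I - P P^T), without forming the p x p matrix."""
    return X - (X @ P) @ P.T


# Check of the argument above on synthetic data (hypothetical sizes).
rng = np.random.default_rng(0)
n, p, m, k = 20, 150, 6, 2
interferents = np.linalg.qr(rng.normal(size=(p, k)))[0]  # hypothetical detrimental directions
D = rng.normal(size=(m, k)) @ interferents.T              # spectra of detrimental variations only
P = detrimental_basis(D, k)
X = rng.normal(size=(n, p))
delta_x = rng.normal(size=k) @ interferents.T             # disturbance inside the detrimental subspace
same = np.allclose(orthogonal_projection(X, P), orthogonal_projection(X + delta_x, P))
print(same)  # True: the disturbance no longer affects the corrected spectra
```

Since any model calibrated on the corrected spectra has its **b** in their row space, **b** is orthogonal to **P**, which is exactly the condition that makes $\delta x^{T} b$ vanish for disturbances inside the detrimental subspace.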

The DOP method relies on control points at which the reference value of **y** is known. Let $(X_{0}, y_{0})$ be a calibration set containing $n_{0}$ samples. Let $y_{r}$ be the values of **y** measured at $n_{r}$ control points and $X_{r}$ the corresponding acquired spectra. The DOP method runs as follows:

- First, estimate the ideal spectra $Z_{r}$, i.e., the spectra that would be measured in the absence of influencing factors. This produces two matrices $(X_{r}, Z_{r})$ similar to those obtained with the standard samples used for calibration transfer or calibration qualification tests, except that such standard samples are rarely available for online applications.
- Second, compute the **D** matrix as the difference between the measured spectra and the ideal spectra: $$D = X_{r} - Z_{r}$$
- Third, extract the first $k$ loadings of a PCA computed on **D** and insert them in **P**.
- Finally, project $X_{0}$ orthogonally to **P** and recalibrate the model.

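The ideal spectra $Z_{r}$ are estimated by calculating a matrix **A** that combines the calibration responses **y**$_{0}$ so as to estimate **y**$_{r}$, and then applying it to **X**$_{0}$:

- Calculate **A** so that: $$y_{r} \approx A y_{0}$$
- Apply **A** to $X_{0}$: $$Z_{r} = A X_{0}$$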
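The DOP steps, including the estimation $Z_{r} = AX_{0}$, can be chained in a short NumPy sketch. It assumes that the matrix **A** has already been computed, that the PCA of **D** is an uncentred SVD, and that k does not exceed min(n_r, p); `dop_correction` is a hypothetical helper, and the recalibration itself is left to any PLSR routine.

```python
import numpy as np


def dop_correction(X0, Xr, A, k):
    """Dynamic orthogonal projection (hypothetical helper).

    X0: (n0, p) calibration spectra; Xr: (nr, p) spectra at the control points;
    A: (nr, n0) matrix with y_r ~= A @ y0; k: number of PCA loadings kept.
    Returns the corrected calibration spectra and the basis P.
    """
    if not 1 <= k <= min(Xr.shape):
        raise ValueError("k must be between 1 and min(nr, p)")
    Zr = A @ X0                       # estimated ideal spectra, Z_r = A X_0
    D = Xr - Zr                       # detrimental variations, D = X_r - Z_r
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    P = Vt[:k].T                      # first k loadings (uncentred PCA, an assumption)
    X0_corr = X0 - (X0 @ P) @ P.T     # project X_0 orthogonally to P
    return X0_corr, P
```

The model is then recalibrated on `X0_corr`. Because a PLSR vector **b** fitted on the corrected, centred spectra lies in their row space, it is orthogonal to **P**, so new spectra yield the same prediction whether or not they are projected first.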

The estimation of **A** can be performed in different ways. The k-nearest-neighbors (kNN) method consists of calculating the distance between the sample to be estimated and all the samples of the calibration database, and selecting the k nearest ones for the estimation. Variants of this method exist, such as giving more weight to the closest samples. kNN is commonly used in classification; it is a simple method that makes no assumption about the normality of the data distribution. Kernel methods generalize kNN: they use a kernel function to estimate the density function of a population, which can then be used to weight an estimate. Different kernel functions can be used (uniform, triangular, Gaussian, etc.); Gaussian kernels are commonly used.

The **A** matrix was calculated as follows:

… **y**$_{0}$, and $\epsilon$ is a parameter adjusting the width of the kernel. In [20], it is theoretically justified that a good value for $\epsilon$ is $1/n_{0}$ if **y**$_{0}$ is uniformly distributed. It was verified that this value suits most distributions of **y**$_{0}$, and it was adopted as a rule of thumb. The same approach was therefore applied in this study.
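Because the kernel equation itself is not reproduced here, the following sketch only illustrates the idea with an assumed form: a row-normalized Gaussian kernel on the responses, with distances divided by the range of **y**$_{0}$ and a width ε defaulting to the 1/n$_{0}$ rule of thumb. A kNN variant with equal weights is included for comparison. Both `kernel_matrix` and `knn_matrix` are hypothetical names, and neither should be read as the exact formula of [20].

```python
import numpy as np


def kernel_matrix(y_r, y0, eps=None):
    """Row-normalised Gaussian kernel A (nr x n0) such that A @ y0 ~= y_r.

    ASSUMED form: distances on responses rescaled by the range of y0,
    Gaussian width eps, defaulting to the 1/n0 rule of thumb."""
    y_r = np.asarray(y_r, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    if eps is None:
        eps = 1.0 / len(y0)
    scale = y0.max() - y0.min()
    d = (y_r[:, None] - y0[None, :]) / scale     # scaled response distances
    logw = -0.5 * (d / eps) ** 2
    logw -= logw.max(axis=1, keepdims=True)       # avoid underflow far from y0
    W = np.exp(logw)
    return W / W.sum(axis=1, keepdims=True)


def knn_matrix(y_r, y0, k=5):
    """kNN alternative: weight 1/k on the k calibration samples whose
    responses are closest to each control-point response."""
    y_r = np.asarray(y_r, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    A = np.zeros((len(y_r), len(y0)))
    nearest = np.argsort(np.abs(y_r[:, None] - y0[None, :]), axis=1)[:, :k]
    np.put_along_axis(A, nearest, 1.0 / k, axis=1)
    return A
```

With either matrix, `A @ y0` approximates the control-point responses and `A @ X0` gives the estimated ideal spectra $Z_{r}$; the Gaussian kernel spreads the weights smoothly, whereas kNN uses a hard cutoff.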

#### 2.2.3. Figures of Merit

- Root Mean Square Error of Calibration: $$RMSEC = \sqrt{\frac{\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)^{2}}{N - LV - 1}}$$
- Root Mean Square Error of Cross-Validation: $$RMSECV = \sqrt{\frac{\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)^{2}}{N}}$$
- Root Mean Square Error of Prediction: $$RMSEP = \sqrt{\frac{\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)^{2}}{N}}$$
- Bias: $$Bias = \frac{\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)}{N}$$

where $N$ is the number of samples in the set considered, $LV$ the number of latent variables, $\hat{y}_{i}$ the predicted value, and $y_{i}$ the reference value of sample $i$.
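The figures of merit translate directly into code. The NumPy sketch below assumes that the predictions passed to `rmse` are cross-validated predictions when computing RMSECV and test-set predictions when computing RMSEP; the function names are illustrative.

```python
import numpy as np


def rmsec(y_hat, y, n_lv):
    """RMSEC: calibration residuals with N - LV - 1 degrees of freedom."""
    r = np.asarray(y_hat, float) - np.asarray(y, float)
    return float(np.sqrt(r @ r / (len(r) - n_lv - 1)))


def rmse(y_hat, y):
    """RMSECV (cross-validated predictions) or RMSEP (test-set predictions)."""
    r = np.asarray(y_hat, float) - np.asarray(y, float)
    return float(np.sqrt(r @ r / len(r)))


def bias(y_hat, y):
    """Mean prediction error, y_hat - y (negative when predictions are too low)."""
    return float(np.mean(np.asarray(y_hat, float) - np.asarray(y, float)))


y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 4.5])
print(rmse(y_hat, y), bias(y_hat, y))  # 0.5 0.25
```

With this sign convention, the "negative bias" reported in Table 1 after ammonia additions corresponds to underestimated VFA concentrations.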

## 3. Results and Discussion

…cm$^{-1}$ is reported as being linked to the antisymmetric deformation mode ν$_{4}$ of NH$_{4}^{+}$. This peak corresponds to the main peak appearing in Figure 2 at each ammonia addition.

…cm$^{-1}$. In Figure 4b, the three spectra are ordered consistently with their VFA concentrations over the whole wavenumber range after DOP correction. More precisely, the three spectra coincide over a wide range, from 1000 to 1350 cm$^{-1}$, and separate into two peaks in the 1350–1700 cm$^{-1}$ zone. These figures indicate that DOP removed the features affected by ammonia, producing corrected spectra whose differences reflect the VFA concentrations.

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Wang, L.; Sun, D.W.; Pu, H.; Cheng, J.H. Quality analysis, classification, and authentication of liquid foods by near-infrared spectroscopy: A review of recent research developments. Crit. Rev. Food Sci. Nutr. **2017**, 57, 1524–1538.
- Karoui, R.; Downey, G.; Blecker, C. Mid-infrared spectroscopy coupled with chemometrics: A tool for the analysis of intact food systems and the exploration of their molecular Structure–Quality relationships—A review. Chem. Rev. **2010**, 110, 6144–6168.
- Kornmann, H.; Valentinotti, S.; Duboc, P.; Marison, I.; Von Stockar, U. Monitoring and control of Gluconacetobacter xylinus fed-batch cultures using in situ mid-IR spectroscopy. J. Biotechnol. **2004**, 113, 231–245.
- Amrhein, M. Reaction and Flow Variants/Invariants for the Analysis of Chemical Reaction Data. Ph.D. Thesis, EPFL, Lausanne, Switzerland, 1998.
- Amrhein, M.; Srinivasan, B.; Bonvin, D.; Schumacher, M.M. Calibration of spectral reaction data. Chemom. Intell. Lab. Syst. **1999**, 46, 249–264.
- Chung, H.; Arnold, M.A.; Rhiel, M.; Murhammer, D.W. Simultaneous measurements of glucose, glutamine, ammonia, lactate, and glutamate in aqueous solutions by near-infrared spectroscopy. Appl. Spectrosc. **1996**, 50, 270–276.
- Rhiel, M.; Cannizzaro, C.; Valentinotti, S.; Marison, I.; von Stockar, U. Comprehensive In-Situ Bioreactor Monitoring and Control Based on a Mid-Infrared Spectroscopic Sensor System; EPFL Scientific Publications: Lausanne, Switzerland, 2000.
- Brown, S.D.; Huwei, T.; Feudale, R. Improving the robustness of multivariate calibrations. In ACS Symposium Series; Oxford University Press: Oxford, UK, 2005; Volume 894, pp. 15–30.
- Dabros, M. Robust Model Development and Enhancement Techniques for Improved On-Line Spectroscopic Monitoring of Bioprocesses. Ph.D. Thesis, EPFL Scientific Publications, Lausanne, Switzerland, 2008.
- Zeaiter, M.; Roger, J.M.; Bellon-Maurel, V.; Rutledge, D.N. Robustness of models developed by multivariate calibration. Part I: The assessment of robustness. TrAC Trends Anal. Chem. **2004**, 23, 157–170.
- Zeaiter, M.; Roger, J.M.; Bellon-Maurel, V. Robustness of models developed by multivariate calibration. Part II: The influence of pre-processing methods. TrAC Trends Anal. Chem. **2005**, 24, 437–445.
- Roger, J.M.; Chauchard, F.; Bellon-Maurel, V. EPO–PLS external parameter orthogonalisation of PLS application to temperature-independent measurement of sugar content of intact fruits. Chemom. Intell. Lab. Syst. **2003**, 66, 191–204.
- Chauchard, F.; Roger, J.M.; Bellon-Maurel, V. Correction of the temperature effect on near infrared calibration—Application to soluble solid content prediction. J. Near Infrared Spectrosc. **2004**, 12, 199–205.
- Boulet, J.C.; Roger, J.M. Pretreatments by means of orthogonal projections. Chemom. Intell. Lab. Syst. **2012**, 117, 61–69.
- Roger, J.M.; Boulet, J.C. A review of orthogonal projections for calibration. J. Chemom. **2018**, 32, e3045.
- Brémond, U.; de Buyer, R.; Steyer, J.P.; Bernet, N.; Carrere, H. Biological pretreatments of biomass for improving biogas production: An overview from lab scale to full-scale. Renew. Sustain. Energy Rev. **2018**, 90, 583–604.
- Weiland, P. Biogas production: Current state and perspectives. Appl. Microbiol. Biotechnol. **2010**, 85, 849–860.
- Brémond, U.; Bertrandias, A.; Steyer, J.P.; Bernet, N.; Carrere, H. A vision of European biogas sector development towards 2030: Trends and challenges. J. Clean. Prod. **2021**, 287, 125065.
- Jimenez, J.; Latrille, E.; Harmand, J.; Robles, A.; Ferrer, J.; Gaida, D.; Wolf, C.; Mairet, F.; Bernard, O.; Alcaraz-Gonzalez, V.; et al. Instrumentation and control of anaerobic digestion processes: A review and some research challenges. Rev. Environ. Sci. Bio/Technol. **2015**, 14, 615–648.
- Zeaiter, M.; Roger, J.M.; Bellon-Maurel, V. Dynamic orthogonal projection. A new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations. Chemom. Intell. Lab. Syst. **2006**, 80, 227–235.
- Batstone, D.J.; Steyer, J.P. Use of modelling to evaluate best control practice for winery-type wastewaters. Water Sci. Technol. **2007**, 56, 147–152.
- Steyer, J.P.; Bouvier, J.C.; Conte, T.; Gras, P.; Sousbie, P. Evaluation of a four year experience with a fully instrumented anaerobic digestion process. Water Sci. Technol. **2002**, 45, 495–502.
- Steyer, J.P.; Bouvier, J.C.; Conte, T.; Gras, P.; Harmand, J.; Delgenes, J.P. On-line measurements of COD, TOC, VFA, total and partial alkalinity in anaerobic digestion processes using infra-red spectrometry. Water Sci. Technol. **2002**, 45, 133–138.
- Koster, I.W.; Lettinga, G. The influence of ammonium-nitrogen on the specific activity of pelletized methanogenic sludge. Agric. Wastes **1984**, 9, 205–216.
- Koster, I.W.; Lettinga, G. Anaerobic digestion at extreme ammonia concentrations. Biol. Wastes **1988**, 25, 51–59.
- Hansen, K.H.; Angelidaki, I.; Ahring, B.K. Anaerobic digestion of swine manure: Inhibition by ammonia. Water Res. **1998**, 32, 5–12.
- Geladi, P.; Kowalski, B.R. Partial least-squares regression: A tutorial. Anal. Chim. Acta **1986**, 185, 1–17.
- Hansen, P.W. Pre-processing method minimizing the need for reference analyses. J. Chemom. **2001**, 15, 123–131.
- Falk, H.M.; Reichling, P.; Andersen, C.; Benz, R. Online monitoring of concentration and dynamics of volatile fatty acids in anaerobic digestion processes with mid-infrared spectroscopy. Bioprocess Biosyst. Eng. **2015**, 38, 237–249.
- Wolf, C.; Gaida, D.; Rehorek, A.; Bongards, M. Online monitoring of AD processes using a fully automated, low maintenance middle-infrared (MIR) measurement system. In Proceedings of the 2nd International Conference Biogas Science 2014, Vienna, Austria, 26–30 October 2014; Volume 2014, pp. 24–25.
- Max, J.J.; Chapados, C. Aqueous ammonia and ammonium chloride hydrates: Principal infrared spectra. J. Mol. Struct. **2013**, 1046, 124–135.
- de Noord, O.E. The influence of data preprocessing on the robustness and parsimony of multivariate calibration models. Chemom. Intell. Lab. Syst. **1994**, 23, 65–70.
- Challa, S.; Potumarthi, R. Chemometrics-based process analytical technology (PAT) tools: Applications and adaptation in pharmaceutical and biopharmaceutical industries. Appl. Biochem. Biotechnol. **2013**, 169, 66–76.

**Figure 1.** VFA concentration evolution measured by the reference titrimetric sensor (blue) and predicted by the raw PLSR (red). Vertical lines indicate the main events (cf. Table 1). Horizontal arrows indicate the measurements used for calibration and testing of the PLSR model.

**Figure 3.** Predictions of the raw PLSR model (red) and of the DOP PLSR model (blue), with DOP corrections applied at point 1, after the first addition of ammonia, and at point 2, after the second, more concentrated addition of ammonia. Points A, B, and C correspond to low VFA/low ammonia, medium VFA/low ammonia, and high VFA/high ammonia, respectively. Dotted vertical lines show the moments of ammonia addition (cf. Figure 1 and Table 1). Dashed vertical lines show the control points used by DOP.

**Figure 4.** Illustration of the DOP correction. Spectra of the three points A, B, and C shown in Figure 3, corresponding to VFA = 623, 1476, and 2060 mg/L. (**a**) Before DOP correction. (**b**) After DOP correction.

**Figure 5.** Summary of the RMSEP values obtained after DOP correction, as a function of the number of control points used. Each boxplot represents the distribution of the RMSEP values obtained over 100 repetitions with randomly chosen control points. The horizontal red line indicates the median, and the box edges are the first and third quartiles. Dashed whiskers extend to the most extreme values not considered outliers. The horizontal dashed line indicates the RMSEP of the raw PLSR.

| Time (Days) | Addition of Chemicals | Impact on the Predictions |
|---|---|---|
| 0.0 | Addition of NH$_{4}$Cl (low conc.) | Moderate negative bias |
| 4.0 | Addition of NaOH | Weak positive bias |
| 5.7 | Addition of NH$_{4}$Cl (medium conc.) | Strong negative bias |
| 9.3 | Discharge and water circulation | None |
| 10.0 | Re-charge with wine waste | None |
| 11.5 | Stopping of the reactor | None |
| 13.0 | Restart of the reactor | None |
| 13.7 | Addition of NH$_{4}$Cl (high conc.) | Strong negative bias |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zeaiter, M.; Latrille, É.; Gras, P.; Steyer, J.-P.; Bellon-Maurel, V.; Roger, J.-M.
Improvements in the Robustness of Mid-Infrared Spectroscopy Models against Chemical Interferences: Application to Monitoring of Anaerobic Digestion Processes. *AppliedChem* **2022**, *2*, 117-127.
https://doi.org/10.3390/appliedchem2020008
