# Bioinformatic Analysis of Metabolomic Data: From Raw Spectra to Biological Insight

^{1}

^{2}

^{2}SysBio, University of Valencia-FISABIO Joint Unit, 46980 Paterna, Spain

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Raw Spectra Preprocessing

#### 2.1. Denoising

#### 2.2. Peak Picking

#### 2.3. Peak Alignment

#### 2.4. Compound Identification

## 3. Peak Area Preprocessing

#### 3.1. Missing Value Imputation

#### 3.2. Normalization

#### 3.2.1. Normalization Methods

#### 3.2.2. Assessment of Post-Acquisition Normalization Effectivity

## 4. Statistical Analysis of Metabolomic Data

#### 4.1. Univariate Analyses

#### 4.2. Multivariate Analyses

#### 4.2.1. Non-Supervised Multivariate Analyses

_{k}(u

_{1}) from CDF

_{k}(u

_{2}), and the u

_{1}and u

_{2}commonly used are 0.1 and 0.9, respectively. With both metrics, the ‘elbow’ method is the most commonly used to select the best k. This approach is, however, rather subjective, so different alternatives have been proposed in order to determine the number of significant clusters more objectively. An example is the M3C method, which computes a p-value based on a null distribution obtained by applying consensus clustering on randomly generated datasets, which have the same feature correlation structure as the original dataset, but do not present clustering [91].

**Figure 2.**Graphical outputs of a selection of unsupervised multivariate analyses. These plots were obtained using the metabolomic dataset included in the work by Boyle et al. from 2017 [92], obtained from different P. aeruginosa mutants. (

**A**,

**B**) Principal component analysis (PCA). (

**A**) PCA biplot, with the loadings of the top eight metabolites that most contribute to the two first components overlayed. The colors of the points indicate the mutant. (

**B**) Scree plot of the four first components, showing the proportion of the total variance of the data explained by each one of them. (

**C**) Heatmap of two hierarchical clustering analysis samples, metabolite-wise. The color gradient indicates metabolite abundance. Each column corresponds to one P. aeruginosa mutant.

#### 4.2.2. Supervised Multivariate Approaches

^{2}and Q

^{2}can be used for this aim. R

^{2}, as in linear regression, indicates the proportion of the variation in the response explained by the model. Q

^{2}refers to the R

^{2}obtained with the testing set. If the model is not overfitting, R

^{2}should be high, and Q

^{2}should be slightly smaller than R

^{2}, but not very different. However, because of the time-consuming nature of metabolomic sample acquisition, the number of samples is not always high enough to be able to split the dataset and still have an appropriate number of samples. An alternative to this approach is to use label permutation and cross validation [103]. In cross validation, the dataset is split into several sample subsets, the model is fitted to all the combined subsets except for one, which is tested in the left-out subset. This process is repeated until all the subsets have been used as a testing set. This allows us to have R

^{2}and Q

^{2}values while keeping all the samples for the analysis. In the permutation test, the responses variables are randomly permutated between samples. A model is then fitted to the altered dataset, and R

^{2}and Q

^{2}are computed, comparing the values to the ones of the actual model. If the model is overfitting, the permutated model might have a higher R

^{2}or Q

^{2}just by chance. Several permutation rounds are carried out, obtaining a p-value based on the proportion of R

^{2}of permutated sets higher than the actual R

^{2}. Another p-value is obtained analogously for Q

^{2}. Some useful graphical representations for OPLS and OPLS-DA models are a score plot of the predictive and first orthogonal components (Figure 3A), a scatterplot of the R

^{2}and Q

^{2}values obtained before and after label permutation (Figure 3B) and a bar plot of each metabolite’s loading for the predictor component (Figure 3C). These plots allow us to visualize how well the model is separating the samples, if the model is statistically significant and the contribution of each metabolite to the separation of the samples according to the response variable, respectively.

## 5. Metabolic Pathway Enrichment

#### 5.1. Over-Representation Analysis (ORA)

#### 5.2. Functional Class Scoring (FCS)

#### 5.3. Pathway Topology (PT)

^{2}test on a lower dimension space built from the graph to determine the significances of subnetworks (pathways) within the graph [126].

## 6. Generating Insight When Metabolomic Data Are Not Available: Genome-Scale Metabolic Models

^{−1}) [130]. The obtained metabolic network will contain some gaps due to incompleteness and mistakes in the annotation and promiscuous enzymes that catalyze reactions that are not accounted for, which will need to be filled. This step is preferred to be carried out manually [131].

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Oliver, S.G.; Winson, M.K.; Kell, D.B.; Baganz, F. Systematic Functional Analysis of the Yeast Genome. Trends Biotechnol.
**1998**, 16, 373–378. [Google Scholar] [CrossRef] [PubMed] - Fiehn, O. Metabolomics—The Link between Genotypes and Phenotypes. Plant Mol. Biol.
**2002**, 48, 155–171. [Google Scholar] [CrossRef] [PubMed] - Marian, A.J. Molecular Genetic Studies of Complex Phenotypes. Transl. Res.
**2012**, 159, 64–79. [Google Scholar] [CrossRef] [PubMed] - Zulianello, L.; Canard, C.; Köhler, T.; Caille, D.; Lacroix, J.S.; Meda, P. Rhamnolipids Are Virulence Factors That Promote Early Infiltration of Primary Human Airway Epithelia by Pseudomonas aeruginosa. Infect. Immun.
**2006**, 74, 3134–3147. [Google Scholar] [CrossRef] [PubMed] - Davey, M.E.; Caiazza, N.C.; O’Toole, G.A. Rhamnolipid Surfactant Production Affects Biofilm Architecture in Pseudomonas aeruginosa PAO1. J. Bacteriol.
**2003**, 185, 1027–1036. [Google Scholar] [CrossRef] [PubMed] - Caiazza, N.C.; Shanks, R.M.Q.; O’Toole, G.A. Rhamnolipids Modulate Swarming Motility Patterns of Pseudomonas aeruginosa. J. Bacteriol.
**2005**, 187, 7351–7361. [Google Scholar] [CrossRef] [PubMed] - Sabra, W.; Kim, E.J.; Zeng, A.P. Physiological Responses of Pseudomonas aeruginosa PAO1 to Oxidative Stress in Controlled Microaerobic and Aerobic Cultures. Microbiology
**2002**, 148, 3195–3202. [Google Scholar] [CrossRef] [PubMed] - Mukhopadhyay, S.; Nair, S.; Ghosh, S. Pathogenesis in Tuberculosis: Transcriptomic Approaches to Unraveling Virulence Mechanisms and Finding New Drug Targets. FEMS Microbiol. Rev.
**2012**, 36, 463–485. [Google Scholar] [CrossRef] [PubMed] - Galagan, J.E.; Minch, K.; Peterson, M.; Lyubetskaya, A.; Azizi, E.; Sweet, L.; Gomes, A.; Rustad, T.; Dolganov, G.; Glotova, I.; et al. The Mycobacterium tuberculosis Regulatory Network and Hypoxia. Nature
**2013**, 499, 178–183. [Google Scholar] [CrossRef] - Raghunandanan, S.; Jose, L.; Gopinath, V.; Kumar, R.A. Comparative Label-Free Lipidomic Analysis of Mycobacterium tuberculosis during Dormancy and Reactivation. Sci. Rep.
**2019**, 9, 3660. [Google Scholar] [CrossRef] - Ye, D.; Li, X.; Shen, J.; Xia, X. Microbial Metabolomics: From Novel Technologies to Diversified Applications. TrAC-Trends Anal. Chem.
**2022**, 148, 116540. [Google Scholar] [CrossRef] - Emwas, A.H.; Roy, R.; McKay, R.T.; Tenori, L.; Saccenti, E.; Nagana Gowda, G.A.; Raftery, D.; Alahmari, F.; Jaremko, L.; Jaremko, M.; et al. Nmr Spectroscopy for Metabolomics Research. Metabolites
**2019**, 9, 123. [Google Scholar] [CrossRef] [PubMed] - Lu, H.; Liang, Y.; Dunn, W.B.; Shen, H.; Kell, D.B. Comparative Evaluation of Software for Deconvolution of Metabolomics Data Based on GC-TOF-MS. TrAC-Trends Anal. Chem.
**2008**, 27, 215–227. [Google Scholar] [CrossRef] - Fiehn, O. Metabolomics by Gas Chromatography-Mass Spectrometry: The Combination of Targeted and Untargeted Profiling. Curr. Protoc. Mol. Biol.
**2017**, 114, 30.4.1–30.4.32. [Google Scholar] [CrossRef] [PubMed] - Perez, E.R.; Knapp, J.A.; Horn, C.K.; Stillman, S.L.; Evans, J.E.; Arfsten, D.P. Comparison of LC-MS-MS and GC-MS Analysis of Benzodiazepine Compounds Included in the Drug Demand Reduction Urinalysis Program. J. Anal. Toxicol.
**2016**, 40, 201–207. [Google Scholar] [CrossRef] [PubMed] - Chen, C.; Gonzalez, F.J.; Idle, J.R. LC-MS-Based Metabolomics in Drug Metabolism. Drug Metab. Rev.
**2007**, 39, 581–597. [Google Scholar] [CrossRef] - Johnson, C.H.; Ivanisevic, J.; Benton, H.P.; Siuzdak, G. Bioinformatics: The next Frontier of Metabolomics. Anal. Chem.
**2015**, 87, 147–156. [Google Scholar] [CrossRef] [PubMed] - Emwas, A.H.M. The Strengths and Weaknesses of NMR Spectroscopy and Mass Spectrometry with Particular Focus on Metabolomics Research. In Metabonomics: Methods and Protocols; Bjerrum, J.T., Ed.; Humana Press: New York, NY, USA, 2015; Volume 1277, pp. 161–194. [Google Scholar]
- Edison, A.S.; Colonna, M.; Gouveia, G.J.; Holderman, N.R.; Judge, M.T.; Shen, X.; Zhang, S. NMR: Unique Strengths That Enhance Modern Metabolomics Research. Anal. Chem.
**2021**, 93, 478–499. [Google Scholar] [CrossRef] [PubMed] - Karaman, I.; Climaco Pinto, R.; Graça, G. Chapter 8—Metabolomics Data Preprocessing: From Raw Data to Features for Statistical Analysis. In Comprehensive Analytical Chemistry; Jaumot, J., Bedia, C., Tauler, R., Eds.; Elsevier: Amsterdam, The Netherlands, 2018; Volume 82, ISBN 9780444640444. [Google Scholar]
- Alonso, A.; Marsal, S.; Julià, A. Analytical Methods in Untargeted Metabolomics: State of the Art in 2015. Front. Bioeng. Biotechnol.
**2015**, 3, 23. [Google Scholar] [CrossRef] - Pang, Z.; Chong, J.; Zhou, G.; De Lima Morais, D.A.; Chang, L.; Barrette, M.; Gauthier, C.; Jacques, P.É.; Li, S.; Xia, J. MetaboAnalyst 5.0: Narrowing the Gap between Raw Spectra and Functional Insights. Nucleic Acids Res.
**2021**, 49, W388–W396. [Google Scholar] [CrossRef] - Delaglio, F.; Grzesiek, S.; Vuister, G.W.; Zhu, G.; Pfeifer, J.; Bax, A. NMRPipe: A Multidimensional Spectral Processing System Based on UNIX Pipes. J. Biomol. NMR
**1995**, 6, 277–293. [Google Scholar] [CrossRef] [PubMed] - van Beek, J.D. MatNMR: A Flexible Toolbox for Processing, Analyzing and Visualizing Magnetic Resonance Data in Matlab
^{®}. J. Magn. Reson.**2007**, 187, 19–26. [Google Scholar] [CrossRef] [PubMed] - Chong, J.; Yamamoto, M.; Xia, J. MetaboAnalystR 2.0: From Raw Spectra to Biological Insights. Metabolites
**2019**, 9, 57. [Google Scholar] [CrossRef] [PubMed] - Smith, C.A.; Want, E.J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Anal. Chem.
**2006**, 78, 779–787. [Google Scholar] [CrossRef] - Altenhof, A.R.; Mason, H.; Schurko, R.W. DESPERATE: A Python Library for Processing and Denoising NMR Spectra. J. Magn. Reson.
**2023**, 346, 107320. [Google Scholar] [CrossRef] - Qiu, T.; Wang, Z.; Liu, H.; Guo, D.; Qu, X. Review and Prospect: NMR Spectroscopy Denoising and Reconstruction with Low-Rank Hankel Matrices and Tensors. Magn. Reson. Chem.
**2020**, 59, 324–345. [Google Scholar] [CrossRef] [PubMed] - Bauer, C.; Cramer, R.; Schuchhardt, J. Evaluation of Peak-Picking Algorithms for Protein Mass Spectrometry. In Data Mining in Proteomics: From Standards to Applications. Methods in Molecular Biology; Humana Press: Totowa, NJ, USA, 2011; Volume 696, pp. 341–352. ISBN 9781607619871. [Google Scholar]
- Liu, Z.; Abbas, A.; Jing, B.Y.; Gao, X. WaVPeak: Picking NMR Peaks through Wavelet-Based Smoothing and Volume-Based Filtering. Bioinformatics
**2012**, 28, 914–920. [Google Scholar] [CrossRef] [PubMed] - Xia, J.; Psychogios, N.; Young, N.; Wishart, D.S. MetaboAnalyst: A Web Server for Metabolomic Data Analysis and Interpretation. Nucleic Acids Res.
**2009**, 37, W652–W660. [Google Scholar] [CrossRef] [PubMed] - Xi, Y.; Rocke, D.M. Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis. BMC Bioinform.
**2008**, 9, 324. [Google Scholar] [CrossRef] - Li, D.W.; Hansen, A.L.; Yuan, C.; Bruschweiler-Li, L.; Brüschweiler, R. DEEP Picker Is a Deep Neural Network for Accurate Deconvolution of Complex Two-Dimensional NMR Spectra. Nat. Commun.
**2021**, 12, 5229. [Google Scholar] [CrossRef] - Bueschl, C.; Doppler, M.; Varga, E.; Seidl, B.; Flasch, M.; Warth, B.; Zanghellini, J. PeakBot: Machine-Learning-Based Chromatographic Peak Picking. Bioinformatics
**2022**, 38, 3422–3428. [Google Scholar] [CrossRef] [PubMed] - Tomasi, G.; Van Den Berg, F.; Andersson, C. Correlation Optimized Warping and Dynamic Time Warping as Preprocessing Methods for Chromatographic Data. J. Chemom.
**2004**, 18, 231–241. [Google Scholar] [CrossRef] - Nielsen, N.P.V.; Carstensen, J.M.; Smedsgaard, J. Aligning of Single and Multiple Wavelength Chromatographic Profiles for Chemometric Data Analysis Using Correlation Optimised Warping. J. Chromatogr. A
**1998**, 805, 17–35. [Google Scholar] [CrossRef] - Vu, T.N.; Laukens, K. Getting Your Peaks in Line: A Review of Alignment Methods for NMR Spectral Data. Metabolites
**2013**, 3, 259–276. [Google Scholar] [CrossRef] [PubMed] - Wishart, D.S.; Guo, A.C.; Oler, E.; Wang, F.; Anjum, A.; Peters, H.; Dizon, R.; Sayeeda, Z.; Tian, S.; Lee, B.L.; et al. HMDB 5.0: The Human Metabolome Database for 2022. Nucleic Acids Res.
**2022**, 50, D622–D631. [Google Scholar] [CrossRef] [PubMed] - Xue, J.; Guijas, C.; Benton, H.P.; Warth, B.; Siuzdak, G. METLIN MS2 Molecular Standards Database: A Broad Chemical and Biological Resource. Nat. Methods
**2020**, 17, 953–954. [Google Scholar] [CrossRef] [PubMed] - Mamede, L.; Fall, F.; Schoumacher, M.; Ledoux, A.; De Tullio, P.; Govaerts, B.; Fr, M. Comparison of Extraction Methods in Vitro Plasmodium falciparum: A
^{1}H NMR and LC-MS Joined Approach. Biochem. Biophys. Res. Commun.**2024**, 703, 149684. [Google Scholar] [CrossRef] [PubMed] - Hrydziuszko, O.; Viant, M.R. Missing Values in Mass Spectrometry Based Metabolomics: An Undervalued Step in the Data Processing Pipeline. Metabolomics
**2012**, 8, S161–S174. [Google Scholar] [CrossRef] - Barnard, J.; Meng, X.L. Applications of Multiple Imputation in Medical Studies: From AIDS to NHANES. Stat. Methods Med. Res.
**1999**, 8, 17–36. [Google Scholar] [CrossRef] - Bijlsma, S.; Bobeldijk, I.; Verheij, E.R.; Ramaker, R.; Kochhar, S.; Macdonald, I.A.; Van Ommen, B.; Smilde, A.K. Large-Scale Human Metabolomics Studies: A Strategy for Data (Pre-) Processing and Validation. Anal. Chem.
**2006**, 78, 567–574. [Google Scholar] [CrossRef] - Kokla, M.; Virtanen, J.; Kolehmainen, M.; Paananen, J.; Hanhineva, K. Random Forest-Based Imputation Outperforms Other Methods for Imputing LC-MS Metabolomics Data: A Comparative Study. BMC Bioinform.
**2019**, 20, 492. [Google Scholar] [CrossRef] - Hong, S.; Lynn, H.S. Accuracy of Random-Forest-Based Imputation of Missing Data in the Presence of Non-Normality, Non-Linearity, and Interaction. BMC Med. Res. Methodol.
**2020**, 20, 99. [Google Scholar] [CrossRef] - Hu, L.Y.; Huang, M.W.; Ke, S.W.; Tsai, C.F. The Distance Function Effect on K-Nearest Neighbor Classification for Medical Datasets. Springerplus
**2016**, 5, 1304. [Google Scholar] [CrossRef] - Kim, H.; Golub, G.H.; Park, H. Missing Value Estimation for DNA Microarray Gene Expression Data: Local Least Squares Imputation. Bioinformatics
**2005**, 21, 187–198. [Google Scholar] [CrossRef] [PubMed] - Troyanskaya, O.; Cantor, M.; Sherlock, G.; Brown, P.; Hastie, T.; Tibshirani, R.; Botstein, D.; Altman, R.B. Missing Value Estimation Methods for DNA Microarrays. Bioinformatics
**2001**, 17, 520–525. [Google Scholar] [CrossRef] - Oba, S.; Sato, M.A.; Takemasa, I.; Monden, M.; Matsubara, K.I.; Ishii, S. A Bayesian Missing Value Estimation Method for Gene Expression Profile Data. Bioinformatics
**2003**, 19, 2088–2096. [Google Scholar] [CrossRef] - Ilin, A.; Raiko, T. Practical Approaches to Principal Component Analysis in the Presence of Missing Values. J. Mach. Learn. Res.
**2010**, 11, 1957–2000. [Google Scholar] - Leek, J.T.; Scharpf, R.B.; Bravo, H.C.; Simcha, D.; Langmead, B.; Johnson, W.E.; Geman, D.; Baggerly, K.; Irizarry, R.A. Tackling the Widespread and Critical Impact of Batch Effects in High-Throughput Data. Nat. Rev. Genet.
**2010**, 11, 733–739. [Google Scholar] [CrossRef] [PubMed] - Marioni, J.C.; Mason, C.E.; Mane, S.M.; Stephens, M.; Gilad, Y. RNA-Seq: An Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays. Genome Res.
**2008**, 18, 1509–1517. [Google Scholar] [CrossRef] - Karpievitch, Y.V.; Dabney, A.R.; Smith, R.D. Normalization and Missing Value Imputation for Label-Free LC-MS Analysis. BMC Bioinform.
**2012**, 13, S5. [Google Scholar] [CrossRef] - Vandesompele, J.; De Preter, K.; Pattyn, F.; Poppe, B.; Van Roy, N.; De Paepe, A.; Speleman, F. Accurate Normalization of Real-Time Quantitative RT-PCR Data by Geometric Averaging of Multiple Internal Control Genes. Rock. Mech. Rock. Eng.
**2002**, 3, research0034.1. [Google Scholar] [CrossRef] [PubMed] - Wiśniewski, J.R.; Mann, M. A Proteomics Approach to the Protein Normalization Problem: Selection of Unvarying Proteins for MS-Based Proteomics and Western Blotting. J. Proteome Res.
**2016**, 15, 2321–2326. [Google Scholar] [CrossRef] [PubMed] - Wu, Y.; Li, L. Sample Normalization Methods in Quantitative Metabolomics. J. Chromatogr. A
**2016**, 1430, 80–95. [Google Scholar] [CrossRef] [PubMed] - Chen, J.; Zhang, P.; Lv, M.; Guo, H.; Huang, Y.; Zhang, Z.; Xu, F. Influences of Normalization Method on Biomarker Discovery in Gas Chromatography-Mass Spectrometry-Based Untargeted Metabolomics: What Should Be Considered? Anal. Chem.
**2017**, 89, 5342–5348. [Google Scholar] [CrossRef] - Temmerman, L.; De Livera, A.M.; Browne, J.B.; Sheedy, J.R.; Callahan, D.L.; Nahid, A.; De Souza, D.P.; Schoofs, L.; Tull, D.L.; McConville, M.J.; et al. Cross-Platform Urine Metabolomics of Experimental Hyperglycemia in Type 2 Diabetes. J. Diabetes Metab.
**2012**, 6, 1–7. [Google Scholar] [CrossRef] - De Livera, A.M.; Sysi-Aho, M.; Jacob, L.; Gagnon-Bartsch, J.A.; Castillo, S.; Simpson, J.A.; Speed, T.P. Statistical Methods for Handling Unwanted Variation in Metabolomics Data. Anal. Chem.
**2015**, 87, 3606–3615. [Google Scholar] [CrossRef] [PubMed] - Edmands, W.M.B.; Ferrari, P.; Scalbert, A. Normalization to Specific Gravity Prior to Analysis Improves Information Recovery from High Resolution Mass Spectrometry Metabolomic Profiles of Human Urine. Anal. Chem.
**2014**, 86, 10925–10931. [Google Scholar] [CrossRef] - Marcinowska, R.; Trygg, J.; Wolf-Watz, H.; Mortiz, T.; Surowiec, I. Optimization of a Sample Preparation Method for the Metabolomic Analysis of Clinically Relevant Bacteria. J. Microbiol. Methods
**2011**, 87, 24–31. [Google Scholar] [CrossRef] [PubMed] - Chen, Y.; Shen, G.; Zhang, R.; He, J.; Zhang, Y.; Xu, J.; Yang, W.; Chen, X.; Song, Y.; Abliz, Z. Combination of Injection Volume Calibration by Creatinine and MS Signals’ Normalization to Overcome Urine Variability in LC-MS-Based Metabolomics Studies. Anal. Chem.
**2013**, 85, 7659–7665. [Google Scholar] [CrossRef] - De Livera, A.M.; Dias, D.A.; De Souza, D.; Rupasinghe, T.; Pyke, J.; Tull, D.; Roessner, U.; McConville, M.; Speed, T.P. Normalizing and Integrating Metabolomics Data. Anal. Chem.
**2012**, 84, 10768–10776. [Google Scholar] [CrossRef] - Antonelli, J.; Claggett, B.L.; Henglin, M.; Kim, A.; Ovsak, G.; Kim, N.; Deng, K.; Rao, K.; Tyagi, O.; Watrous, J.D.; et al. Statistical Workflow for Feature Selection in Human Metabolomics Data. Metabolites
**2019**, 9, 143. [Google Scholar] [CrossRef] - van den Berg, R.A.; Hoefsloot, H.C.J.; Westerhuis, J.A.; Smilde, A.K.; van der Werf, M.J. Centering, Scaling, and Transformations: Improving the Biological Information Content of Metabolomics Data. BMC Genom.
**2006**, 7, 142. [Google Scholar] [CrossRef] - De Livera, A.M.; Olshansky, G.; Simpson, J.A.; Creek, D.J. NormalizeMets: Assessing, Selecting and Implementing Statistical Methods for Normalizing Metabolomics Data. Metabolomics
**2018**, 14, 54. [Google Scholar] [CrossRef] - Sysi-Aho, M.; Katajamaa, M.; Yetukuri, L.; Orešič, M. Normalization Method for Metabolomics Data Using Optimal Selection of Multiple Internal Standards. BMC Bioinform.
**2007**, 8, 93. [Google Scholar] [CrossRef] [PubMed] - Grocholska, P.; Bachor, R. Trends in the Hydrogen−deuterium Exchange at the Carbon Centers. Preparation of Internal Standards for Quantitative Analysis by Lc-Ms. Molecules
**2021**, 26, 2989. [Google Scholar] [CrossRef] [PubMed] - Gullberg, J.; Jonsson, P.; Nordström, A.; Sjöström, M.; Moritz, T. Design of Experiments: An Efficient Strategy to Identify Factors Influencing Extraction and Derivatization of Arabidopsis thaliana Samples in Metabolomic Studies with Gas Chromatography/Mass Spectrometry. Anal. Biochem.
**2004**, 331, 283–295. [Google Scholar] [CrossRef] - Liu, R.H.; Lin, D.L.; Chang, W.-T.; Liu, C.; Tsay, W.-I.; Li, J.-H.; Kuo, T.-L. Isotopically Labeled Analogues for Drug Quantitation. Anal. Chem.
**2002**, 74, 618A–626A. [Google Scholar] [CrossRef] [PubMed] - Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K.; Kusano, M. Compensation for Systematic Cross-Contribution Improves Normalization of Mass Spectrometry Based Metabolomics Data. Anal. Chem.
**2009**, 81, 7974–7980. [Google Scholar] [CrossRef] - Gagnon-Bartsch, J.A.; Speed, T.P. Using Control Genes to Correct for Unwanted Variation in Microarray Data. Biostatistics
**2012**, 13, 539–552. [Google Scholar] [CrossRef] - Santamaria, G.; Liao, C.; Lindberg, C.; Chen, Y.; Wang, Z.; Rhee, K.; Pinto, F.; Yan, J.; Xavier, J.B. Evolution and Regulation of Microbial Secondary Metabolism. eLife
**2022**, 11, e76119. [Google Scholar] [CrossRef] - Dunn, W.B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-Mcintyre, S.; Anderson, N.; Brown, M.; Knowles, J.D.; Halsall, A.; Haselden, J.N.; et al. Procedures for Large-Scale Metabolic Profiling of Serum and Plasma Using Gas Chromatography and Liquid Chromatography Coupled to Mass Spectrometry. Nat. Protoc.
**2011**, 6, 1060–1083. [Google Scholar] [CrossRef] [PubMed] - Sangster, T.; Major, H.; Plumb, R.; Wilson, A.J.; Wilson, I.D. A Pragmatic and Readily Implemented Quality Control Strategy for HPLC-MS and GC-MS-Based Metabonomic Analysis. Analyst
**2006**, 131, 1075–1078. [Google Scholar] [CrossRef] [PubMed] - Gika, H.G.; Theodoridis, G.A.; Wingate, J.E.; Wilson, I.D. Within-Day Reproducibility of an HPLC-MS-Based Method for Metabonomic Analysis: Application to Human Urine. J. Proteome Res.
**2007**, 6, 3291–3303. [Google Scholar] [CrossRef] [PubMed] - Broadhurst, D.; Goodacre, R.; Reinke, S.N.; Kuligowski, J.; Wilson, I.D.; Lewis, M.R.; Dunn, W.B. Guidelines and Considerations for the Use of System Suitability and Quality Control Samples in Mass Spectrometry Assays Applied in Untargeted Clinical Metabolomic Studies. Metabolomics
**2018**, 14, 72. [Google Scholar] [CrossRef] - Schiffman, C.; Petrick, L.; Perttula, K.; Yano, Y.; Carlsson, H.; Whitehead, T.; Metayer, C.; Hayes, J.; Rappaport, S.; Dudoit, S. Filtering Procedures for Untargeted Lc-Ms Metabolomics Data. BMC Bioinform.
**2019**, 20, 334. [Google Scholar] [CrossRef] [PubMed] - Begley, P.; Francis-McIntyre, S.; Dunn, W.B.; Broadhurst, D.I.; Halsall, A.; Tseng, A.; Knowles, J.; HUSERMET Consortium; Goodacre, R.; Kell, D.B. Development and Performance of a Gas Chromatography-Time-of-Flight Mass Spectrometry Analysis for Large-Scale Nontargeted Metabolomic Studies of Human Serum. Anal. Chem.
**2009**, 81, 7038–7046. [Google Scholar] [CrossRef] - Zelena, E.; Dunn, W.B.; Broadhurst, D.; Francis-McIntyre, S.; Carroll, K.M.; Begley, P.; O’Hagan, S.; Knowles, J.D.; Halsall, A.; HUSERMET Consortium; et al. Development of a Robust and Repeatable UPLC-MS Method for the Long-Term Metabolomic Study of Human Serum. Anal. Chem.
**2009**, 81, 1357–1364. [Google Scholar] [CrossRef] - Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math.
**1987**, 20, 53–65. [Google Scholar] [CrossRef] - De Livera, A.M.; Olshansky, M.; Speed, T.P. Statistical Analysis of Metabolomics Data. In Metabolomics Tools for Natural Product Discovery; Humana Press: Totowa, NJ, USA, 2013; pp. 291–307. [Google Scholar]
- Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. Author Correction: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods
**2020**, 17, 352. [Google Scholar] [CrossRef] - Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng.
**2007**, 9, 90–95. [Google Scholar] [CrossRef] - Sokal, R.R.; Rohlf, F.J. Biometry. The Principles and Practice of Statistics in Biological Research, 3rd ed.; W. H. Freeman and Company: New York, NY, USA, 1995. [Google Scholar]
- Bewick, V.; Cheek, L.; Ball, J. Statistics Review 14: Logistic Regression. Crit. Care
**2005**, 9, 112–118. [Google Scholar] [CrossRef] [PubMed] - Broadhurst, D.I.; Kell, D.B. Statistical Strategies for Avoiding False Discoveries in Metabolomics and Related Experiments. Metabolomics
**2006**, 2, 171–196. [Google Scholar] [CrossRef] - Saccenti, E.; Hoefsloot, H.C.J.; Smilde, A.K.; Westerhuis, J.A.; Hendriks, M.M.W.B. Reflections on Univariate and Multivariate Analysis of Metabolomics Data. Metabolomics
**2014**, 10, 361–374. [Google Scholar] [CrossRef] - Worley, B.; Powers, R. Multivariate Analysis in Metabolomics. Curr. Metabolomics
**2013**, 1, 92–107. [Google Scholar] [CrossRef] [PubMed] - Şenbabaoğlu, Y.; Michailidis, G.; Li, J.Z. Critical Limitations of Consensus Clustering in Class Discovery. Sci. Rep.
**2014**, 4, 6207. [Google Scholar] [CrossRef] - John, C.R.; David, W.; Russ, D.; Goldmann, K.; Ehrenstein, M.; Pitzalis, C.; Lewis, M.; Barnes, M. M3C: Monte Carlo Reference-Based Consensus Clustering. Sci. Rep.
**2020**, 10, 1816. [Google Scholar] [CrossRef] [PubMed] - Boyle, K.E.; Monaco, H.T.; Deforet, M.; Yan, J.; Wang, Z.; Rhee, K.; Xavier, J.B. Metabolism and the Evolution of Social Behavior. Mol. Biol. Evol.
**2017**, 34, 2367–2379. [Google Scholar] [CrossRef] - van der Maaten, L.; Hinton, G. Visualizing Data Using T-SNE. J. Mach. Learn. Res.
**2008**, 219, 2579–2605. [Google Scholar] - Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res.
**2011**, 12, 2825–2830. [Google Scholar] [CrossRef] - Wilkerson, M.D.; Hayes, D.N. ConsensusClusterPlus: A Class Discovery Tool with Confidence Assessments and Item Tracking. Bioinformatics
**2010**, 26, 1572–1573. [Google Scholar] [CrossRef] - Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst.
**2001**, 58, 109–130. [Google Scholar] [CrossRef] - Trygg, J.; Wold, S. Orthogonal Projections to Latent Structures (O-PLS). J. Chemom.
**2002**, 16, 119–128. [Google Scholar] [CrossRef] - Mehmood, T.; Liland, K.H.; Snipen, L.; Sæbø, S. A Review of Variable Selection Methods in Partial Least Squares Regression. Chemom. Intell. Lab. Syst.
**2012**, 118, 62–69. [Google Scholar] [CrossRef] - Rizvi, A.; Shankar, A.; Chatterjee, A.; More, T.H.; Bose, T.; Dutta, A.; Balakrishnan, K.; Madugulla, L.; Rapole, S.; Mande, S.S.; et al. Rewiring of Metabolic Network in Mycobacterium tuberculosis during Adaptation to Different Stresses. Front. Microbiol.
**2019**, 10, 2417. [Google Scholar] [CrossRef] [PubMed] - Feng, Q.; Liu, Z.; Zhong, S.; Li, R.; Xia, H.; Jie, Z.; Wen, B.; Chen, X.; Yan, W.; Fan, Y.; et al. Integrated Metabolomics and Metagenomics Analysis of Plasma and Urine Identified Microbial Metabolites Associated with Coronary Heart Disease. Sci. Rep.
**2016**, 6, 22525. [Google Scholar] [CrossRef] [PubMed] - Ma, X.; Chi, Y.H.; Niu, M.; Zhu, Y.; Zhao, Y.L.; Chen, Z.; Wang, J.B.; Zhang, C.E.; Li, J.Y.; Wang, L.F.; et al. Metabolomics Coupled with Multivariate Data and Pathway Analysis on Potential Biomarkers in Cholestasis and Intervention Effect of Paeonia lactiflora Pall. Front. Pharmacol.
**2016**, 7, 14. [Google Scholar] [CrossRef] [PubMed] - O’Boyle, N.M.; Palmer, D.S.; Nigsch, F.; Mitchell, J.B. Simultaneous Feature Selection and Parameter Optimisation Using an Artificial Ant Colony: Case Study of Melting Point Prediction. Chem. Cent. J.
**2008**, 2, 21. [Google Scholar] [CrossRef] [PubMed] - Szymańska, E.; Saccenti, E.; Smilde, A.K.; Westerhuis, J.A. Double-Check: Validation of Diagnostic Statistics for PLS-DA Models in Metabolomics Studies. Metabolomics
**2012**, 8, 3–16. [Google Scholar] [CrossRef] - Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman & Hall: London, UK; CRC: Boca Raton, FL, USA, 1984; ISBN 0412048418. [Google Scholar]
- Breiman, L. Random Forests. Mach. Learn.
**2001**, 45, 5–32. [Google Scholar] [CrossRef] - Breiman, L. Bagging Predictors. Mach. Learn.
**1996**, 24, 123–140. [Google Scholar] [CrossRef] - Amit, Y.; Geman, D. Shape Quantization and Recognition with Randomized Trees. Neural Comput.
**1997**, 9, 1545–1588. [Google Scholar] [CrossRef] - Devroye, L.; Lugosi, G. Consistency of Random Forests and Other Averaging Classifiers. J. Mach. Learn. Res.
**2008**, 9, 2015–2033. [Google Scholar] - Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
- Thévenot, E.A.; Roux, A.; Xu, Y.; Ezan, E.; Junot, C. Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses. J. Proteome Res.
**2015**, 14, 3322–3335. [Google Scholar] [CrossRef] [PubMed] - Mevik, B.-H.; Wehrens, R. The Pls Package: Principal Component and Partial Least Squares Regression in R. J. Stat. Softw.
**2007**, 18. [Google Scholar] [CrossRef] - BiRG—Wright State University Pyopls. Available online: https://pypi.org/project/pyopls/ (accessed on 15 April 2024).
- Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw.
**2008**, 28, 1–26. [Google Scholar] [CrossRef] - Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News
**2002**, 2, 18–22. [Google Scholar] [CrossRef] - Khatri, P.; Sirota, M.; Butte, A.J. Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges. PLoS Comput. Biol.
**2012**, 8, e1002375. [Google Scholar] [CrossRef] - Goeman, J.J.; Bühlmann, P. Analyzing Gene Expression Data in Terms of Gene Sets: Methodological Issues. Bioinformatics
**2007**, 23, 980–987. [Google Scholar] [CrossRef] [PubMed] - Yu, G.; Wang, L.G.; Han, Y.; He, Q.Y. ClusterProfiler: An R Package for Comparing Biological Themes among Gene Clusters. Omics J. Integr. Biol.
**2012**, 16, 284–287. [Google Scholar] [CrossRef] - Maleki, F.; Ovens, K.; Hogan, D.J.; Kusalik, A.J. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front. Genet.
**2020**, 11, 654. [Google Scholar] [CrossRef] - Mootha, V.K.; Lindgren, C.M.; Eriksson, K.F.; Subramanian, A.; Sihag, S.; Lehar, J.; Puigserver, P.; Carlsson, E.; Ridderstråle, M.; Laurila, E.; et al. PGC-1α-Responsive Genes Involved in Oxidative Phosphorylation Are Coordinately Downregulated in Human Diabetes. Nat. Genet.
**2003**, 34, 267–273. [Google Scholar] [CrossRef] - Pang, Z.; Chong, J.; Li, S.; Xia, J. MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics. Metabolites
**2020**, 10, 186. [Google Scholar] [CrossRef] [PubMed] - Tomfohr, J.; Lu, J.; Kepler, T.B. Pathway Level Analysis of Gene Expression Using Singular Value Decomposition. BMC Bioinform.
**2005**, 6, 225. [Google Scholar] [CrossRef] [PubMed] - McLuskey, K.; Wandy, J.; Vincent, I.; van der Hooft, J.J.J.; Rogers, S.; Burgess, K.; Daly, R. Ranking Metabolite Sets by Their Activity Levels. Metabolites
**2021**, 11, 103. [Google Scholar] [CrossRef] [PubMed] - Shojaie, A.; Michailidis, G. Analysis of Gene Sets Based on the Underlying Regulatory Network. J. Comput. Biol.
**2009**, 16, 407–426. [Google Scholar] [CrossRef] [PubMed] - Hellstern, M.; Ma, J.; Yue, K.; Shojaie, A. Netgsa: Fast Computation and Interactive Visualization for Topology-Based Pathway Enrichment Analysis. PLoS Comput. Biol.
**2021**, 17, e1008979. [Google Scholar] [CrossRef] [PubMed] - Picart-Armada, S.; Fernández-Albert, F.; Vinaixa, M.; Yanes, O.; Perera-Lluna, A. FELLA: An R Package to Enrich Metabolomics Data. BMC Bioinform.
**2018**, 19, 538–546. [Google Scholar] [CrossRef] - Jacob, L.; Neuvial, P.; Dudoit, S. More Power via Graph-Structured Tests for Differential Expression of Gene Networks. Ann. Appl. Stat.
**2012**, 6, 561–600. [Google Scholar] [CrossRef] - Santamaria, G.; Ruiz-Rodríguez, P.; Renau-Mínguez, C.; Pinto, F.R.; Coscollá, M. In Silico Exploration of Mycobacterium tuberculosis Metabolic Networks Shows Host-Associated Convergent Fluxomic Phenotypes. Biomolecules
**2022**, 12, 376. [Google Scholar] [CrossRef] - Baart, G.J.; Martens, D.E. Genome-Scale Metabolic Models: Reconstruction and Analysis. In Neisseria meningiditis: Advanced Methods and Protocols; Christodoulides, M., Ed.; Humana Press: Totowa, NJ, USA, 2011; pp. 107–126. [Google Scholar]
- Bartell, J.A.; Blazier, A.S.; Yen, P.; Thøgersen, J.C.; Jelsbak, L.; Goldberg, J.B.; Papin, J.A. Reconstruction of the Metabolic Network of Pseudomonas aeruginosa to Interrogate Virulence Factor Synthesis. Nat. Commun.
**2017**, 8, 14631. [Google Scholar] [CrossRef] - Edwards, J.S.; Palsson, B.O. Systems Properties of the Haemophilus Influenzae Rd Metabolic Genbotype. Mol. Biol.
**1999**, 274, 17410–17416. [Google Scholar] - Karp, P.D.; Weaver, D.; Latendresse, M. How Accurate Is Automated Gap Filling of Metabolic Models? BMC Syst. Biol.
**2018**, 12, 73. [Google Scholar] [CrossRef] [PubMed] - Palsson, B.Ø. Systems Biology: Properties of Reconstructed Networks; Cambridge University Press: Cambridge, UK, 2006; ISBN 9780521859035. [Google Scholar]
- Varma, A.; Palsson, B.Ø. Metabolic Flux Balancing: Basic Concepts, Scientific and Practical Use. Nat. Biotechnol.
**1994**, 12, 994–998. [Google Scholar] [CrossRef] - Feist, A.M.; Palsson, B.Ø. The Biomass Objective Function. Curr. Opin. Microbiol.
**2010**, 13, 344–349. [Google Scholar] [CrossRef] [PubMed] - Schuetz, R.; Kuepfer, L.; Sauer, U. Systematic Evaluation of Objective Functions for Predicting Intracellular Fluxes in Escherichia Coli. Mol. Syst. Biol.
**2007**, 3, 119. [Google Scholar] [CrossRef] [PubMed] - Piddington, D.L.; Kashkouli, A.; Buchmeier, N.A. Growth of Mycobacterium tuberculosis in a Defined Medium Is Very Restricted by Acid pH and Mg
^{2+}Levels Mycobacterium tuberculosis Grows within the Phagocytic Vacuoles of Macrophages, Where It Encounters a Moderately Acidic and Possibly Nutrient-Restricted. Infect. Immun.**2000**, 68, 4518–4522. [Google Scholar] [CrossRef] [PubMed] - Boyle, K.E.; Monaco, H.; van Ditmarsch, D.; Deforet, M.; Xavier, J.B. Integration of Metabolic and Quorum Sensing Signals Governing the Decision to Cooperate in a Bacterial Social Trait. PLoS Comput. Biol.
**2015**, 11, e1004279. [Google Scholar] [CrossRef] [PubMed] - Herrmann, H.A.; Dyson, B.C.; Vass, L.; Johnson, G.N.; Schwartz, J.M. Flux Sampling Is a Powerful Tool to Study Metabolism under Changing Environmental Conditions. npj Syst. Biol. Appl.
**2019**, 5, 32. [Google Scholar] [CrossRef] [PubMed] - Wiback, S.J.; Famili, I.; Greenberg, H.J.; Palsson, B. Monte Carlo Sampling Can Be Used to Determine the Size and Shape of the Steady-State Flux Space. J. Theor. Biol.
**2004**, 228, 437–447. [Google Scholar] [CrossRef] - Wishart, D.S. Emerging Applications of Metabolomics in Drug Discovery and Precision Medicine. Nat. Rev. Drug Discov.
**2016**, 15, 473–484. [Google Scholar] [CrossRef] [PubMed] - Øyås, O.; Borrell, S.; Trauner, A.; Zimmermann, M.; Feldmann, J.; Liphardt, T.; Gagneux, S.; Stelling, J.; Sauer, U.; Zampieri, M. Model-Based Integration of Genomics and Metabolomics Reveals SNP Functionality in Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA
**2020**, 117, 8494–8502. [Google Scholar] [CrossRef] - Gustafsson, J.; Anton, M.; Roshanzamir, F.; Jörnsten, R.; Kerkhiven, E.J.; Robinson, J.; Nielsen, J. Generation and Analysis of Context-Specific Genome-Scale Metabolic Models Derived from Single-Cell RNA-Seq Data. Proc. Natl. Acad. Sci. USA
**2023**, 120, e2217868120. [Google Scholar] [CrossRef]

**Figure 1.**Evaluation of tightness of replicates before and after normalization with different methods. Performed on metabolomic dataset of different Pseudomonas aeruginosa clinical isolates, obtained from Santamaria et al. [73]. (

**A**) Comparison of mean distance between replicates vs. mean distance across different clinical isolates after tested normalization methods. (

**B**) Comparison of silhouette distributions. Both plots indicate that the CCMN method is the one that produces tighter replicates.

**Figure 3.**Plots obtained from an OPLS-DA model. The OPLS-DA model was fitted on the metabolomic dataset included in the study by Boyle et al. 2017 [92], classifying the Pseudomonas aeruginosa clinical isolates as swarmers and non-swarmers (a collective motility phenotype P. aeruginosa displays). (

**A**) Score plot of the predictive component and the first orthogonal component. Swarmers are indicated in blue, and non-swarmers in red. At the bottom of the plot, the R

^{2}value of the predictive component and the Q

^{2}obtained with cross validation are indicated. (

**B**) Scatterplot of the R

^{2}and Q

^{2}values obtained with a permutation test (n = 200). Actual R

^{2}values are indicated in gray, while Q

^{2}values are indicated in black. The respective horizontal lines represent the actual R

^{2}and Q

^{2}. At the bottom are the p-values of R

^{2}and Q

^{2}, which are the proportion of permutated values that are higher or equal than the actual values. The similarity between the response variable value and the permutated response variable is represented in the x axis. (

**C**) Bar plot of the loadings for the predictive component of the metabolites that were determined as statistically significant using the OPLS-DA model, with a variable importance in the projection (VIP) higher than or equal to 1. The metabolites with a negative loading are at a higher abundance in non-swarmers, while the ones with a positive value are at a higher abundance in swarmers.

**Figure 4.**Example of pathway enrichment graph obtained using FELLA. The graph depicted here includes all the KEGG entries with a high probability of receiving a propagated signal from the differential metabolites, represented as green squares. The category of each KEGG entry is represented with a color. These results can be printed out as a table of p-values too. The red and the green square nodes represent the enriched pathways and the input differential compounds, respectively. The names of both categories are indicated in the labels. The enrichment depicted here was obtained using as input compounds the differential metabolites between a biosurfactant producer and non-producer strains from a study by Santamaria et al. [73].

**Figure 5.**Graphical 3D representations of flux distribution obtention methods. The three axes represent the values of the three variables of a simplified model: the flux for each of the three reactions. The constraints that the model determines delimit the solution space, which is represented as the blue polyhedron. The red spheres represent feasible solutions of the solution space. (

**A**) In flux balance analysis (FBA), the flux of an objective reaction is maximized; in this case, for V1, the solution is at the edge of the solution space where V1 is maximized. (

**B**) In flux sampling, the solution space is randomly sampled a determined number of times. Therefore, each red sphere is a flux distribution sampled from the solution space.

**Figure 6.**Overview of bioinformatic analysis of metabolomic data. Each one of the four modules in our workflow is indicated in a different color. The workflow starts with raw spectra. The first module is raw spectrum preprocessing, which involves four steps: denoising, peak picking, peak alignment, and compound identification. Raw spectrum preprocessing generates a raw metabolite peak area dataset, which then undergoes further preprocessing. This is performed in the second module: raw peak area preprocessing. Here the initial step is missing value imputation, where missing values (indicated in red) are inferred (indicated in green). The data are then transformed, to mitigate the right-skewed distribution typical of metabolomic data, and normalized using the most appropriate method for the dataset. Once data are preprocessed, statistical analysis can be performed, using one or several univariate or multivariate methods. In this second category, unsupervised and supervised approaches can be alternated between. The outcome of this step is a list of differential metabolites. Finally, the last module is metabolic pathway enrichment. Some approaches, like ORA and some PT methods such as FELLA, require lists of differential metabolites. In FCS and in other PT methods, the normalized metabolite abundances can be used directly. The output of the metabolic pathway enrichment is the altered metabolic pathways. Abbreviations: ORA—over-representation analysis; FCS—functional class scoring; PT—pathway topology.

Scaling Methods | Internal Standard/Nonchanging Metabolite-Based | QC Sample-Based | |
---|---|---|---|

Scaling | Statistical Modeling | ||

mean median sum | IS RI | NOMIS CCMN RUV-2 | QC-RLSC |

**Table 2.**Univariate tests classified according to the type of biological factors they can deal with.

Quantitative Factors | Categorical Factors | |||
---|---|---|---|---|

Parametric | Non-Parametric | |||

2 Classes | >2 Classes | 2 Classes | >2 Classes | |

Simple linear regression | Student’s t test Logistic regression (Wald test) | ANOVA | Mann–Whitney U test | Kruskal–Wallis test |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Santamaria, G.; Pinto, F.R.
Bioinformatic Analysis of Metabolomic Data: From Raw Spectra to Biological Insight. *BioChem* **2024**, *4*, 90-114.
https://doi.org/10.3390/biochem4020005

**AMA Style**

Santamaria G, Pinto FR.
Bioinformatic Analysis of Metabolomic Data: From Raw Spectra to Biological Insight. *BioChem*. 2024; 4(2):90-114.
https://doi.org/10.3390/biochem4020005

**Chicago/Turabian Style**

Santamaria, Guillem, and Francisco R. Pinto.
2024. "Bioinformatic Analysis of Metabolomic Data: From Raw Spectra to Biological Insight" *BioChem* 4, no. 2: 90-114.
https://doi.org/10.3390/biochem4020005