Metabolomics Data Analysis and Quality Assessment

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: closed (15 February 2022) | Viewed by 28660

Special Issue Editor


Prof. Hunter N. B. Moseley
Guest Editor
Section Editor, Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, KY 40536, USA
Interests: systems biology; translational bioinformatics; biophysical informatics

Special Issue Information

Dear Colleagues,

In comparison to other major omics technologies, metabolomics analytical technologies create unique data and data quality issues in collected datasets. These issues are not well recognized inside the metabolomics field, and even less so outside of it. Detecting, understanding, and handling these issues is paramount to the successful extraction of useful information from metabolomics data analyses and the subsequent successful integration of this information with other omics datasets.

Therefore, this Special Issue is devoted to illuminating the unique data and data quality issues in metabolomics datasets and their impact on downstream data analyses. Appropriate topics include methodologies, software, and tools that can (i) qualitatively or quantitatively characterize either raw or intermediate metabolomics data and data quality, (ii) mitigate data quality issues in metabolomics data analyses, and/or (iii) evaluate the quality of metabolomics data analyses for general or specific use-cases. Additionally, review articles that cover these topics are welcomed, as well as case studies that illuminate unique analytical and data quality issues in metabolomics analytical technologies.

Prof. Hunter N. B. Moseley
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • metabolomics 
  • data quality issues 
  • metabolomics datasets
  • data analyses 
  • error analysis
  • qualitative analysis 
  • quantitative analysis

Published Papers (11 papers)

Research

30 pages, 11274 KiB  
Article
Scan-Centric, Frequency-Based Method for Characterizing Peaks from Direct Injection Fourier Transform Mass Spectrometry Experiments
by Robert M. Flight, Joshua M. Mitchell and Hunter N. B. Moseley
Metabolites 2022, 12(6), 515; https://doi.org/10.3390/metabo12060515 - 02 Jun 2022
Viewed by 1605
Abstract
We present a novel, scan-centric method for characterizing peaks from direct injection multi-scan Fourier transform mass spectra of complex samples that utilizes frequency values derived directly from the spacing of raw m/z points in spectral scans. Our peak characterization method utilizes intensity-independent noise removal and normalization of scan-level data to provide a much better fit of relative intensity to natural abundance probabilities for low abundance isotopologues that are not present in all of the acquired scans. Moreover, our method calculates both peak- and scan-specific statistics incorporated within a series of quality control steps that are designed to robustly derive peak centers, intensities, and intensity ratios with their scan-level variances. These cross-scan characterized peaks are suitable for use in our previously published peak assignment methodology, Small Molecule Isotope Resolved Formula Enumeration (SMIRFE).
(This article belongs to the Special Issue Metabolomics Data Analysis and Quality Assessment)

13 pages, 1803 KiB  
Article
Towards Unbiased Evaluation of Ionization Performance in LC-HRMS Metabolomics Method Development
by Carsten Jaeger and Jan Lisec
Metabolites 2022, 12(5), 426; https://doi.org/10.3390/metabo12050426 - 10 May 2022
Viewed by 1528
Abstract
As metabolomics increasingly finds its way from basic science into applied and regulatory environments, analytical demands on nontargeted mass spectrometric detection methods continue to rise. In addition to improved chemical comprehensiveness, current developments aim at enhanced robustness and repeatability to allow long-term, inter-study, and meta-analyses. Comprehensive metabolomics relies on electrospray ionization (ESI) as the most versatile ionization technique, and recent liquid chromatography-high resolution mass spectrometry (LC-HRMS) instrumentation continues to overcome technical limitations that have hindered the adoption of ESI for applications in the past. Still, developing and standardizing nontargeted ESI methods and instrumental setups remains costly in terms of time and required chemicals, as large panels of metabolite standards are needed to reflect biochemical diversity. In this paper, we investigated to what extent a nontargeted pilot experiment, consisting only of a few measurements of a test sample dilution series and comprehensive statistical analysis, can replace conventional targeted evaluation procedures. To examine this potential, two instrumental ESI ion source setups were compared, reflecting a common scenario in practical method development. Two types of feature evaluations were performed: (a) summary statistics solely involving feature intensity values, and (b) analyses additionally including chemical interpretation. Results were compared in detail to a targeted evaluation of a large metabolite standard panel. We reflect on the advantages and shortcomings of both strategies in the context of current harmonization initiatives in the metabolomics field.

14 pages, 4225 KiB  
Article
TRACES: A Lightweight Browser for Liquid Chromatography–Multiple Reaction Monitoring–Mass Spectrometry Chromatograms
by Yoshihiro Kita, Suzumi M. Tokuoka, Yoshiya Oda and Takao Shimizu
Metabolites 2022, 12(4), 354; https://doi.org/10.3390/metabo12040354 - 15 Apr 2022
Cited by 3 | Viewed by 2121
Abstract
In targeted metabolomic analysis using liquid chromatography–multiple reaction monitoring–mass spectrometry (LC-MRM-MS), hundreds of MRMs are performed in a single run, yielding a large dataset containing thousands of chromatographic peaks. Automation tools for processing large MRM datasets have been reported, but a visual review of chromatograms is still critical, as real samples with biological matrices often cause complex chromatographic patterns owing to non-specific, insufficiently separated, isomeric, and isotopic components. Herein, we report the development of new software, TRACES, a lightweight chromatogram browser for MRM-based targeted LC-MS analysis. TRACES provides rapid access to all MRM chromatograms in a dataset, allowing users to start ad hoc data browsing without preparations such as loading compound libraries. As a special function of the software, we implemented a chromatogram-level deisotoping function that facilitates the identification of regions potentially affected by isotopic signals. Using MRM libraries containing precursor and product formulae, the algorithm reveals all possible isotopic interferences in the dataset and generates deisotoped chromatograms. To validate the deisotoping function in real applications, we analyzed mouse tissue phospholipids in which isotopic interference by molecules with different fatty-acyl unsaturation levels is known. TRACES successfully removed isotopic signals within the MRM chromatograms, helping users avoid inappropriate regions for integration.

29 pages, 2626 KiB  
Article
Volatilomics-Based Microbiome Evaluation of Fermented Dairy by Prototypic Headspace-Gas Chromatography–High-Temperature Ion Mobility Spectrometry (HS-GC-HTIMS) and Non-Negative Matrix Factorization (NNMF)
by Charlotte C. Capitain, Fatemeh Nejati, Martin Zischka, Markus Berzak, Stefan Junne, Peter Neubauer and Philipp Weller
Metabolites 2022, 12(4), 299; https://doi.org/10.3390/metabo12040299 - 28 Mar 2022
Cited by 4 | Viewed by 2430
Abstract
Fermented foods, such as yogurt and kefir, contain a versatile spectrum of volatile organic compounds (VOCs), including ethanol, acetic acid, ethyl acetate, and diacetyl. To overcome the challenge of overlapping peaks regarding these key compounds, the drift tube temperature was raised in a prototypic high-temperature ion mobility spectrometer (HTIMS). This HS-GC-HTIMS was used for the volatilomic profiling of 33 traditional kefir, 13 commercial kefir, and 15 commercial yogurt samples. Pattern recognition techniques, including principal component analysis (PCA) and NNMF, in combination with non-targeted screening, revealed distinct differences between traditional and commercial kefir while showing strong similarities between commercial kefir and yogurt. Classification of fermented dairy samples into commercial yogurt, commercial kefir, traditional mild kefir, and traditional tangy kefir was also possible for both PCA- and NNMF-based models, obtaining cross-validation (CV) error rates of 0% for PCA-LDA, PCA-kNN (k = 5), and NNMF-kNN (k = 5) and 3.3% for PCA-SVM and NNMF-LDA. Through back projection of NNMF loadings, characteristic substances were identified, indicating a mild flavor composition of commercial samples, with high concentrations of buttery-flavored diacetyl. In contrast, traditional kefir showed a diverse VOC profile with high amounts of flavorful alcohols (including ethanol and methyl-1-butanol), esters (including ethyl acetate and 3-methylbutyl acetate), and aldehydes. For validation of the results and deeper understanding, qPCR sequencing was used to evaluate the microbial consortia, confirming the microbial associations between commercial kefir and commercial yogurt and reinforcing the differences between traditional and commercial kefir. The diverse flavor profile of traditional kefir primarily results from the yeast consortium, while commercial kefir and yogurt are primarily, but not exclusively, produced through bacterial fermentation. The flavor profile of fermented dairy products may be used to directly evaluate the microbial consortium using HS-GC-HTIMS analysis.

11 pages, 1237 KiB  
Article
Application of Machine Learning Solutions to Optimize Parameter Prediction to Enhance Automatic NMR Metabolite Profiling
by Daniel Cañueto, Reza M. Salek, Mònica Bulló, Xavier Correig and Nicolau Cañellas
Metabolites 2022, 12(4), 283; https://doi.org/10.3390/metabo12040283 - 24 Mar 2022
Viewed by 1626
Abstract
The quality of automatic metabolite profiling in NMR datasets from complex matrices can be affected by the numerous sources of variability. These sources, as well as the presence of multiple low-intensity signals, cause uncertainty in the metabolite signal parameters. Lineshape fitting approaches often produce suboptimal resolutions to adapt them in a complex spectrum lineshape. As a result, the use of software tools for automatic profiling tends to be restricted to specific biological matrices and/or sample preparation protocols to obtain reliable results. However, the analysis and modelling of the signal parameters collected during initial iteration can be further optimized to reduce uncertainty by generating narrow and accurate predictions of the expected signal parameters. In this study, we show that, thanks to the predictions generated, better profiling quality indicators can be outputted, and the performance of automatic profiling can be maximized. Our proposed workflow can learn and model the sample properties; therefore, restrictions in the biological matrix, or sample preparation protocol, and limitations of lineshape fitting approaches can be overcome.

19 pages, 3218 KiB  
Article
Matching Drug Metabolites from Non-Targeted Metabolomics to Self-Reported Medication in the Qatar Biobank Study
by Karsten Suhre, Nisha Stephan, Shaza Zaghlool, Chris R. Triggle, Richard J. Robinson, Anne M. Evans and Anna Halama
Metabolites 2022, 12(3), 249; https://doi.org/10.3390/metabo12030249 - 16 Mar 2022
Cited by 6 | Viewed by 3165
Abstract
Modern metabolomics platforms are able to identify many drug-related metabolites in blood samples. Applied to population-based biobank studies, the detection of drug metabolites can then be used as a proxy for medication use or serve as a validation tool for questionnaire-based health assessments. However, it is not clear how well detection of drug metabolites in blood samples matches information on self-reported medication provided by study participants. Here, we curate free-text responses to a drug-usage questionnaire from 6000 participants of the Qatar Biobank (QBB) using standardized WHO Anatomical Therapeutic Chemical (ATC) Classification System codes and compare the occurrence of these ATC terms to the detection of drug-related metabolites in matching blood plasma samples from 2807 QBB participants for which we collected non-targeted metabolomics data. We found that the detection of 22 drug-related metabolites was significantly associated with the self-reported use of the corresponding medication. Good agreement of self-reported medication with non-targeted metabolomics was observed, with self-reported drugs and their metabolites being detected in the same blood sample in 79.4% of the cases. On the other hand, only 29.5% of detected drug metabolites matched self-reported medication. Possible explanations for differences include under-reporting of over-the-counter medications by the study participants, such as paracetamol, misannotation of low abundance metabolites, such as metformin, and inability of the current methods to detect them. Taken together, our study provides a broad real-world view of what to expect from large non-targeted metabolomics measurements in population-based biobank studies and indicates areas where further improvements can be made.
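
The two agreement figures quoted in this abstract (79.4% and 29.5%) run in opposite directions: one asks how often a self-reported drug is confirmed by a detected metabolite, the other how often a detected metabolite is explained by a self-report. The sketch below illustrates that distinction on toy data only. It is not the authors' pipeline: the function names, the participant IDs, and the assumption that reports and detections have already been mapped to a shared drug name are all hypothetical.

```python
from typing import Dict, Set, Tuple

Pairs = Set[Tuple[str, str]]


def to_pairs(by_participant: Dict[str, Set[str]]) -> Pairs:
    """Flatten {participant: {drug, ...}} into a set of (participant, drug) pairs."""
    return {(pid, drug) for pid, drugs in by_participant.items() for drug in drugs}


def directional_agreement(
    reported: Dict[str, Set[str]], detected: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Return (share of reported pairs that were detected,
               share of detected pairs that were reported)."""
    r, d = to_pairs(reported), to_pairs(detected)
    both = r & d  # pairs present in both the questionnaire and the blood sample
    reported_confirmed = len(both) / len(r) if r else float("nan")
    detected_explained = len(both) / len(d) if d else float("nan")
    return reported_confirmed, detected_explained


# Hypothetical toy data, assumed to be already mapped to common drug names.
reported = {
    "p1": {"metformin", "paracetamol"},
    "p2": {"atorvastatin"},
    "p3": set(),
}
detected = {
    "p1": {"metformin"},
    "p2": {"atorvastatin", "paracetamol"},
    "p3": {"paracetamol"},
}

rc, de = directional_agreement(reported, detected)
print(f"reported drugs confirmed by detection: {rc:.1%}")  # 66.7%
print(f"detections matched by a self-report:   {de:.1%}")  # 50.0%
```

Because the two ratios share a numerator but use different denominators, unreported over-the-counter use lowers only the second figure, while missed detections lower only the first.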

13 pages, 2651 KiB  
Article
JPA: Joint Metabolic Feature Extraction Increases the Depth of Chemical Coverage for LC-MS-Based Metabolomics and Exposomics
by Jian Guo, Sam Shen, Min Liu, Chenjingyi Wang, Brian Low, Ying Chen, Yaxi Hu, Shipei Xing, Huaxu Yu, Yu Gao, Mingliang Fang and Tao Huan
Metabolites 2022, 12(3), 212; https://doi.org/10.3390/metabo12030212 - 26 Feb 2022
Cited by 6 | Viewed by 3250
Abstract
Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data has been a long-standing bioinformatic challenge in untargeted metabolomics. Conventional feature extraction algorithms fail to recognize features with low signal intensities, poor chromatographic peak shapes, or those that do not fit the parameter settings. This problem also poses a challenge for MS-based exposome studies, as low-abundant metabolic or exposomic features cannot be automatically recognized from raw data. To address this data processing challenge, we developed an R package, JPA (short for Joint Metabolomic Data Processing and Annotation), to comprehensively extract metabolic features from raw LC-MS data. JPA performs feature extraction by combining a conventional peak picking algorithm and strategies for (1) recognizing features with bad peak shapes but that have tandem mass spectra (MS2) and (2) picking up features from a user-defined targeted list. The performance of JPA in global metabolomics was demonstrated using serially diluted urine samples, in which JPA was able to rescue an average of 25% of metabolic features that were missed by the conventional peak picking algorithm due to dilution. More importantly, the chromatographic peak shapes, analytical accuracy, and precision of the rescued metabolic features were all evaluated. Furthermore, owing to its sensitive feature extraction, JPA was able to achieve a limit of detection (LOD) that was up to thousands of times lower when automatically processing metabolomics data of a serially diluted metabolite standard mixture analyzed in HILIC(−) and RP(+) modes. Finally, the performance of JPA in exposome research was validated using a mixture of 250 drugs and 255 pesticides at environmentally relevant levels. JPA detected an average of 2.3-fold more exposure compounds than conventional peak picking only.

13 pages, 1910 KiB  
Article
A Multi-Label Classifier for Predicting the Most Appropriate Instrumental Method for the Analysis of Contaminants of Emerging Concern
by Nikiforos Alygizakis, Vasileios Konstantakos, Grigoris Bouziotopoulos, Evangelos Kormentzas, Jaroslav Slobodnik and Nikolaos S. Thomaidis
Metabolites 2022, 12(3), 199; https://doi.org/10.3390/metabo12030199 - 23 Feb 2022
Cited by 5 | Viewed by 2133
Abstract
Liquid chromatography-high resolution mass spectrometry (LC-HRMS) and gas chromatography-high resolution mass spectrometry (GC-HRMS) have revolutionized analytical chemistry among many other disciplines. These advanced instruments theoretically allow the capture of the whole chemical universe that is contained in samples, giving unimaginable opportunities to the scientific community. Laboratories equipped with these instruments produce a lot of data daily that can be digitally archived. Digital storage of data opens up the opportunity for retrospective suspect screening investigations for the occurrence of chemicals in the stored chromatograms. The first step of this approach involves the prediction of which data are more appropriate to be searched. In this study, we built an optimized multi-label classifier for predicting the most appropriate instrumental method (LC-HRMS or GC-HRMS or both) for the analysis of chemicals in digital specimens. The approach involved the generation of a baseline model based on the knowledge that an expert would use and the generation of an optimized machine learning model. A multi-step feature selection approach, a model selection strategy, and optimization of the classifier’s hyperparameters led to a model with accuracy that outperformed the baseline implementation. The models were used to predict the most appropriate instrumental technique for new substances. The scripts are available on GitHub and the dataset on Zenodo.

15 pages, 2983 KiB  
Article
Comprehensive Peak Characterization (CPC) in Untargeted LC–MS Analysis
by Kristian Pirttilä, David Balgoma, Johannes Rainer, Curt Pettersson, Mikael Hedeland and Carl Brunius
Metabolites 2022, 12(2), 137; https://doi.org/10.3390/metabo12020137 - 02 Feb 2022
Cited by 6 | Viewed by 3027
Abstract
LC–MS-based untargeted metabolomics is heavily dependent on algorithms for automated peak detection and data preprocessing due to the complexity and size of the raw data generated. These algorithms are generally designed to be as inclusive as possible in order to minimize the number of missed peaks. This is known to result in an abundance of false positive peaks that further complicate downstream data processing and analysis. As a consequence, considerable effort is spent identifying features of interest that might represent peak detection artifacts. Here, we present the CPC algorithm, which allows automated characterization of detected peaks with subsequent filtering of low quality peaks using quality criteria familiar to analytical chemists. We provide a thorough description of the methods in addition to applying the algorithms to authentic metabolomics data. In the example presented, the algorithm removed about 35% of the peaks detected by XCMS, a majority of which exhibited a low signal-to-noise ratio. The algorithm is made available as an R-package and can be fully integrated into a standard XCMS workflow.

Review

10 pages, 305 KiB  
Review
A Review on Differential Abundance Analysis Methods for Mass Spectrometry-Based Metabolomic Data
by Zhengyan Huang and Chi Wang
Metabolites 2022, 12(4), 305; https://doi.org/10.3390/metabo12040305 - 30 Mar 2022
Cited by 3 | Viewed by 1992
Abstract
This review presents an overview of the statistical methods for differential abundance (DA) analysis of mass spectrometry (MS)-based metabolomic data. MS has been widely used for metabolomic abundance profiling in biological samples. The high-throughput data produced by MS often contain a large fraction of zero values caused by the absence of certain metabolites and the technical detection limits of MS. Various statistical methods have been developed to characterize the zero-inflated metabolomic data and perform DA analysis, ranging from simple tests to more complex models including parametric, semi-parametric, and non-parametric approaches. In this article, we discuss and compare DA analysis methods regarding their assumptions and statistical modeling techniques.

Other

14 pages, 1358 KiB  
Protocol
The Hitchhiker’s Guide to Untargeted Lipidomics Analysis: Practical Guidelines
by Dmitrii Smirnov, Pavel Mazin, Maria Osetrova, Elena Stekolshchikova and Ekaterina Khrameeva
Metabolites 2021, 11(11), 713; https://doi.org/10.3390/metabo11110713 - 20 Oct 2021
Cited by 6 | Viewed by 4351
Abstract
Lipidomics is a newly emerged discipline involving the identification and quantification of thousands of lipids. As a part of the omics field, lipidomics has shown rapid growth both in the number of studies and in the size of lipidome datasets, thus requiring specific and efficient data analysis approaches. This paper aims to provide guidelines for analyzing and interpreting lipidome data obtained using untargeted methods that rely on liquid chromatography coupled with mass spectrometry (LC-MS) to detect and measure the intensities of lipid compounds. We present a state-of-the-art untargeted LC-MS workflow for lipidomics, from study design to annotation of lipid features, focusing on practical, rather than theoretical, approaches for data analysis, and we outline possible applications of untargeted lipidomics for biological studies. We provide a detailed R notebook designed specifically for untargeted lipidome LC-MS data analysis, which is based on xcms software.