Article

Automated Classification of 6-n-Propylthiouracil Taster Status with Machine Learning

Department of Biomedical Sciences, University of Cagliari, Monserrato, 09042 Cagliari, Italy
*
Author to whom correspondence should be addressed.
Nutrients 2022, 14(2), 252; https://doi.org/10.3390/nu14020252
Submission received: 3 November 2021 / Revised: 31 December 2021 / Accepted: 3 January 2022 / Published: 7 January 2022
(This article belongs to the Special Issue Implications of Taste and Olfaction in Nutrition and Health)

Abstract

Several studies have used taste sensitivity to 6-n-propylthiouracil (PROP) to evaluate interindividual taste variability and its impact on food preferences, nutrition, and health. We used a supervised learning (SL) approach for the automatic identification of the PROP taster categories (super taster (ST); medium taster (MT); and non-taster (NT)) of 84 subjects (aged 18–40 years). Biological features determined from subjects were included for the training system. Results showed that SL enables the automatic identification of objective PROP taster status, with high precision (97%). The biological features were classified in order of importance in facilitating learning and as prediction factors. The ratings of perceived taste intensity for PROP paper disks (50 mM) and PROP solution (3.2 mM), along with fungiform papilla density, were the most important features, and high estimated values pushed toward ST prediction, while low values leaned toward NT prediction. Furthermore, TAS2R38 genotypes were significant features (AVI/AVI, PAV/PAV, and PAV/AVI to classify NTs, STs, and MTs, respectively). These results, in showing that the SL approach enables an automatic, immediate, scalable, and high-precision classification of PROP taster status, suggest that it may represent an objective and reliable tool in taste physiology studies, with applications ranging from basic science and medicine to food sciences.

1. Introduction

Taste is the sensory modality that enables organisms to distinguish nutrient-rich food from harmful substances [1,2]; it recognizes five basic sensory qualities: sweet, salty, sour, bitter, and umami. It is well known that the ability to perceive a taste varies greatly between individuals, and that the same kind of food can taste very different to different individuals [3]. Therefore, the sense of taste can significantly influence the food choices, nutrition, and health of the individual [4]. Several physiological studies have focused on the ability to taste the bitter compound 6-n-propylthiouracil (PROP), with the aim of evaluating the individual variability of taste perception in humans [4,5,6,7,8]. Individuals can be classified as belonging to one of the three PROP taster categories (PROP non-taster (NT), medium taster (MT), and super taster (ST)), based on the determination of their PROP sensitivity, by using several psychophysical methods [4,5,6,7,9]. PROP taste detection is mediated by the TAS2R38 bitter receptor, and the individual variability in the ability to perceive this stimulus is greatly associated with haplotypes of the TAS2R38 gene [10,11]. There are two common haplotypes: the PAV variant, which has high affinity for PROP, is dominant, while AVI, the recessive variant, has low or null affinity for PROP. NTs are almost always homozygous for the AVI haplotype, while STs have been assumed to be homozygous for the PAV haplotype, and MTs are heterozygous [12]. However, some studies have reported a considerable genotypic overlap between the MTs and STs [10,13], while others have shown that the presence of two PAV haplotypes (as opposed to one) confers no additional benefit for perceiving more bitterness from PROP [8]. 
Therefore, the haplotypic diversity in the TAS2R38 gene does not completely explain the differences in the ability to taste PROP, thus implying that other factors may be involved [4,8,11,14], such as the density and function of the fungiform papillae [15]. Several studies have consistently reported that STs have a higher density of fungiform taste papillae [6,16,17,18] or a greater functional activity [15] compared to the other PROP taster groups. These morphological features of STs can explain why these individuals are more responsive to a wide range of oral stimuli that are not mediated via the TAS2R38 bitter receptor [9,19,20,21,22,23,24,25,26,27,28,29,30,31,32]. Some authors have shown that a polymorphism in the gene codifying for gustin—the salivary taste bud trophic factor—is associated with differences in the fungiform papilla density and function, and with differences in the chemosensory ability across PROP phenotypic groups [15,33,34]. However, other studies have failed to find associations between PROP tasting and fungiform papilla density and/or gustin genotypes [35,36]. Oral sensitivity to PROP has also been related to other modifying genes [37,38], or to levels of specific amino acids and proteins in saliva [39,40,41]. Several studies have reported gender differences in PROP perception, showing that women were more sensitive to PROP than men [6,42,43], and had more taste buds and fungiform papillae [6,44]; however, other studies do not substantiate these differences [15,45,46].
In recent decades, great attention has been paid to PROP tasting as an oral biomarker of general taste perception, affecting food preferences, eating behaviors, nutritional status, and health [3,4,47]. Several studies have reported that subjects who perceive PROP as intensely bitter (PROP STs) have a higher sensitivity than NTs to various taste stimuli that are not mediated via the specific receptor, including other bitter chemicals [19,20,21,48,49,50], sweet stimuli [25], sour compounds [23], umami taste [26], etc. Several studies have shown relationships between PROP phenotype or genotype and longevity [51], age [52,53], or a number of health parameters, including antioxidant status [54], body mass index (BMI) [13,55,56,57], metabolic changes [58,59], smoking status [60,61,62], alcohol consumption [24], respiratory infections [63,64], taste disorders [65], colonic neoplasm risk [66,67], and neurodegenerative diseases [68]. However, the validity of PROP tasting as a biomarker has been questioned by some authors, who have shown inconsistent results [58,59,69,70,71,72,73,74,75,76,77,78]. The major issues that have led to these controversial results are the differential characteristics of the studied populations, of the methods used to assess health parameters, or of the psychophysical approaches used to identify the PROP phenotype which, being based on highly subjective evaluations, can produce significant measurement errors [35].
The main aim of this work was to automatically classify subjects into the three PROP taster categories by using machine learning (ML) [79], which provides real-time decision making and could make this process immediate and scalable. This paper evaluates, for the first time, the effectiveness of ML classifiers in automatically discriminating between subjects belonging to the PROP taster categories (assigned to the subjects by standard non-objective scaling methods) by exploiting the biological features of the subjects. Furthermore, the proposed model was intended to clarify the importance and the impact of each biological feature on the PROP taster status of subjects, thus also enabling the validation of a single marker for its determination.
The biological features used as predictive variables were sensory, genetic, morphological, clinical, and demographic data, which are regularly used for the classification of PROP taster status in psychophysical approaches (e.g., taste intensity ratings for PROP and NaCl [5,7]), or have been associated with PROP taster status in a large number of studies (e.g., density of fungiform papillae and genotypes of TAS2R38 and CA6 genes, age, gender, BMI, scores for taste quality sensitivity, and smoking status [3,4,6,10,11,13,15,55,56,60,61,62,80,81]).

2. Materials and Methods

2.1. Study Design

The automatic classification of subjects as belonging to the three PROP taster categories was carried out using ML [79]. ML uses algorithms and computational statistics to learn from data without being explicitly programmed. Among the three approaches of ML (reinforcement learning (RL); unsupervised learning (UL); and supervised learning (SL)), we used the supervised approach, applying different classification algorithms to a structured dataset consisting of the biological features of the subjects. Specifically, the SL classifiers used the labeled dataset to learn a classification model that evaluated the differences between subjects and returned a high-precision prediction of the PROP taster status of each subject.
The overall design of the study is depicted in Figure 1.

2.2. Experimental Procedure

For each subject, the following biological data were determined: PROP and NaCl intensity ratings, five taste scores, papilla density, BMI, and age (which were classified as numerical data), as well as TAS2R38 genotypes, gustin gene genotypes, taste sensitivity status, BMI status, smoking status, and gender (which were classified as categorical data).
Data were collected for each subject in two sessions on two successive days. For sensory analyses, subjects were requested to abstain from drinking (except water), eating, and using chewing gum or oral care products for at least 2 h prior to testing. All subjects had to be in the test room 15 min before the beginning of the session in order to adapt to the environmental conditions (23–24 °C; 40–50% relative humidity). Women were tested on the sixth/seventh day of their menstrual cycles in order to avoid taste sensitivity changes due to the estrogen phase [82].
In the first session, weight (kg) and height (m) were recorded in order to calculate the subjects’ BMI (kg/m²). Taste intensity ratings for PROP and NaCl were collected by using two validated psychophysical approaches (the three-solution test [5], and the filter paper method [7]), and samples of the whole saliva (2 mL) were collected and stored at −80 °C until the molecular analyses were completed.
In the second session, taste sensitivity to the five primary qualities (sweet, sour, salty, bitter, and umami) was examined by using the taste strip test (TST; Burghart Messtechnik, Wedel, Germany) [83,84] and umami test (Burghart Messtechnik, Wedel, Germany) [85]. The density of the fungiform papillae was also measured.
All taste solutions were previously prepared, stored in a refrigerator, and presented for the sensory measures at room temperature.

2.3. Subjects

A total of 84 Caucasian subjects (49 women and 35 men) were recruited through usual procedures at the University of Cagliari, Italy; all were originally from the island of Sardinia, Italy. Mean age was 25.07 ± 0.507 y, ranging from 18 to 40 y; 19 subjects were smokers and 65 were non-smokers. Subjects were classified as underweight (n = 10), normal weight (n = 58), or overweight (n = 16) based on their BMI: underweight subjects had a BMI below 18.5 kg/m², normal-weight subjects from 18.5 to 24.9 kg/m², and overweight subjects from 25.0 to 29.9 kg/m².
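The BMI calculation and the weight categories described above can be illustrated with a minimal sketch. The function names are hypothetical, and treating the reported ranges as half-open intervals (so that a value such as 24.95 counts as normal weight) is an assumption, since the source reports the ranges only to one decimal place.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (weight divided by squared height)."""
    return weight_kg / height_m ** 2


def bmi_status(value):
    """Map a BMI value to the categories reported for this sample.

    Assumption: cut-offs are applied as half-open intervals.
    """
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    # No subject in the reported sample falls in this range.
    return "outside the reported ranges"
```

For example, a subject weighing 70 kg with a height of 1.75 m has a BMI of about 22.9 kg/m² and would be labeled as normal weight.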
Exclusion criteria were major systemic diseases, use of drugs interfering with taste or smell (e.g., steroids, antihistamines, and certain antidepressants), pregnancy or lactation, and food allergies. All subjects provided a signed informed consent form prior to being enrolled in the study. The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethical Committee of the University Hospital of Cagliari (protocol code: 451/09; date of approval: 5/2016).

2.4. Sensory Assessments

2.4.1. PROP and NaCl Intensity Ratings

The three-solution test [5] and the impregnated paper screening test [7], which were used to collect PROP and NaCl intensity ratings, allowed us to classify subjects based on their PROP bitter taste status. These two methods have been used in numerous studies [33,34,40,81], and are strongly correlated with the degree of activation of peripheral taste function [86,87,88].
Subjects were first assessed using the three-solution test, as described by Tepper et al. [5]. The perceived intensity ratings of three suprathreshold solutions of sodium chloride (NaCl; 0.01, 0.1, and 1.0 mol/L) (Sigma-Aldrich, Milan, Italy) and of PROP (0.032, 0.32, and 3.2 mmol/L) (Sigma-Aldrich) were collected from each subject by using the labeled magnitude scale (LMS) [89]. The LMS allows subjects to rate the intensity of PROP bitterness relative to the “strongest imaginable” oral stimulus that they have ever experienced in their life. Concentrations (10 mL samples) of each solution type were presented in a random order. Each stimulus was followed by oral rinsing with spring water, and the interstimulus interval was set at 60 s. Subjects who gave lower ratings to PROP than to NaCl were classified as NTs, those who gave overlapping ratings to the two chemicals were classified as MTs, and those who gave higher ratings to PROP than to NaCl were classified as STs. After a 1 h period, the assignment of each subject to a PROP taster category was validated by using the impregnated paper screening test [7]. PROP (50 mmol/L) and NaCl (1.0 mol/L) were presented via two impregnated paper disks that were applied on the tip of the tongue for 30 s and then spat out. The LMS [89] was also used in this test to rate the perceived intensity of each paper disk. Subjects who rated the PROP disk higher than 67 mm were classified as STs, those who rated the PROP disk lower than 15 mm were classified as NTs, and all others were categorized as MTs [7]. Four subjects classified as STs based on the three-solution test were classified as MTs when the impregnated paper screening test was used. These four subjects were excluded from the study.
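The paper disk cut-offs lend themselves to a short illustration. The sketch below encodes the LMS thresholds of 67 and 15 stated above; the function names are hypothetical, and the consistency check generalizes the reported exclusion of four discordant subjects to any disagreement between the two tests, which is an assumption rather than a rule stated in the text.

```python
def classify_paper_disk(prop_rating_mm):
    """Assign a PROP taster category from the LMS rating of the PROP disk."""
    if prop_rating_mm > 67:
        return "ST"  # super taster
    if prop_rating_mm < 15:
        return "NT"  # non-taster
    return "MT"      # medium taster (all other ratings)


def is_consistent(three_solution_label, prop_rating_mm):
    """True when both psychophysical tests agree on the category.

    Assumption: any disagreement would lead to exclusion; the source
    reports only the ST-versus-MT discordance of four subjects.
    """
    return three_solution_label == classify_paper_disk(prop_rating_mm)
```

Under these cut-offs, a subject labeled ST by the three-solution test who rates the PROP disk at 50 would be flagged as inconsistent.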
Since subjects classified as STs by LMS could overestimate the oral stimuli compared to the other taster groups [4], in order to validate ST classification, these subjects were also tested by using the general labeled magnitude scale (gLMS) [90], which expands the top anchor of the scale to include any sensation. The subjects were asked to use the gLMS to rate the heaviness of 6 opaque sand-filled jars (ranging from 235 to 955 g). Heaviness ratings were used to normalize PROP taste intensity ratings as previously described [91]; after normalization, these subjects were still classified as STs.
Based on their taster group assignments, 16 subjects were STs (5 M, 11 F), 51 were MTs (24 M, 27 F), and 17 (6 M, 11 F) were NTs.

2.4.2. Scores for the Five Basic Qualities and Taste Sensitivity Status Determination

The taste strip test (TST; Burghart Messtechnik, Wedel, Germany) [83,84] and the umami test (Burghart Messtechnik, Wedel, Germany) [85] were used to examine taste sensitivity to the four qualities (sweet, sour, salty, bitter) and sensitivity to umami, respectively. The two tests consist of filter paper strips impregnated with 4 concentrations of each taste quality (0.05, 0.1, 0.2, or 0.4 g/mL of sucrose; 0.05, 0.09, 0.165, or 0.3 g/mL of citric acid; 0.016, 0.04, 0.1, or 0.25 g/mL of NaCl; 0.0004, 0.0009, 0.0024, or 0.006 g/mL of quinine hydrochloride; and 0.25, 0.1, 0.04, or 0.016 g/mL of monosodium glutamate, respectively). The subject placed each paper strip on the tongue and identified the taste quality. The correct answers were rated 1; thus, the maximum score for each taste quality was 4, for the four qualities of the TST was 16, and 20 when the evaluation of umami was included (overall TST). A subject was considered normogeusic if they scored ≥ 9, and hypogeusic or ageusic if they scored < 9, as described by Landis et al. [83]. Taste qualities were presented in a pseudo-randomized manner, and concentrations were tasted from the lowest to the highest. Before each stimulation, the subjects rinsed their mouths with spring water.
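The scoring scheme described above can be sketched as follows. The data layout (a dictionary mapping each quality to four true/false answers) and the function names are hypothetical, and applying the cut-off of 9 to the four-quality TST total (maximum 16) is an assumption based on the cited procedure of Landis et al. [83].

```python
TST_QUALITIES = ("sweet", "sour", "salty", "bitter")


def quality_score(answers):
    """Number of correctly identified strips (0-4) for one taste quality."""
    return sum(1 for correct in answers if correct)


def tst_summary(responses):
    """Summarize taste strip results.

    `responses` maps a quality name to a list of four booleans,
    one per concentration (True = quality correctly identified).
    """
    scores = {q: quality_score(a) for q, a in responses.items()}
    tst_total = sum(scores[q] for q in TST_QUALITIES)   # maximum 16
    overall = tst_total + scores.get("umami", 0)        # maximum 20
    # Assumption: the >= 9 cut-off refers to the four-quality TST total.
    status = "normogeusic" if tst_total >= 9 else "hypogeusic or ageusic"
    return {"scores": scores, "TST": tst_total,
            "overall TST": overall, "status": status}
```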

2.5. Papilla Density

The fungiform papillae were identified as described by Melis et al. [15]. The tip of the anterior tongue surface was dried and stained by placing (for 5 s) a disk of filter paper (6 mm diameter) impregnated with a blue food dye (E133, Modecor Italiana, Italy) on the left side of the tongue midline. Then, 3–10 photographs of the stained area were taken in each subject using a Canon EOS D400 (10 megapixels) camera with an EFS 55–250 mm lens. The fungiform papillae were identified from the images by their mushroom shape and very light staining [92]. The number of fungiform papillae in the stained area of each image was established by the consensus of five trained observers. The density per cm² was calculated for each subject.
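As an illustration of the density calculation, the sketch below converts a papilla count into a density per cm². It assumes that the counted area equals the circular area of the 6 mm filter paper disk (about 0.283 cm²); the text does not state the exact area used, so this is an assumption, and the function names are hypothetical.

```python
import math


def stained_area_cm2(diameter_mm=6.0):
    """Area of a circular stained region, converted from mm to cm."""
    radius_cm = diameter_mm / 10.0 / 2.0
    return math.pi * radius_cm ** 2


def papilla_density(papilla_count, diameter_mm=6.0):
    """Fungiform papillae per cm^2 within the stained circle (assumed area)."""
    return papilla_count / stained_area_cm2(diameter_mm)
```

With these assumptions, 20 papillae counted in the stained circle correspond to roughly 70.7 papillae/cm².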

2.6. Molecular Analysis

DNA was extracted from saliva samples using the QIAamp® DNA Mini Kit (QIAGEN S.r.l., Milan, Italy) according to the manufacturer’s instructions. The concentration of purified DNA was assessed by measurements at an optical density of 260 nm with an Agilent Cary 60 UV–Vis Spectrophotometer (Agilent, Palo Alto, CA, USA).
Subjects were genotyped for three single-nucleotide polymorphisms (SNPs; rs713598, rs1726866, and rs10246939) of the TAS2R38 locus, which result in three amino acid substitutions (Pro49Ala, Ala262Val, and Val296Ile), and for the rs2274333 (A/G) polymorphism of the gustin gene (CA6), which results in the Ser90Gly substitution. Molecular analyses were performed using TaqMan SNP Genotyping Assays (C_8876467_10 assay for rs713598, C_9506827_10 assay for rs1726866, C_9506826_10 assay for rs10246939, and C_1739329_1_ assay for rs2274333) according to the manufacturer’s specifications (Applied Biosystems by Life Technologies, Milan, Italy, Europe BV). Each reaction included two negative controls, three positive controls (one for each genotype), and two replicates.
Based on molecular analysis at the three SNPs of the TAS2R38 locus, 20 subjects were PAV homozygous, 43 were heterozygous, and 21 were AVI homozygous. Since rare haplotypes contribute to intermediate sensitivity [93], subjects with rare haplotypes were excluded from the study in order to reduce confounding factors.
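The genotype labels used as categorical features can be illustrated with a minimal sketch. It assumes that each haplotype is written as the one-letter amino acids at positions 49, 262, and 296 (P/A, A/V, V/I), and that phased haplotypes are already available for each subject; the function name and the use of `None` for excluded subjects are hypothetical.

```python
COMMON_HAPLOTYPES = {"PAV", "AVI"}


def tas2r38_genotype(haplotype_1, haplotype_2):
    """Return the TAS2R38 genotype label for a pair of haplotypes.

    Subjects carrying a rare haplotype (anything other than PAV or AVI)
    return None, mirroring their exclusion from the study.
    """
    if haplotype_1 not in COMMON_HAPLOTYPES or haplotype_2 not in COMMON_HAPLOTYPES:
        return None
    if haplotype_1 == haplotype_2:
        return f"{haplotype_1}/{haplotype_1}"
    return "PAV/AVI"  # heterozygous, independent of the input order
```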
In addition, 49 subjects were homozygous AA for the SNP rs2274333 of the gustin gene, while 29 were heterozygous, and 6 were homozygous GG.

2.7. Machine Learning

In order to automatically classify subjects as belonging to a PROP taster category (ST, MT, or NT), we used supervised learning (SL), which involves a type of algorithm that uses labeled datasets (previously recorded from subjects) to train models in order to make predictions on new samples. In our work, the labeled dataset comprised different independent variables (features) and a corresponding dependent variable for each sample (PROP taster category). The dataset was randomly divided into 75% training data and 25% test data. During training, the algorithm searches the data for patterns that correlate with each category. After training, the SL algorithm takes new unknown inputs (in the test data) and determines to which PROP taster category each must be assigned. We used the following algorithms: logistic regression, decision trees, random forests, k-nearest neighbors (k-NN) (version: scikit-learn 0.23.2; website: https://scikit-learn.org (accessed on 29 October 2021)), and the CatBoost classifier (version: catboost 0.24.3; software: https://arxiv.org/abs/1706.09516 (accessed on 29 October 2021); website: https://catboost.ai/ (accessed on 29 October 2021)). The latter is a new gradient-boosting algorithm that supports categorical features as inputs [94], and is known for its advantages in handling small datasets [94,95,96].
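The random 75%/25% division can be sketched with the standard library alone. This is an illustration rather than the code used in the study (which relied on scikit-learn); the function name and the seed are hypothetical, and whether the original split was stratified by taster category is not stated.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Randomly partition sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_test = round(n_samples * test_fraction)  # 21 of 84 samples
    return indices[n_test:], indices[:n_test]
```

For the 84 subjects of this study, the sketch yields 63 training samples and 21 test samples, with no subject appearing in both sets.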
Learning curve graphs, which are generally used as a diagnostic tool to assess the incremental performance of a model as the controlled parameter changes, can also be employed to estimate the required dataset size. Supplementary Figure S1 shows how the performance of the SL model used in our approach initially increases with the increase in the size of the training dataset. Subsequently, at dataset size values corresponding to the ones used in this work (n = 84), the performance of the model saturates, and adding more data does not lead to a significant increase in the performance.
Evaluation metrics, namely the accuracy (1), precision (2), recall (3), F1-score (4), receiver operating characteristic (ROC) curve, and area under the curve (AUC), were used to evaluate the training and performance of the SL algorithms.
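$$\text{Accuracy} = \frac{\text{true positives} + \text{true negatives}}{\text{all samples}} \tag{1}$$
$$\text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \tag{2}$$
$$\text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \tag{3}$$
$$\text{F1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{4}$$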
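Equations (1) to (4) can be computed directly from the four confusion-matrix counts of a single category. The following is a minimal sketch; the function name is hypothetical, and returning 0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For instance, with 8 true positives, 2 false positives, 1 false negative, and 9 true negatives, the accuracy is 0.85, the precision 0.80, the recall about 0.89, and the F1-score about 0.84.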
The ROC curve shows the diagnostic ability of a binary or multiclass classifier as its discrimination threshold is varied. The ROC curve is constructed by plotting the true positive rate (TPR) as a function of the false positive rate (FPR) at different threshold settings. The AUC is used in classification analysis to determine which of the models used best predicts the categories; a higher AUC indicates a better classifier. An AUC of 0.5 indicates that the model has no capacity to discriminate (it performs no better than chance), while a classifier is perfect when the AUC is 1. The micro-average and the macro-average are two ways of averaging the AUC over categories; the micro-average combines the contributions of all categories to compute the average metric, while the macro-average computes the metric independently for each category and then averages the results (hence treating all categories equally). In a multiclass classification, the micro-average is preferable when the categories are imbalanced.
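The AUC and its two averaging schemes can be illustrated with a short sketch. It uses the rank interpretation of the AUC (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, counting ties as one half) together with a one-vs-rest treatment of the three categories. The function names are hypothetical, and this is not the routine used in the study.

```python
def roc_auc(labels, scores):
    """Binary AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, proba, classes):
    """Mean of the per-category one-vs-rest AUCs (categories weighted equally)."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        aucs.append(roc_auc(labels, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


def micro_auc(y_true, proba, classes):
    """AUC over all (sample, category) pairs pooled into one binary problem."""
    labels, scores = [], []
    for y, row in zip(y_true, proba):
        for k, cls in enumerate(classes):
            labels.append(1 if y == cls else 0)
            scores.append(row[k])
    return roc_auc(labels, scores)
```

Here `proba` is assumed to hold one row of predicted category probabilities per subject, in the same order as `classes`.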
SL algorithms can suffer from overfitting and underfitting. Overfitting occurs if the model has a high accuracy score on training data but a low score on test data, while underfitting arises if the model has a low accuracy score on both training data and test data. To reduce the possibility of overfitting, we applied cross-validation (as described below), lowered the number of independent features by removing all non-significant and correlated features from the dataset, and increased the regularization parameters of the SL algorithms. We did not find any problems of underfitting.

2.7.1. Data Processing Operations

Before applying SL algorithms, the following data processing operations were performed:
  • Description of data analysis: A Pearson correlation analysis (r) [97] was performed to verify the correlations between numerical features. Figure 2 shows that the ratings determined with the solutions in the three-solution tests, as well as those determined with impregnated paper disks, were strongly correlated with one another (r > 0.43; p < 0.001). Moreover, a significant correlation was found between papilla density and the ratings of PROP paper disks (r = 0.34; p = 0.0016). In addition, the scores of the TST and the overall TST were strongly correlated with the scores of sweet, sour, bitter, and umami (r > 0.43; p < 0.001). The p-values of the significant correlations are shown in black (Figure 2);
  • Preprocessing the data: This operation includes the handling of missing values: (1) removal of the columns/rows with more than 60% missing values; (2) estimation of the missing values (when less than 60%) with the mean or median value of the column; and (3) elimination of the duplicated values from the dataset. After analysis of the dataset, we had to eliminate four variables (sucrose threshold, and intensity ratings of three suprathreshold solutions of sucrose). In addition, in the six rows in which the BMI value was lacking, it was imputed with the mean value of the column;
  • Processing the features: This operation transforms the content of the dataset into a form that an algorithm can process. It includes one-hot encoding (encoding categorical data into numerical data) and normalization of the numerical data (transforming the actual range of numerical values into a standard range between 0 and 1). Finally, the synthetic minority oversampling technique (SMOTE) [98] was used to balance the numbers of samples in the PROP taster status categories.
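The processing operations listed above can be sketched in plain Python. The helper names are hypothetical, missing values are represented by `None` as an assumption, and the last function shows only the interpolation step at the core of SMOTE (a synthetic sample placed on the segment between a minority-class sample and one of its neighbors), not the full algorithm with its nearest-neighbor search.

```python
import random


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale numerical values to the range 0-1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column
    return [(v - lo) / (hi - lo) for v in column]


def one_hot(column):
    """Encode a categorical column as 0/1 indicator rows."""
    categories = sorted(set(column))
    rows = [[1 if v == c else 0 for c in categories] for v in column]
    return categories, rows


def interpolate_sample(sample, neighbor, rng):
    """SMOTE-style synthetic point between a sample and a neighbor."""
    gap = rng.random()  # position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]
```

For example, `one_hot(["PAV/PAV", "AVI/AVI", "PAV/AVI"])` produces one indicator column per genotype, which is how a categorical feature such as the TAS2R38 genotype can appear as three separate features in a model.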

2.7.2. Principal Component Analysis

Principal component analysis (PCA) is a method that is used to reduce the dimensionality of datasets, by transforming a set of variables into a smaller one while preserving as much information as possible. We used PCA for the visualization task, by transforming our training data of 34 features into a two-dimension dataset. Since the ST and NT categories have an unequal size of data with respect to the MT category, PCA was performed before and after oversampling of the dataset, including the synthetic samples generated by SMOTE.

2.7.3. Model Training

After processing of the data, we used cross-validation [99] and compared the following five models: logistic regression, gradient boosting, decision trees, random forests, and CatBoost.
Cross-validation [99] is a technique for statistical evaluation of ML models by training them on data subsets and assessing them on the complementary data subset. The algorithm we used to apply the cross-validation was 3-fold cross-validation, which shuffles and divides data into two groups: one group (25% of data) as the test data, and the other (75% of data) as the training data. This procedure was repeated three times, using different subsets each time. Therefore, each sample was used once in the test set and k − 1 times in the training set. In each turn, SMOTE was used to increase the amount of training data in the minority categories.
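A minimal sketch of k-fold index generation is given below to make the rotation of training and test subsets concrete. The function name and seed are hypothetical. Note that in this standard construction with k = 3 each held-out fold contains about one third of the samples; the proportions used in the study may have differed from this sketch. Any oversampling would be applied to the training indices only, inside each turn.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

Each sample appears in exactly one test fold and in the training set of the remaining k − 1 turns.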

2.7.4. Hyperparameter Tuning

Hyperparameter tuning [100] refers to the automatic optimization of the hyperparameters of an SL model. This process is important because the hyperparameters govern the overall behavior of an SL model. Among the different tuning methods, we used the grid search algorithm [101] which, by examining every possible combination of each set of hyperparameters, allowed us to find the best hyperparameters for our classifiers. Once the process was completed and we had obtained the best hyperparameters for each model, the models were evaluated by means of the metrics listed above.
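The exhaustive nature of grid search can be shown with a generic sketch. The function name is hypothetical; `score_fn` stands for any routine that trains a model with the given hyperparameters and returns a validation score (for example, a cross-validated F1-score), and the hyperparameter names in the usage example are placeholders rather than the grid used in the study.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in the grid and keep the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Usage with a toy scoring function that peaks at depth = 4, learning_rate = 0.1.
toy_grid = {"depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
toy_score = lambda p: -(p["depth"] - 4) ** 2 - 10 * (p["learning_rate"] - 0.1) ** 2
best_params, best_score = grid_search(toy_grid, toy_score)
```

The number of evaluations grows as the product of the grid sizes (here 3 × 3 = 9), which is why grid search is practical mainly for small grids and small datasets.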

2.7.5. Model Explainability and Feature Importance

Explainable SL refers to the tools and techniques that can be employed to interpret any black-box SL model. The tool that we used was Shapley Additive Explanations (SHAP) [102], which is a game-theoretical method to interpret any SL model’s output; it returns a summary plot of SHAP that links feature importance with feature effects. Each point on the summary plot is a Shapley value for a feature and an instance.
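To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values for a single instance by averaging each feature's marginal contribution over all feature orderings, with absent features set to a baseline value. This brute-force approach is feasible only for a handful of features and is not how the SHAP library computes its estimates; the function name and the baseline convention are assumptions made for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`."""
    n = len(instance)
    totals = [0.0] * n
    n_orders = 0
    for order in permutations(range(n)):
        current = list(baseline)           # start with all features absent
        previous = predict(current)
        for i in order:
            current[i] = instance[i]       # reveal feature i
            value = predict(current)
            totals[i] += value - previous  # marginal contribution of i
            previous = value
        n_orders += 1
    return [t / n_orders for t in totals]
```

The values always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that allows a summary plot to link feature importance with the direction of each feature's effect.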

2.8. Statistical Analyses

Fisher’s method (Genepop software version 4.7.5; available on the web at the link: https://genepop.curtin.edu.au/genepop_op3.html (accessed on 29 October 2021)) [103] was used to test TAS2R38 and gustin gene genotype distribution according to PROP taster status. One-way ANOVA was used to compare differences in age, BMI, papilla density, and taste scores among STs, MTs, and NTs. Statistical analyses were conducted using STATISTICA for WINDOWS (version 7; StatSoft Inc., Tulsa, OK, USA). p-Values < 0.05 were considered significant.

3. Results

Demographic, clinical, morphological, and sensory features, along with the genotype distribution of the TAS2R38 and gustin genes determined in the overall sample and according to PROP taster status, are shown in Table 1. One-way ANOVA showed that the density of fungiform papillae varies with PROP taster status (F[2,81] = 6.802; p = 0.0019). STs had a higher papilla density than MTs (p = 0.041; Fisher’s LSD test), who showed higher density than NTs (p = 0.016; Fisher’s LSD test). The TAS2R38 SNPs were associated with PROP taster status based on genotype distribution (χ² = 31.884; p < 0.0001; Fisher’s test). Pairwise comparisons discriminated all groups from one another (χ² > 13.66; p < 0.0011; Fisher’s test). The genotype AVI/AVI was more frequent (88.23%) in NTs, PAV/AVI was more frequent in MTs (68.63%), and PAV/PAV was more frequent in STs (65.50%). No differences related to PROP taster status in terms of age, BMI, gender, smoking status, genotype distribution of gustin locus, or taste scores were found (p > 0.05).
Figure 3 shows two scatterplots obtained via PCA, in which a reduction in the dimensionality of the training set was performed before (A) and after oversampling data (B); the latter includes the synthetic samples generated by SMOTE. The combination of features obtained via PCA visibly shows the differences between the three PROP taster categories.
The values of the accuracy, precision, and F1-score (metrics of evaluation used to assess the training and performance of the SL algorithms) showed that the CatBoost algorithm enabled us to achieve objective PROP taster status identification with a high precision (97%), high recall (95%), and an F1-score of 96% (Table 2). Other algorithms (i.e., logistic regression, gradient boosting, decision trees, and random forests) also achieved objective PROP taster status identification, but with lower values of precision, recall, and F1-score compared to CatBoost (Table 2).
Figure 4 shows the ROC curve and AUC of the three PROP taster categories obtained via the CatBoost model. In particular, the ROC curve shows that the correct predictions made by the model were 97%, 100%, and 99% for the ST, NT, and MT categories, respectively. In addition, the micro-average AUC of this classifier was 98% across all categories. The macro-average was 99% across all categories, but this value is less informative because the data in the test set were unbalanced. In addition, Supplementary Figure S2 shows an alluvial plot representing the changes in network structure over subject groups identified by different methods (i.e., TAS2R38 genotypes, PROP taster categories, and SL discrimination).
The CatBoost classifier allowed us to determine the order of importance of the biological features in facilitating the learning of the model to identify the three PROP taster status categories (Figure 5). Specifically, in the figure, blue indicates the importance of features for training the NT category, pink for training the ST category, and green for training the MT category. The intensity rating for PROP paper disks (50 mM) was the most important feature in the training set, followed in order of importance (from second to seventh) by the intensity rating for the PROP solution (3.2 mM), fungiform papilla density, AVI/AVI genotype, the intensity rating for the PROP solution (0.32 mM), PAV/AVI genotype, and PAV/PAV genotype. It is interesting to note that, among the scores given to taste qualities, those for salty and umami were the most significant in facilitating the learning of the model. Furthermore, gender was a significant feature in facilitating training, with the female gender (10th in order of importance) more significant than the male gender (16th in order of importance).
The SHAP algorithm allowed us to obtain an overview of the most important features for the model, and how they impact it to make a prediction.
The SHAP summary plot for the ST category is shown in Figure 6; the features are listed along the left-hand side of the Y-axis in descending order of importance for the ST category (from the most important at the top to the least important at the bottom). The position of each value on the X-axis shows whether the feature is associated with a higher or lower prediction score for the ST category. Specifically, the SHAP summary plot for the ST category highlights that the intensity rating for PROP paper disks (50 mM) was the most important feature for the model, and high estimated values (pink) were strongly and positively correlated with the ST category. The intensity rating for the PROP solution (3.2 mM) was the second in order of importance for the ST category, and high estimated values (pink) were positively correlated with this category. Papilla density was the third feature in order of importance for this category, and high estimated values (pink) were positively correlated with it. Low and medium values of these three features (blue and violet, respectively) pushed the model prediction towards other categories. The PAV/PAV genotype was the fourth most important feature; it was positively correlated with the ST category and was more important than the AVI/AVI genotype for the prediction of the ST category. The intensity rating for the PROP solution (0.32 mM) was the fifth in order of importance, and high estimated values (pink) were positively correlated with the ST category, while low and medium values (blue and violet) pushed the model prediction towards other categories. Furthermore, salty and umami scores were significant predictive features, and the high and low estimated values, respectively, had a moderate impact on ST prediction. Female gender was the 13th most important feature, and was correlated moderately and positively with the ST category, while male gender was less important, and was moderately and negatively correlated with this category.
The SHAP summary plot for the MT category is shown in Figure 7. For this category, the intensity rating for PROP paper disks (50 mM) was also the most important feature for the model; however, medium estimated values (violet) were strongly and positively correlated with the MT category, while high and low estimated values (pink and blue) pushed the model prediction towards other categories. It is interesting to note that the PAV/AVI genotype was the second feature in order of importance, and was positively correlated with the MT category, while the AVI/AVI genotype was the third feature, and was negatively correlated with this category. The intensity rating for the PROP solution (3.2 mM) was the fourth feature, and medium estimated values (violet) were positively correlated with this category. The fifth feature in order of importance was papilla density, and high estimated values were negatively correlated with the MT category. The intensity rating for the PROP solution (0.32 mM) was the sixth feature in order of importance, and medium estimated values (violet) were positively correlated with this category. The salty and umami scores were significant features, and medium estimated values prompted the model to make an MT prediction. Female gender was the 8th most important feature, and was negatively correlated with the MT category, while male gender was less important, and was moderately and positively correlated with this category.
The SHAP summary plot for the NT category is shown in Figure 8. In this case, the intensity ratings for PROP paper disks (50 mM) and the PROP solution (3.2 mM) were the first and the second features in order of importance, respectively, and low estimated values (blue) were strongly and positively correlated with the NT category. The AVI/AVI genotype was the third most important feature and was positively correlated with this category. The intensity rating for the PROP solution (0.32 mM) was the fourth most important feature, and low estimated values (blue) were positively correlated with the NT category. Low estimated values of papilla density, which was the fifth most important feature, were positively correlated with this category, while high values were negatively correlated with it. Sour and umami scores were significant features, and medium and low estimated values, respectively, moderately pushed the model toward an NT prediction. The female gender was the 10th most important feature and was moderately and positively correlated with the NT category, while the male gender was the least significant feature, and was moderately and negatively correlated with this category.
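SHAP values build on Shapley values from cooperative game theory. As a minimal illustration of the underlying idea, the sketch below computes exact Shapley values for a hypothetical three-feature scoring function by enumerating all feature coalitions, with features outside a coalition set to a baseline value. The function `st_score`, the feature names, and all numbers are assumptions made for illustration; this brute-force enumeration is not the algorithm that SHAP implementations apply to tree ensembles.

```python
from itertools import combinations
from math import factorial

FEATURES = ["paper_disk_rating", "solution_rating", "papilla_density"]

def st_score(x):
    """Hypothetical stand-in for a model's ST score (not the trained classifier)."""
    score = (0.5 * x["paper_disk_rating"]
             + 0.3 * x["solution_rating"]
             + 0.2 * x["papilla_density"])
    # Interaction term: both intensity ratings high at the same time.
    if x["paper_disk_rating"] > 60 and x["solution_rating"] > 60:
        score += 10.0
    return score

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi

# Invented baseline (e.g., average subject) and one invented high-rating subject.
baseline = {"paper_disk_rating": 40.0, "solution_rating": 35.0, "papilla_density": 50.0}
subject = {"paper_disk_rating": 85.0, "solution_rating": 70.0, "papilla_density": 60.0}

phi = shapley_values(st_score, subject, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the subject's score and the baseline score, which is the property that lets a summary plot show how far each feature value pushes a prediction toward or away from a category.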

4. Discussion

The goal of this work was to build an ML model capable of automatically predicting PROP taster status with high precision, and to understand in depth the importance and the impact of the specific biological features, included in the data model, of 84 subjects aged from 18 to 40 years.
We used SL, which applies a set of algorithms and computational statistics to learn from data without being explicitly programmed and creates models that can make predictions on new samples. In our approach, we used different algorithms, such as logistic regression, decision trees, random forests, k-nearest neighbors (k-NN), and the CatBoost classifier. Our results showed that the CatBoost classifier was the best model for the automatic classification of the PROP taster status, as shown by the high metric values of accuracy, precision, and F1-score determined with this model. The analysis of the ROC curves and AUCs of the three PROP taster categories confirmed that the CatBoost classification model was the best model for our data; it gave the best ROC curve, showing very small error scores (3% error for the ST category predictions, 0% for NT, and just 1% for MT). In addition, this model did not appear to overfit, because it achieved approximately the same prediction scores on the training data and on the test data. The fact that CatBoost was the best model for our data is not surprising; CatBoost is an algorithm for gradient boosting on decision trees and performs particularly well when a dataset has many categorical features. Moreover, CatBoost is known to be applicable across a wide range of areas and to a variety of problems [94]. Nevertheless, our results showed that the other algorithms could also achieve objective PROP taster status identification, but with a lower precision as compared to CatBoost.
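The metrics used to compare the models can be computed directly from true and predicted categories. The following sketch, written in plain Python with invented labels (not the study's predictions), shows how per-category precision, recall, and F1-score, together with overall accuracy, are obtained; the function name `per_class_metrics` is an assumption.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred, classes):
    """Precision, recall, and F1-score for each category."""
    pairs = Counter(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(v for (t, p), v in pairs.items() if p == c and t != c)
        fn = sum(v for (t, p), v in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

# Hypothetical test-set labels: one ST subject is misclassified as MT.
y_true = ["ST", "ST", "MT", "MT", "MT", "MT", "NT", "NT"]
y_pred = ["ST", "MT", "MT", "MT", "MT", "MT", "NT", "NT"]

metrics = per_class_metrics(y_true, y_pred, ["ST", "MT", "NT"])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for name, m in metrics.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
print("accuracy:", accuracy)
```

Computing the same metrics on the training set and on the held-out test set, and checking that the two sets of values are close, is one simple way to look for overfitting of the kind discussed above.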
Furthermore, the CatBoost classifier allowed us to obtain the order of importance of the biological features in the dataset in helping the model learn the difference between the three PROP taster status categories. The analysis showed that the intensity rating for PROP paper disks (50 mM) was the most important feature for the model. This result is of great importance, as it indicates that the impregnated paper screening test [7], which is the simplest and fastest psychophysical approach, provides the best features in the training of the model to make predictions on the PROP taster status of new samples. This supports choosing this test when planning psychophysical experiments for the classification of subjects' PROP taster status. Our results also highlight the importance of the use of the three-solution test [5], which provided a biological feature, the intensity rating for PROP solution (3.2 mM), that was the second feature in order of importance in the training model.
Our results also showed that the density of the fungiform papillae was a significant feature (the third in order of importance) for the training model to learn to distinguish between the PROP taster categories and was particularly effective in making the ST prediction. In fact, the SHAP algorithm showed that high estimated values of papilla density were strongly and positively correlated with the ST category. These results are consistent with data showing that STs have a higher density of fungiform taste papillae, as compared to the other PROP taster categories [6,15,16,18,78,81].
Our results also showed that the TAS2R38 locus provides significant features for training the model. However, although the molecular analysis of the locus is an objective measure, its importance in the training system was lower than that of the features obtained by psychophysical tests. This is unsurprising, since the TAS2R38 genotypes do not completely explain the oral sensory differences between MTs and STs [8], and a considerable genotypic overlap exists between these two groups [10,13,104]. These considerations suggest a more effective use of psychophysical approaches in studying taste sensitivity, rather than the determination of genotype alone for specific receptors.
In agreement with findings showing associations between perceptions for the five taste qualities [19,20,21,23,25,26,48,49,50,88] and for PROP, our results showed that the scores that the subjects gave to tastes (that are not mediated by specific receptors) were significant features in facilitating the learning of the model to differentiate categories of PROP tasters.
Furthermore, gender was a significant feature, with the female gender—which has been shown to have a higher sensitivity to the PROP taste [6,42,43]—being more important to training the model than the male gender.
The SHAP approach allowed us to produce a high-precision explanation for the predictions that the model makes for each category of the PROP taster status. The SHAP approach confirmed the importance of the features used in the training step, and provided an explanation of how the model uses each feature. Specifically, the SHAP results showed that high values of the intensity rating for PROP paper disks (50 mM) strongly pushed the model to make an ST prediction. Similarly, high estimated values for the intensity rating of PROP solution (3.2 mM), as well as those of papilla density or the PAV/PAV genotype, pushed the model towards the ST prediction. On the other hand, the SHAP approach showed that medium estimated values of the intensity rating for the PROP paper disks (50 mM) powerfully pushed the model to make an MT prediction. Accordingly, the PAV/AVI genotype had an impact favorable to the prediction of the MT category, while the AVI/AVI genotype was negatively correlated with this category. Finally, SHAP results showed that low estimated values for the intensity ratings for PROP paper disks (50 mM) and PROP solution (3.2 mM) were directly correlated with the NT category, while the AVI/AVI genotype also pushed the model to make an NT prediction; in addition, papilla density was negatively correlated with this category. It is interesting to note that the salty and umami scores both significantly pushed the model toward an ST or MT prediction, while the sour perception moderately pushed the model toward an NT prediction; future studies may investigate this phenomenon. In addition, in agreement with data showing that the female gender has a higher PROP taste sensitivity than the male gender [6,42,43], the SHAP approach showed that the female gender had a greater impact on the model's predictions. Specifically, the female gender was positively and moderately correlated with the ST and NT categories and negatively correlated with the MT category. Conversely, the male gender was moderately and negatively correlated with ST and NT, while positively correlated with the MT category.
All of these results indicate that the classification model CatBoost used approximately the same reasoning as a biology expert to classify individuals and assign them the correct PROP taster category.
ML methods generally require large datasets to fit the algorithms, and bias is expected to be larger for smaller datasets. However, depending on the problem domain, dataset size is not necessarily a barrier to a high-performing model, since the average performance of classifiers has reached 99% on some small datasets. Althnian et al. [105] showed that the overall performance of SL classifiers depends on how well a dataset represents the original distribution rather than on its size. Indeed, in addition to depending on the size of the samples, bias can also depend on other properties of the dataset, e.g., the dependency between the features and the target [94]. Based on these considerations, since we have a small sample size, we devoted more effort to selecting only the relevant features, removing outliers from the data, handling missing data, and oversampling the training set in order to balance the class distribution. Additionally, gradient-boosting decision trees (GBDTs) in general, and CatBoost in particular, are known for their advantages in handling small datasets; they perform better than the other types of models on a small dataset [94,95,96], while tuning the multiple parameters of CatBoost helps to avoid overfitting of the model. To reduce the bias and the variability, we conducted multiple rounds of cross-validation (k-fold, with k = 3) with distinct subsets from the same data. The F1-score results from these multiple rounds were very close, indicating that the model performs consistently across the full dataset. Therefore, in our approach, the preparation and processing of the dataset, as well as the analysis of the dataset and the definition of correlations between parameters, were fundamental steps that allowed us to improve the dataset quality and attain better results.
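The combination of oversampling and three-fold cross-validation described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the class counts are invented (only the total of 84 subjects comes from the study), the folds are stratified, and random oversampling is applied to the training portion of each fold only; the text does not specify these details, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_kfold(labels, k, rng):
    """Yield (train_idx, test_idx) with each class spread across the k folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def random_oversample(indices, labels, rng):
    """Duplicate minority-class indices at random until all classes are equal in size."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return balanced

rng = random.Random(0)
# Hypothetical class distribution for 84 subjects (not the study's counts).
labels = ["MT"] * 40 + ["ST"] * 24 + ["NT"] * 20

splits = []
for train_idx, test_idx in stratified_kfold(labels, k=3, rng=rng):
    balanced_train = random_oversample(train_idx, labels, rng)
    splits.append((balanced_train, test_idx))
    # A classifier would be fitted on balanced_train and scored on test_idx here.
    print(Counter(labels[i] for i in balanced_train),
          Counter(labels[i] for i in test_idx))
```

Oversampling only the training portion keeps duplicated subjects out of the held-out fold, so the score of each round is computed on subjects the model has not seen.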
We found strong correlations of the rating values of PROP and NaCl stimuli with one another, as well as between fungiform papilla density and PROP rating, and the whole taste perception and perceptions of sweet, acid, bitter, and umami. The sensitivity to the bitterness of PROP varied considerably between PROP categories: NT individuals perceived PROP as low intensity, MTs as medium intensity, and STs as high intensity. We also found a strong association between the genotype of the TAS2R38 and PROP taster status, since individuals who had the AVI/AVI genotype could not be ST, while those who had the PAV/PAV genotype could not be NT. On the other hand, the two genotypes PAV/PAV and PAV/AVI could both be determined in MT and ST individuals. We also found that the three PROP taster categories were clearly distinguished after feature scaling.

5. Conclusions

In conclusion, our results show that the proposed SL approach is a reliable tool for the automatic classification of PROP taster status, through fully automatic processing, by including biological features of subjects that are normally used for the classification of subjects as belonging to the PROP taster categories in the psychophysical methods [5,7], or that have been associated with PROP taster status in physiological studies [3,4,6,10,11,15,80,81]. The proposed SL approach allowed us to achieve the high-precision automatic classification of PROP taster status of subjects, which could make this process immediate and scalable. Furthermore, this method gave us the possibility to understand which features are the most significant as predictive factors to make a precise distinction between ST, MT, and NT subjects, and identify the parametric patterns and correlations. In this way, the SL approach allowed us to identify biomarkers or combinations of biomarkers among the considered biological features, to be applied to large epidemiological studies instead of time-consuming tests.
In this study, we were able to automatically identify the PROP phenotypes of 84 subjects aged from 18 to 40 y, with high precision, and in future studies this method could be applied for the identification of PROP genotypes, thus reducing the costs and time of molecular analysis of the TAS2R38 locus. The SL model, or other types of ML (appropriate for unstructured data, such as the density of the fungiform papilla from pictures of the tongue), may be extended to physiological studies on taste, with applications ranging from basic science and medicine to food tasting evaluations.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/nu14020252/s1, Figure S1: Learning curve graph employed to estimate the performance of the SL as dataset size increases; Figure S2: Alluvial plot illustrating the changes in the composition of the clusters (TAS2R38 genotypes, PROP taster categories, and SL discrimination).

Author Contributions

Conceptualization, L.C.N., I.T.B., and M.M. (Melania Melis); methodology, L.C.N., M.M. (Mariano Mastinu), and M.M. (Melania Melis); software, L.C.N.; formal analysis, L.C.N. and M.M. (Mariano Mastinu); data curation, R.C., I.T.B., and M.M. (Melania Melis); writing—original draft preparation, I.T.B.; writing—review and editing, M.M. (Melania Melis) and R.C.; supervision, I.T.B. and M.M. (Melania Melis); project administration, I.T.B.; funding acquisition, I.T.B. and M.M. (Melania Melis). All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by grants from the University of Cagliari: Fondi 5 per mille (Anno 2017) and Fondo Integrativo per la Ricerca (FIR 2019).

Institutional Review Board Statement

This study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethical Committee of the University Hospital of Cagliari (protocol code 451/09, date of approval 5/2016).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available, in accordance with consent provided by participants on the use of confidential data.

Acknowledgments

The authors thank the volunteers without whose contribution this study would not have been possible. We also thank Alessandro Crnjar and Ilyas Chaoua for supervising the SL method.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Scott, K. Taste recognition: Food for thought. Neuron 2005, 48, 455–464.
  2. Chaudhari, N.; Roper, S.D. The cell biology of taste. J. Cell Biol. 2010, 190, 285–296.
  3. Tepper, B.J.; Banni, S.; Melis, M.; Crnjar, R.; Tomassini Barbarossa, I. Genetic sensitivity to the bitter taste of 6-n-propylthiouracil (PROP) and its association with physiological mechanisms controlling body mass index (BMI). Nutrients 2014, 6, 3363–3381.
  4. Tepper, B.J. Nutritional implications of genetic taste variation: The role of PROP sensitivity and other taste phenotypes. Annu. Rev. Nutr. 2008, 28, 367–388.
  5. Tepper, B.J.; Christensen, C.M.; Cao, J. Development of brief methods to classify individuals by PROP taster status. Physiol. Behav. 2001, 73, 571–577.
  6. Bartoshuk, L.M.; Duffy, V.B.; Miller, I.J. PTC/PROP tasting: Anatomy, psychophysics, and sex effects. Physiol. Behav. 1994, 56, 1165–1171.
  7. Zhao, L.; Kirkmeyer, S.V.; Tepper, B.J. A paper screening test to assess genetic taste sensitivity to 6-n-propylthiouracil. Physiol. Behav. 2003, 78, 625–633.
  8. Hayes, J.E.; Bartoshuk, L.M.; Kidd, J.R.; Duffy, V.B. Supertasting and PROP bitterness depends on more than the TAS2R38 gene. Chem. Senses 2008, 33, 255–265.
  9. Tepper, B.J.; Nurse, R.J. PROP taster status is related to fat perception and preference. Ann. N. Y. Acad. Sci. 1998, 855, 802–804.
  10. Bufe, B.; Breslin, P.A.; Kuhn, C.; Reed, D.R.; Tharp, C.D.; Slack, J.P.; Kim, U.K.; Drayna, D.; Meyerhof, W. The molecular basis of individual differences in phenylthiocarbamide and propylthiouracil bitterness perception. Curr. Biol. 2005, 15, 322–327.
  11. Kim, U.K.; Jorgenson, E.; Coon, H.; Leppert, M.; Risch, N.; Drayna, D. Positional cloning of the human quantitative trait locus underlying taste sensitivity to phenylthiocarbamide. Science 2003, 299, 1221–1225.
  12. Wooding, S.; Kim, U.K.; Bamshad, M.J.; Larsen, J.; Jorde, L.B.; Drayna, D. Natural Selection and Molecular Evolution in PTC, a Bitter-Taste Receptor Gene. Am. J. Hum. Genet. 2004, 74, 637–646.
  13. Tepper, B.J.; Koelliker, Y.; Zhao, L.; Ullrich, N.V.; Lanzara, C.; d’Adamo, P.; Ferrara, A.; Ulivi, S.; Esposito, L.; Gasparini, P. Variation in the bitter-taste receptor gene TAS2R38, and adiposity in a genetically isolated population in Southern Italy. Obesity 2008, 16, 2289–2295.
  14. Prodi, D.A.; Drayna, D.; Forabosco, P.; Palmas, M.A.; Maestrale, G.B.; Piras, D.; Pirastu, M.; Angius, A. Bitter taste study in a Sardinian genetic isolate supports the association of phenylthiocarbamide sensitivity to the TAS2R38 bitter receptor gene. Chem. Senses 2004, 29, 697–702.
  15. Melis, M.; Atzori, E.; Cabras, S.; Zonza, A.; Calò, C.; Muroni, P.; Nieddu, M.; Padiglia, A.; Sogos, V.; Tepper, B.J.; et al. The gustin (CA6) gene polymorphism, rs2274333 (A/G), as a mechanistic link between PROP tasting and fungiform taste papilla density and maintenance. PLoS ONE 2013, 8, e74151.
  16. Tepper, B.J.; Nurse, R.J. Fat perception is related to PROP taster status. Physiol. Behav. 1997, 61, 949–954.
  17. Yackinous, C.; Guinard, J.X. Relation between PROP taster status and fat perception, touch, and olfaction. Physiol. Behav. 2001, 72, 427–437.
  18. Essick, G.; Chopra, A.; Guest, S.; McGlone, F. Lingual tactile acuity, taste perception, and the density and diameter of fungiform papillae in female subjects. Physiol. Behav. 2003, 80, 289–302.
  19. Bartoshuk, L.M.; Rifkin, B.; Marks, L.E.; Hooper, J.E. Bitterness of KCl and benzoate: Related to genetic status for sensitivity to PTC/PROP. Chem. Senses 1988, 13, 517–528.
  20. Gent, J.; Bartoshuk, L. Sweetness of sucrose, neohesperidin dihydrochalcone, and saccharin is related to genetic ability to taste the bitter substance 6-n-propylthiouracil. Chem. Senses 1983, 7, 265–272.
  21. Bartoshuk, L.M. Bitter taste of saccharin related to the genetic ability to taste the bitter substance 6-n-propylthiouracil. Science 1979, 205, 934–935.
  22. Prescott, J.; Swain-Campbell, N. Responses to repeated oral irritation by capsaicin, cinnamaldehyde and ethanol in PROP tasters and non-tasters. Chem. Senses 2000, 25, 239–246.
  23. Prescott, J.; Soo, J.; Campbell, H.; Roberts, C. Responses of PROP taster groups to variations in sensory qualities within foods and beverages. Physiol. Behav. 2004, 82, 459–469.
  24. Duffy, V.B.; Peterson, J.M.; Bartoshuk, L.M. Associations between taste genetics, oral sensation and alcohol intake. Physiol. Behav. 2004, 82, 435–445.
  25. Yeomans, M.R.; Tepper, B.J.; Rietzschel, J.; Prescott, J. Human hedonic responses to sweetness: Role of taste genetics and anatomy. Physiol. Behav. 2007, 91, 264–273.
  26. Melis, M.; Tomassini Barbarossa, I. Taste Perception of Sweet, Sour, Salty, Bitter, and Umami and Changes Due to l-Arginine Supplementation, as a Function of Genetic Ability to Taste 6-n-Propylthiouracil. Nutrients 2017, 9, 541–558.
  27. Melis, M.; Yousaf, N.Y.; Mattes, M.Z.; Cabras, T.; Messana, I.; Crnjar, R.; Tomassini Barbarossa, I.; Tepper, B.J. Sensory perception of and salivary protein response to astringency as a function of the 6-n-propylthioural (PROP) bitter-taste phenotype. Physiol. Behav. 2017, 173, 163–173.
  28. Melis, M.; Sollai, G.; Muroni, P.; Crnjar, R.; Barbarossa, I.T. Associations between orosensory perception of oleic acid, the common single nucleotide polymorphisms (rs1761667 and rs1527483) in the CD36 gene, and 6-n-propylthiouracil (PROP) tasting. Nutrients 2015, 7, 2068–2084.
  29. Kirkmeyer, S.V.; Tepper, B.J. Understanding creaminess perception of dairy products using free-choice profiling and genetic responsivity to 6-n-propylthiouracil. Chem. Senses 2003, 28, 527–536.
  30. Keller, K.L.; Steinmann, L.; Nurse, R.J.; Tepper, B.J. Genetic taste sensitivity to 6-n-propylthiouracil influences food preference and reported intake in preschool children. Appetite 2002, 38, 3–12.
  31. Bell, K.I.; Tepper, B.J. Short-term vegetable intake by young children classified by 6-n-propylthoiuracil bitter-taste phenotype. Am. J. Clin. Nutr. 2006, 84, 245–251.
  32. Dinehart, M.E.; Hayes, J.E.; Bartoshuk, L.M.; Lanier, S.L.; Duffy, V.B. Bitter taste markers explain variability in vegetable sweetness, bitterness, and intake. Physiol. Behav. 2006, 87, 304–313.
  33. Calò, C.; Padiglia, A.; Zonza, A.; Corrias, L.; Contu, P.; Tepper, B.J.; Barbarossa, I.T. Polymorphisms in TAS2R38 and the taste bud trophic factor, gustin gene co-operate in modulating PROP taste phenotype. Physiol. Behav. 2011, 104, 1065–1071.
  34. Padiglia, A.; Zonza, A.; Atzori, E.; Chillotti, C.; Calò, C.; Tepper, B.J.; Barbarossa, I.T. Sensitivity to 6-n-propylthiouracil is associated with gustin (carbonic anhydrase VI) gene polymorphism, salivary zinc, and body mass index in humans. Am. J. Clin. Nutr. 2010, 92, 539–545.
  35. Genick, U.K.; Kutalik, Z.; Ledda, M.; Destito, M.C.S.; Souza, M.M.; Cirillo, C.A.; Godinot, N.; Martin, N.; Morya, E.; Sameshima, K.; et al. Sensitivity of Genome-Wide-Association Signals to Phenotyping Strategy: The PROP-TAS2R38 Taste Association as a Benchmark. PLoS ONE 2011, 6, e27745.
  36. Feeney, E.L.; Hayes, J.E. Exploring associations between taste perception, oral anatomy and polymorphisms in the carbonic anhydrase (gustin) gene CA6. Physiol. Behav. 2014, 128, 148–154.
  37. Drayna, D.; Coon, H.; Kim, U.K.; Elsner, T.; Cromer, K.; Otterud, B.; Baird, L.; Peiffer, A.P.; Leppert, M. Genetic analysis of a complex trait in the Utah Genetic Reference Project: A major locus for PTC taste ability on chromosome 7q and a secondary locus on chromosome 16p. Hum. Genet. 2003, 112, 567–572.
  38. Reed, D.R.; Nanthakumar, E.; North, M.; Bell, C.; Bartoshuk, L.M.; Price, R.A. Localization of a gene for bitter-taste perception to human chromosome 5p15. Am. J. Hum. Genet. 1999, 64, 1478–1480.
  39. Cabras, T.; Melis, M.; Castagnola, M.; Padiglia, A.; Tepper, B.J.; Messana, I.; Tomassini Barbarossa, I. Responsiveness to 6-n-propylthiouracil (PROP) is associated with salivary levels of two specific basic proline-rich proteins in humans. PLoS ONE 2012, 7, e30962.
  40. Melis, M.; Aragoni, M.C.; Arca, M.; Cabras, T.; Caltagirone, C.; Castagnola, M.; Crnjar, R.; Messana, I.; Tepper, B.J.; Barbarossa, I.T. Marked increase in PROP taste responsiveness following oral supplementation with selected salivary proteins or their related free amino acids. PLoS ONE 2013, 8, e59810.
  41. Melis, M.; Arca, M.; Aragoni, M.C.; Cabras, T.; Caltagirone, C.; Castagnola, M.; Crnjar, R.; Messana, I.; Tepper, B.J.; Tomassini Barbarossa, I. Dose-Dependent Effects of L-Arginine on PROP Bitterness Intensity and Latency and Characteristics of the Chemical Interaction between PROP and L-Arginine. PLoS ONE 2015, 10, e0131104.
  42. Goldstein, G.L.; Daun, H.; Tepper, B.J. Influence of PROP taster status and maternal variables on energy intake and body weight of pre-adolescents. Physiol. Behav. 2007, 90, 809–817.
  43. Whissell-Buechy, D.; Wills, C. Male and female correlations for taster (P.T.C.) phenotypes and rate of adolescent development. Ann. Hum. Biol. 1989, 16, 131–146.
  44. Prutkin, J.; Fisher, E.M.; Etter, L.; Fast, K.; Gardner, E.; Lucchina, L.A.; Snyder, D.J.; Tie, K.; Weiffenbach, J.; Bartoshuk, L.M. Genetic variation and inferences about perceived taste intensity in mice and men. Physiol. Behav. 2000, 69, 161–173.
  45. Zuniga, J.R.; Davis, S.H.; Englehardt, R.A.; Miller, I.J.; Schiffman, S.S.; Phillips, C. Taste performance on the anterior human tongue varies with fungiform taste bud density. Chem. Senses 1993, 18, 449–460.
  46. Correa, M.; Hutchinson, I.; Laing, D.G.; Jinks, A.L. Changes in Fungiform Papillae Density During Development in Humans. Chem. Senses 2013, 38, 519–527.
  47. Tepper, B.J. 6-n-Propylthiouracil: A genetic marker for taste, with implications for food preference and dietary habits. Am. J. Hum. Genet. 1998, 63, 1271–1276.
  48. Bartoshuk, L.M. The biological basis of food perception and acceptance. Food Qual. Prefer. 1993, 4, 21–32.
  49. Bartoshuk, L.; Fast, K.; Karrer, T.; Marino, S.; Price, R.; Reed, D. PROP supertasters and the perception of sweetness and bitterness. Chem. Senses 1992, 17, 594.
  50. Bartoshuk, L.M.; Rifkin, B.; Marks, L.E.; Bars, P. Taste and aging. J. Gerontol. 1986, 41, 51–57.
  51. Melis, M.; Errigo, A.; Crnjar, R.; Pes, G.M.; Tomassini Barbarossa, I. TAS2R38 bitter taste receptor and attainment of exceptional longevity. Sci. Rep. 2019, 9, 18047.
  52. Whissell-Buechy, D. Effects of age and sex on taste sensitivity to phenylthiocarbamide (PTC) in the Berkeley Guidance sample. Chem. Senses 1990, 15, 39–57.
  53. Mennella, J.; Pepino, M.Y.; Duke, F.; Reed, D. Age modifies the genotype-phenotype relationship for the bitter receptor TAS2R38. BMC Genet. 2010, 11, 60.
  54. Tepper, B.J.; Williams, T.Z.; Burgess, J.R.; Antalis, C.J.; Mattes, R.D. Genetic variation in bitter taste and plasma markers of anti-oxidant status in college women. Int. J. Food Sci. Nutr. 2009, 60 (Suppl. 2), 35–45.
  55. Tepper, B.J.; Neilland, M.; Ullrich, N.V.; Koelliker, Y.; Belzer, L.M. Greater energy intake from a buffet meal in lean, young women is associated with the 6-n-propylthiouracil (PROP) non-taster phenotype. Appetite 2011, 56, 104–110.
  56. Tepper, B.J.; Ullrich, N.V. Influence of genetic taste sensitivity to 6-n-propylthiouracil (PROP), dietary restraint and disinhibition on body mass index in middle-aged women. Physiol. Behav. 2002, 75, 305–312.
  57. Lumeng, J.C.; Cardinal, T.M.; Sitto, J.R.; Kannan, S. Ability to Taste 6-n-propylthiouracil and Body Mass Index in Low-Income Preschool-Aged Children. Obesity (Silver Spring) 2008, 16, 1522–1528.
  58. Carta, G.; Melis, M.; Pintus, S.; Pintus, P.; Piras, C.A.; Muredda, L.; Demurtas, D.; Di Marzo, V.; Banni, S.; Barbarossa, I.T. Participants with Normal Weight or with Obesity Show Different Relationships of 6-n-Propylthiouracil (PROP) Taster Status with BMI and Plasma Endocannabinoids. Sci. Rep. 2017, 7, 1361.
  59. Tomassini Barbarossa, I.; Carta, G.; Murru, E.; Melis, M.; Zonza, A.; Vacca, C.; Muroni, P.; Di Marzo, V.; Banni, S. Taste sensitivity to 6-n-propylthiouracil is associated with endocannabinoid plasma levels in normal-weight individuals. Nutrition 2013, 29, 531–536.
  60. Enoch, M.A.; Harris, C.R.; Goldman, D. Does a reduced sensitivity to bitter taste increase the risk of becoming nicotine addicted? Addict. Behav. 2001, 26, 399–404.
  61. Mangold, J.E.; Payne, T.J.; Ma, J.Z.; Chen, G.; Li, M.D. Bitter taste receptor gene polymorphisms are an important factor in the development of nicotine dependence in African Americans. J. Med. Genet. 2008, 45, 578–582.
  62. Risso, D.S.; Kozlitina, J.; Sainz, E.; Gutierrez, J.; Wooding, S.; Getachew, B.; Luiselli, D.; Berg, C.J.; Drayna, D. Genetic Variation in the TAS2R38 Bitter Taste Receptor and Smoking Behaviors. PLoS ONE 2016, 11, e0164157.
  63. Lee, R.J.; Cohen, N.A. Role of the bitter taste receptor T2R38 in upper respiratory infection and chronic rhinosinusitis. Curr. Opin. Allergy Clin. Immunol. 2015, 15, 14–20.
  64. Lee, R.J.; Xiong, G.; Kofonow, J.M.; Chen, B.; Lysenko, A.; Jiang, P.; Abraham, V.; Doghramji, L.; Adappa, N.D.; Palmer, J.N.; et al. T2R38 taste receptor polymorphisms underlie susceptibility to upper respiratory infection. J. Clin. Investig. 2012, 122, 4145–4159.
  65. Melis, M.; Grzeschuchna, L.; Sollai, G.; Hummel, T.; Tomassini Barbarossa, I. Taste disorders are partly genetically determined: Role of the TAS2R38 gene, a pilot study. Laryngoscope 2019, 129, E307–E312.
  66. Basson, M.D.; Bartoshuk, L.M.; Dichello, S.Z.; Panzini, L.; Weiffenbach, J.M.; Duffy, V.B. Association between 6-n-propylthiouracil (PROP) bitterness and colonic neoplasms. Dig. Dis. Sci. 2005, 50, 483–489.
  66. Basson, M.D.; Bartoshuk, L.M.; Dichello, S.Z.; Panzini, L.; Weiffenbach, J.M.; Duffy, V.B. Association between 6-n-propylthiouracil (PROP) bitterness and colonic neoplasms. Dig. Dis. Sci. 2005, 50, 483–489. [Google Scholar] [CrossRef]
  67. Carrai, M.; Steinke, V.; Vodicka, P.; Pardini, B.; Rahner, N.; Holinski-Feder, E.; Morak, M.; Schackert, H.K.; Gorgens, H.; Stemmler, S.; et al. Association between TAS2R38 gene polymorphisms and colorectal cancer risk: A case-control study in two independent populations of Caucasian origin. PLoS ONE 2011, 6, e20464. [Google Scholar] [CrossRef]
  68. Cossu, G.; Melis, M.; Sarchioto, M.; Melis, M.; Melis, M.; Morelli, M.; Tomassini Barbarossa, I. 6-n-propylthiouracil taste disruption and TAS2R38 nontasting form in Parkinson’s disease. Mov. Disord. 2018, 33, 1331–1339. [Google Scholar] [CrossRef]
  69. Gorovic, N.; Afzal, S.; Tjonneland, A.; Overvad, K.; Vogel, U.; Albrechtsen, C.; Poulsen, H.E. Genetic variation in the hTAS2R38 taste receptor and brassica vegetable intake. Scand. J. Clin. Lab. Investig. 2011, 71, 274–279. [Google Scholar] [CrossRef]
  70. Feeney, E.; O’Brien, S.; Scannell, A.; Markey, A.; Gibney, E.R. Genetic variation in taste perception: Does it have a role in healthy eating? Proc. Nutr. Soc. 2011, 70, 135–143. [Google Scholar] [CrossRef] [Green Version]
  71. Baranowski, T.; Baranowski, J.C.; Watson, K.B.; Jago, R.; Islam, N.; Beltran, A.; Martin, S.J.; Nguyen, N.; Tepper, B.J. 6-n-propylthiouracil taster status not related to reported cruciferous vegetable intake among ethnically diverse children. Nutr. Res. 2011, 31, 594–600. [Google Scholar] [CrossRef] [Green Version]
  72. Drewnowski, A.; Henderson, S.A.; Cockroft, J.E. Genetic Sensitivity to 6-n-Propylthiouracil Has No Influence on Dietary Patterns, Body Mass Indexes, or Plasma Lipid Profiles of Women. J. Am. Diet. Assoc. 2007, 107, 1340–1348. [Google Scholar] [CrossRef]
  73. Duffy, V.B.; Bartoshuk, L.M. Food acceptance and genetic variation in taste. J. Am. Diet Assoc. 2000, 100, 647–655. [Google Scholar] [CrossRef]
  74. Mennella, J.A.; Pepino, M.Y.; Reed, D.R. Genetic and environmental determinants of bitter perception and sweet preferences. Pediatrics 2005, 115, e216–e222. [Google Scholar] [CrossRef] [Green Version]
  75. O’Brien, S.A.; Feeney, E.L.; Scannell, A.G.; Markey, A.; Gibney, E.R. Bitter taste perception and dietary intake patterns in irish children. J. Nutrigenet. Nutr. 2013, 6, 43–58. [Google Scholar] [CrossRef]
  76. Kaminski, L.C.; Henderson, S.A.; Drewnowski, A. Young women’s food preferences and taste responsiveness to 6-n-propylthiouracil (PROP). Physiol. Behav. 2000, 68, 691–697. [Google Scholar] [CrossRef]
  77. Timpson, N.J.; Christensen, M.; Lawlor, D.A.; Gaunt, T.R.; Day, I.N.; Ebrahim, S.; Davey Smith, G. TAS2R38 (phenylthiocarbamide) haplotypes, coronary heart disease traits, and eating behavior in the British Women’s Heart and Health Study. Am. J. Clin. Nutr. 2005, 81, 1005–1011. [Google Scholar] [CrossRef] [Green Version]
  78. Yackinous, C.A.; Guinard, J.X. Relation between PROP (6-n-propylthiouracil) taster status, taste anatomy and dietary intake measures for young men and women. Appetite 2002, 38, 201–209. [Google Scholar] [CrossRef]
  79. Zitnik, M.; Nguyen, F.; Wang, B.; Leskovec, J.; Goldenberg, A.; Hoffman, M.M. Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities. Inf. Fusion. 2019, 50, 71–91. [Google Scholar] [CrossRef]
  80. Tepper, B.J.; Melis, M.; Koelliker, Y.; Gasparini, P.; Ahijevych, K.L.; Tomassini Barbarossa, I. Factors Influencing the Phenotypic Characterization of the Oral Marker, PROP. Nutrients 2017, 9, 1275. [Google Scholar] [CrossRef] [Green Version]
  81. Barbarossa, I.T.; Melis, M.; Mattes, M.Z.; Calò, C.; Muroni, P.; Crnjar, R.; Tepper, B.J. The gustin (CA6) gene polymorphism, rs2274333 (A/G), is associated with fungiform papilla density, whereas PROP bitterness is mostly due to TAS2R38 in an ethnically-mixed population. Physiol. Behav. 2015, 138, 6–12. [Google Scholar] [CrossRef]
  82. Glanville, E.V.; Kaplan, A.R. Taste Perception and the Menstrual Cycle. Nature 1965, 205, 930–931. [Google Scholar] [CrossRef]
  83. Landis, B.N.; Welge-Luessen, A.; Bramerson, A.; Bende, M.; Mueller, C.A.; Nordin, S.; Hummel, T. “Taste Strips”—A rapid, lateralized, gustatory bedside identification test based on impregnated filter papers. J. Neurol. 2009, 256, 242–248. [Google Scholar] [CrossRef] [Green Version]
  84. Mueller, C.; Kallert, S.; Renner, B.; Stiassny, K.; Temmel, A.F.; Hummel, T.; Kobal, G. Quantitative assessment of gustatory function in a clinical context using impregnated “taste strips”. Rhinology 2003, 41, 2–6. [Google Scholar]
  85. Mueller, C.A.; Pintscher, K.; Renner, B. Clinical Test of Gustatory Function Including Umami Taste. Ann. Otol. Rhinol. Laryngol. 2011, 120, 358–362. [Google Scholar] [CrossRef]
  86. Sollai, G.; Melis, M.; Mastinu, M.; Pani, D.; Cosseddu, P.; Bonfiglio, A.; Crnjar, R.; Tepper, B.J.; Tomassini Barbarossa, I. Human Tongue Electrophysiological Response to Oleic Acid and Its Associations with PROP Taster Status and the CD36 Polymorphism (rs1761667). Nutrients 2019, 11, 315. [Google Scholar] [CrossRef] [Green Version]
  87. Sollai, G.; Melis, M.; Pani, D.; Cosseddu, P.; Usai, I.; Crnjar, R.; Bonfiglio, A.; Tomassini Barbarossa, I. First objective evaluation of taste sensitivity to 6-n-propylthiouracil (PROP), a paradigm gustatory stimulus in humans. Sci. Rep. 2017, 7, 40353. [Google Scholar] [CrossRef] [Green Version]
  88. Melis, M.; Sollai, G.; Mastinu, M.; Pani, D.; Cosseddu, P.; Bonfiglio, A.; Crnjar, R.; Tepper, B.J.; Barbarossa, I.T. Electrophysiological Responses from the Human Tongue to the Six Taste Qualities and Their Relationships with PROP Taster Status. Nutrients 2020, 12, 2017. [Google Scholar] [CrossRef]
  89. Green, B.G.; Shaffer, G.S.; Gilmore, M.M. Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chem. Senses 1993, 18, 683–702. [Google Scholar] [CrossRef]
  90. Bartoshuk, L.M.; Duffy, V.B.; Green, B.G.; Hoffman, H.J.; Ko, C.W.; Lucchina, L.A.; Marks, L.E.; Snyder, D.J.; Weiffenbach, J.M. Valid across-group comparisons with labeled scales: The gLMS versus magnitude matching. Physiol. Behav. 2004, 82, 109–114. [Google Scholar] [CrossRef]
  91. Lipchock, S.V.; Mennella, J.A.; Spielman, A.I.; Reed, D.R. Human bitter perception correlates with bitter receptor messenger RNA expression in taste cells. Am. J. Clin. Nutr. 2013, 98, 1136–1143. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  92. Miller, I.J.; Reedy, F.E. Variations in human taste bud density and taste intensity perception. Physiol. Behav. 1990, 47, 1213–1219. [Google Scholar] [CrossRef]
  93. Boxer, E.E.; Garneau, N.L. Rare haplotypes of the gene TAS2R38 confer bitter taste sensitivity in humans. SpringerPlus 2015, 4, 505. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  94. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3 December 2018; pp. 6639–6649. [Google Scholar]
  95. Jiang, J.; Wang, R.; Wang, M.; Gao, K.; Nguyen, D.D.; Wei, G.-W. Boosting Tree-Assisted Multitask Deep Learning for Small Scientific Datasets. J. Chem. Inf. Modeling 2020, 60, 1235–1244. [Google Scholar] [CrossRef]
  96. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
  97. Mukaka, M.M. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med. J. 2012, 24, 69–71. [Google Scholar]
  98. Barua, S.; Islam, M.M.; Murase, K. A Novel Synthetic Minority Oversampling Technique for Imbalanced Data Set Learning. In Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7063. [Google Scholar]
  99. Browne, M.W. Cross-Validation Methods. J. Math. Psychol. 2000, 44, 108–132. [Google Scholar] [CrossRef] [Green Version]
  100. Wu, J.; Chen, X.-Y.; Zhang, H.; Xiong, L.-D.; Lei, H.; Deng, S.-H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  101. Zahedi, L.; Mohammadi, F.; Rezapour, S.; Ohland, M.W.; Amini, M.H. Search Algorithms for Automated Hyper-Parameter Tuning. arXiv Prepr. 2021, arXiv:2104.14677. [Google Scholar]
  102. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4 December 2017; pp. 4768–4777. [Google Scholar]
  103. Rousset, F. GENEPOP′007: A complete re-implementation of the GENEPOP software for Windows and Linux. Mol. Ecol. Resour. 2008, 8, 103–106. [Google Scholar] [CrossRef]
  104. Duffy, V.B.; Davidson, A.C.; Kidd, J.R.; Kidd, K.K.; Speed, W.C.; Pakstis, A.J.; Reed, D.R.; Snyder, D.J.; Bartoshuk, L.M. Bitter Receptor Gene (TAS2R38), 6-n-Propylthiouracil (PROP) Bitterness and Alcohol Intake. Alcohol. Clin. Exp. Res. 2004, 28, 1629–1637. [Google Scholar] [CrossRef] [Green Version]
  105. Althnian, A.; AlSaeed, D.; Al-Baity, H.; Samha, A.; Dris, A.B.; Alzakari, N.; Abou Elwafa, A.; Kurdi, H. Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Appl. Sci. 2021, 11, 796. [Google Scholar] [CrossRef]
Figure 1. Graphic diagram representing the study design.
Figure 2. Linear correlation analysis between the numerical features in the dataset. The color bar on the right-hand side gives the value of the linear correlation between features, with −1 indicating total negative linear correlation, 0 indicating no linear correlation, and 1 indicating total positive linear correlation. p-Values are indicated inside each square: significant values in black; non-significant values in white.
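The kind of matrix summarized in Figure 2 can be sketched as follows. This is a minimal illustration that assumes Pearson's r as the linear correlation coefficient; the feature names and values are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for three numerical features in four subjects.
features = {
    "disk_rating": [10.0, 30.0, 50.0, 70.0],
    "solution_rating": [12.0, 28.0, 55.0, 65.0],
    "age": [40.0, 30.0, 20.0, 10.0],
}
names = list(features)

# Square matrix of pairwise correlations, one row and one column per feature.
matrix = [[pearson_r(features[a], features[b]) for b in names] for a in names]
```

Each cell of `matrix` corresponds to one square of such a heat map; the diagonal is 1 by construction, and the matrix is symmetric.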
Figure 3. Scatterplots of the NT (green circles), ST (blue circles), and MT (orange circles) samples of combined features derived via PCA before (A) and after (B) SMOTE. Synthetic samples generated by SMOTE in minority categories are shown (dark blue: ST; dark green: NT). The X- and Y-axes in each graph represent combinations of all features used in the experiments.
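The oversampling shown in panel B can be illustrated with a minimal sketch. It assumes the basic SMOTE rule, in which a synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours; the exact variant and settings used in the study may differ. The function name `smote_sketch`, the toy data, and the parameter values are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from a list of minority-class points.

    Basic SMOTE rule (illustrative only): pick a minority sample, pick one of
    its k nearest minority neighbours, and interpolate between the two.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class (values are not from the study).
minority_points = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7), (0.7, 1.0)]
new_points = smote_sketch(minority_points, n_new=5)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by that category, which is consistent with the placement of the dark blue and dark green points in panel B.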
Figure 4. The ROC curve and the AUC of the CatBoost model. The ROC curve plots the true-positive rate as a function of the false-positive rate. The black, light blue, and yellow lines are the curves for the ST, MT, and NT categories, respectively. Micro- and macro-average ROC curves are shown by the dotted pink line and the dotted dark blue line, respectively.
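How one such per-category curve and its AUC are obtained can be sketched as follows. The example treats one category as positive and the other two as negative (one-vs-rest) and integrates the curve with the trapezoidal rule; it assumes no tied scores. The labels and predicted probabilities are hypothetical, and `roc_points` and `auc` are illustrative helper names.

```python
def roc_points(labels, scores, positive):
    """Return (FPR, TPR) points for one class treated as positive vs. the rest."""
    # Sort by decreasing score; each sample in turn acts as the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for lab in labels if lab == positive)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, lab in ranked:
        if lab == positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical true classes and predicted ST probabilities for six subjects.
true_classes = ["ST", "MT", "ST", "NT", "MT", "ST"]
st_scores = [0.90, 0.40, 0.75, 0.10, 0.80, 0.60]

st_curve = roc_points(true_classes, st_scores, positive="ST")
st_auc = auc(st_curve)  # 7/9 for this toy data
```

Repeating the computation with MT and then NT as the positive class gives the other two curves; a macro-average curve averages the per-class curves, while a micro-average pools all one-vs-rest decisions before computing the rates.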
Figure 5. Feature importance of the CatBoost classifier in the training set. The X-axis represents the average impact on the model output, while the Y-axis lists the features in order of their importance for distinguishing the PROP taster categories in the training set.
Figure 6. SHAP summary plot of the ST category. The Y-axis lists the features in descending order of importance for the ST category; the X-axis represents the SHAP value, i.e., the impact of each feature on the model output. The color represents the feature value: high values are pink, while low values are blue.
Figure 7. SHAP summary plot of the MT category. The Y-axis lists the features in descending order of importance for the MT category; the X-axis represents the SHAP value, i.e., the impact of each feature on the model output. The color represents the feature value: high values are pink, while low values are blue.
Figure 8. SHAP summary plot of the NT category. The Y-axis lists the features in descending order of importance for the NT category; the X-axis represents the SHAP value, i.e., the impact of each feature on the model output. The color represents the feature value: high values are pink, while low values are blue.
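The SHAP values plotted in Figures 6–8 are based on Shapley values, which split the difference between a prediction and a reference prediction among the features. The sketch below computes them exactly by brute force for a toy three-feature score. It is only an illustration of the definition: `toy_st_score` is a hypothetical function, not the study's CatBoost model, and practical SHAP implementations rely on far more efficient algorithms than enumerating all feature orderings.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by averaging over feature orderings.

    Features not yet switched on keep their baseline value. This brute-force
    form is only practical for a handful of features.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value.
            current[i] = x[i]
            output = model(current)
            totals[i] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]


def toy_st_score(features):
    """Hypothetical score for the ST class; not the study's model."""
    disk_rating, solution_rating, papilla_density = features
    return 0.5 * disk_rating + 0.3 * solution_rating + 0.2 * disk_rating * papilla_density


sample = [1.0, 1.0, 1.0]     # high (scaled) feature values
reference = [0.0, 0.0, 0.0]  # low (scaled) feature values
phi = shapley_values(toy_st_score, sample, reference)
```

For this toy score the contributions are 0.6, 0.3, and 0.1, and they sum to the difference between the score of the sample and the score of the reference. A positive contribution from a high feature value corresponds to a pink point on the right-hand side of a summary plot.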
Table 1. Demographic, clinical, morphological, genetic, and sensory features of subjects according to PROP taster status.
| Features | Overall | ST (n = 16) | MT (n = 51) | NT (n = 17) |
|---|---|---|---|---|
| Age (year) | 25.07 ± 0.46 | 24.94 ± 1.07 | 24.95 ± 0.60 | 25.59 ± 1.03 |
| BMI (kg/m²) | 21.82 ± 0.36 | 20.83 ± 0.82 | 22.27 ± 0.46 | 21.41 ± 0.79 |
| Papilla density/cm² | 30.63 ± 1.58 | 39.04 ± 3.39 a | 30.98 ± 1.90 b | 21.68 ± 3.29 c |
| Male/female (n) | 35/49 | 5/11 | 24/27 | 6/11 |
| Smokers/non-smokers (n) | 19/65 | 3/13 | 13/38 | 3/14 |
| Genotypes | | | | |
| TAS2R38 | | | | |
| PP/PA/AA (n) | 20/43/21 | 10/6/0 x | 10/35/6 y | 0/2/15 z |
| Gustin gene | | | | |
| AA/AG/GG (n) | 49/29/6 | 9/4/3 | 27/22/2 | 13/3/1 |
| Taste scores | | | | |
| Sweet | 3.43 ± 0.07 | 3.37 ± 0.17 | 3.43 ± 0.09 | 3.47 ± 0.16 |
| Salty | 3.57 ± 0.08 | 3.81 ± 0.17 | 3.51 ± 0.10 | 3.53 ± 0.17 |
| Sour | 2.38 ± 0.10 | 2.44 ± 0.24 | 2.41 ± 0.13 | 2.23 ± 0.23 |
| Bitter | 3.21 ± 0.11 | 3.50 ± 0.26 | 3.15 ± 0.15 | 3.11 ± 0.25 |
| Umami | 1.32 ± 0.16 | 1.12 ± 0.38 | 1.43 ± 0.21 | 1.17 ± 0.37 |
| TST | 12.79 ± 0.22 | 13.19 ± 0.50 | 12.80 ± 0.28 | 12.41 ± 0.49 |
| Overall TST | 13.92 ± 0.31 | 14.25 ± 0.72 | 13.94 ± 0.40 | 13.53 ± 0.70 |
Values are means ± SE or number of subjects. Significant differences in papilla density are indicated by the letters a, b, and c (p ≤ 0.041; LSD test subsequent to one-way ANOVA), while differences in TAS2R38 genotype distribution are indicated by the letters x, y, and z (p < 0.0001; Fisher’s method). BMI: body mass index; PP: PAV/PAV; PA: PAV/AVI; AA: AVI/AVI; TST: total taste score; Overall TST: overall total taste score.
Table 2. Results of metrics to evaluate each classifier model.
| Classifiers | Precision | Recall | F1-Score |
|---|---|---|---|
| Logistic regression | 83% | 81% | 81% |
| Gradient boosting | 90% | 86% | 87% |
| Decision trees | 92% | 90% | 91% |
| Random forests | 96% | 95% | 95% |
| CatBoost | 97% | 95% | 96% |
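The three metrics in Table 2 can be computed for a three-category problem as sketched below. The table reports one value per model, and the averaging scheme behind those values is not stated here, so the macro average (unweighted mean over categories) used in this sketch is an assumption. The labels are hypothetical and `per_class_metrics` is an illustrative helper name.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Precision, recall, and F1 for each class, plus their macro averages."""
    report = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    # Macro average: unweighted mean of each metric over the classes.
    macro = tuple(sum(report[c][i] for c in classes) / len(classes) for i in range(3))
    return report, macro


# Hypothetical labels for eight test subjects (not data from the study).
y_true = ["ST", "ST", "MT", "MT", "MT", "MT", "NT", "NT"]
y_pred = ["ST", "MT", "MT", "MT", "MT", "MT", "NT", "NT"]

report, macro = per_class_metrics(y_true, y_pred, ["ST", "MT", "NT"])
```

In this toy case a single ST subject misclassified as MT lowers the recall for ST to 0.5 and the precision for MT to 0.8, which shows why precision and recall are reported together with their harmonic mean, the F1-score.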
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Naciri, L.C.; Mastinu, M.; Crnjar, R.; Tomassini Barbarossa, I.; Melis, M. Automated Classification of 6-n-Propylthiouracil Taster Status with Machine Learning. Nutrients 2022, 14, 252. https://doi.org/10.3390/nu14020252