Article

Development of a Machine Learning Model for Classifying Cooking Recipes According to Dietary Styles

1
National Institute of Health and Nutrition, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka 566-0002, Japan
2
Oishi Kenko Inc., Tokyo 103-0024, Japan
3
Graduate School of Public Health, St. Luke’s International University, Tokyo 104-0045, Japan
*
Author to whom correspondence should be addressed.
Foods 2024, 13(5), 667; https://doi.org/10.3390/foods13050667
Submission received: 13 January 2024 / Revised: 16 February 2024 / Accepted: 20 February 2024 / Published: 22 February 2024
(This article belongs to the Section Sensory and Consumer Sciences)

Abstract

To complement classical methods for identifying Japanese, Chinese, and Western dietary styles, this study aimed to develop a machine learning model. This study utilized 604 features from 8183 cooking recipes based on a Japanese recipe site. The data were randomly divided into training, validation, and test sets for each dietary style at a 60:20:20 ratio. Six machine learning models were developed in this study to effectively classify cooking recipes according to dietary styles. The evaluation indicators were above 0.8 for all models in each dietary style. The top ten features were extracted from each model, and the features common to three or more models were employed as the best predictive features. Five well-predicted features were indicated for the following seasonings: soy sauce, miso (fermented soy beans), and mirin (sweet cooking rice wine) in the Japanese diet; oyster sauce and doubanjiang (chili bean sauce) in the Chinese diet; and olive oil in the Western diet. Predictions by broth were indicated in each diet, such as dashi in the Japanese diet, chicken soup in the Chinese diet, and consommé in the Western diet. The prediction model suggested that seasonings and broths could be used to predict dietary styles.

1. Introduction

The basic combination of the traditional Japanese diet, known as washoku in Japanese, consists of cooked rice with one soup and three side dishes, which makes the diet low-fat, low-energy, and well-balanced [1]. “Washoku, the traditional dietary cultures of the Japanese” was inscribed in UNESCO’s Representative List of the Intangible Cultural Heritage of Humanity in 2013 [2]. After the Great Kanto Earthquake of 1923, Chinese and Western cuisines spread across the entire Japanese population, and various dishes were modified into unique Japanese versions [3]. Today, Chinese and Western diets are familiar to the Japanese people in addition to the traditional Japanese diet. A study using the database of the National Health and Nutrition Survey, Japan, from 2003 to 2015 indicated a decrease in the “plant food and fish” dietary pattern, which is usually classified as a traditional diet, and an increase in the “bread and dairy” and “animal food and oil” dietary patterns, which are usually included in the Western diet, suggesting continuous Westernization [4].
A systematic review of the Japanese diet indicated that the top three applicable categories were soy beans/soy bean-derived products, seafood, and vegetables, followed by rice and miso soup [5]. Miso is a paste made from molded rice, cooked soy beans, and salt [6]. Miso soup is composed of miso and Japanese broth, known as “dashi” in Japanese, which is usually made from kelp and dried bonito [7]. From the 1970s to the 1980s in Japan, fried Chinese noodles and dumplings from the Chinese diet and sandwiches, spaghetti, hamburgers, toasts, and cream stews from Western diets were gradually consumed as daily dishes [3].
The Japanese diet has been reported as one of the factors responsible for the longevity of Japanese people [8]. However, it is not known whether the traditional Japanese diet is superior to Japanese–Chinese and Japanese–Western diets in relation to longevity. Several studies have examined the relationships between dietary patterns and health-related indicators, including cancer [9], cardiovascular disease [10], and dementia [11]. There are several classical methods for identifying dietary patterns, such as dietary quality scores, principal component analysis, factor analysis, clustering analysis, and reduced-rank regression [12]. A systematic review was previously conducted to examine the reproducibility of dietary patterns derived using principal component analysis [13]. The review reported that some major dietary patterns are relatively reproducible, but others are not found in different populations within a country. Dietary styles identified by these classical methods should be interpreted carefully because each study defined its dietary styles independently.
Machine learning algorithms have recently been used in different areas of nutrition to complement current dietary pattern analyses, which may not integrate sufficient dietary variation [14]. Classifying pictures of food into categories is one way that machine learning could become a useful complementary method for improving the precision and validity of dietary measurements [14]. A systematic review reported that supervised learning algorithms were mostly used to assess food intake using a food frequency questionnaire [15]. The review selected 36 studies, of which 23 used a classification algorithm. One of the studies used machine learning algorithms to predict a healthy diet based on food intake [16]. Another study clarified the specific food groups that can predict and classify adults with obesity and/or diabetes [17]. Yu et al. [18] used machine learning algorithms to determine food groups related to the incidence of bladder cancer. These previous reports suggest that text-based information related to dishes, such as cooking recipes, may also be applied to evaluate dietary styles.
Current Japanese dietary styles are diverse and challenging to classify. Even nutrition specialists, such as dieticians, do not have a standard for defining dietary styles. One review reported the difficulty in defining the Japanese diet because consistent definitions have not been established [19]. For readability, the present study refers to traditional Japanese, Japanese–Chinese, and Japanese–Western diets as Japanese, Chinese, and Western diets, respectively. It is necessary to develop a tool that complements classical methods in identifying evidence-based dietary styles. Such a prediction model would support researchers in properly naming dietary patterns resulting from classical methods. Moreover, such a study can contribute to preserving the Japanese dietary style by identifying the understandable characteristics of this diet. Therefore, this study aimed to develop a machine learning model for classifying cooking recipes into Japanese, Chinese, and Western dietary styles in Japan.

2. Materials and Methods

2.1. Database

To build a dataset for the binary classification task in each dietary style, 9092 cooking recipes were collected from the “Oishi Kenko” app, supporting healthy dietary habits [20]. Among these, 909 recipes characterized by two or more dietary styles in one recipe were excluded, leaving 8183 cooking recipes representing Japanese, Chinese, or Western diets. The recipe examples can be found in Table S1. Each recipe’s dietary style was determined by two registered dieticians from a pool of ten within the company. Dietary style was classified by prioritizing dish name, photos, seasonings, and ingredients. The dieticians made comprehensive judgments considering annotation data and consistency with other recipes to assign the dietary style. In total, 27 annotations were utilized to characterize the recipes (Table 1), falling into four types. The first type covered various recipe characteristics (e.g., cooking type, cooking genre, main ingredients, arrangement type, main seasoning type, situation, suitable event, and basic or arrangement). The second focused on taste, flavor, and nutrients (e.g., taste characteristics, texture, nutrition point, smell characteristics, and nutritional value). The third outlined cooking methodologies (e.g., finishing cooking method, temperature, suitable time zone, estimated cooking time, season, easy point, necessary cooking utensils, and material). The fourth addressed considerations for individuals with health issues or dietary restrictions (e.g., infectious disease countermeasures, effects on the digestive system, trouble symptoms, cooking difficulty, and allergen-free). Nutritional and ingredient data were sourced from the Standard Tables of Food Composition in Japan 2015 (Seventh Revised Edition) [21], comprising 12 nutrients (e.g., energy, macronutrients, and micronutrients) and 19 ingredients (e.g., vegetables, fruits, and meat). Examples of these features are detailed in Table 1 for each dataset component. 
A total of 1547 explanatory features were initially processed, including 366 annotations, 50 nutrients, and 1131 ingredients. Following the exclusion of unavailable features, the final analysis included 604 features, which underwent one-hot encoding to convert categorical variables.
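The one-hot encoding step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `one_hot_encode`, the column names, and the toy records are hypothetical, and only the standard library is used.

```python
def one_hot_encode(rows, columns):
    """One-hot encode the given categorical columns of a list of dict records."""
    # Collect the sorted set of categories observed in each column.
    categories = {c: sorted({row[c] for row in rows}) for c in columns}
    # One binary feature per (column, category) pair.
    feature_names = [f"{c}={v}" for c in columns for v in categories[c]]
    encoded = [
        [1 if row[c] == v else 0 for c in columns for v in categories[c]]
        for row in rows
    ]
    return feature_names, encoded


# Hypothetical toy records; column names and values are illustrative only.
recipes = [
    {"main_seasoning": "soy sauce", "cooking_method": "simmer"},
    {"main_seasoning": "olive oil", "cooking_method": "bake"},
    {"main_seasoning": "oyster sauce", "cooking_method": "stir-fry"},
]
names, matrix = one_hot_encode(recipes, ["main_seasoning", "cooking_method"])
for name_row in zip(names, *matrix):
    print(name_row)
```

Each categorical variable becomes as many binary columns as it has observed categories, so every encoded row contains exactly one 1 per original variable.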

2.2. Statistical Analysis

The data were randomly divided into training data (60%), validation data (20%), and test data (20%) for each dietary style (Japanese, Chinese, and Western), maintaining the ratio of positive to negative data. The flow chart depicting the analyses is shown in Figure 1. We fine-tuned the parameters and trained the model to prevent overfitting and underfitting. Additionally, we assessed the model’s performance using test data that were not part of the model training process to ensure appropriate performance.
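A stratified 60:20:20 split of this kind can be sketched as follows. This is an illustrative assumption about how such a split might be implemented, not the authors' code; the function name `stratified_split`, the seed, and the toy labels are hypothetical.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) index lists that preserve the label ratio."""
    rng = random.Random(seed)
    splits = ([], [], [])
    # Split positives and negatives separately so each subset keeps the same ratio.
    for value in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])
    return splits


# Hypothetical binary labels: 1 = recipe belongs to the target style, 0 = it does not.
labels = [1] * 30 + [0] * 70
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
print(sum(labels[i] for i in train) / len(train))  # share of positives in the training set
```

Because the split is done within each class, the positive share in the training, validation, and test subsets matches the share in the full data set.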
To extract important features that are both robust and specific to each machine learning algorithm, this study selected six machine learning models to which the Shapley additive explanations (SHAP) algorithm [22] can be applied and that could be run on the computer used in this study. The following six machine learning models were developed: a random forest classifier (RFC) [23], logistic regression (LR), support vector classifier (SVC) [24], extreme gradient boosting (XGB) [25], light gradient boosting machine (LGBM) [26], and deep neural network (DNN) [27]. Four-fold cross-validation was considered suitable for evaluating the accuracy of the six models; therefore, the hyperparameters of each model were determined by a grid search with 4-fold cross-validation on the training data. For data processing, the explanatory variables were standardized using means and standard deviations. The models were evaluated using four indices: accuracy (ACC), area under the receiver operating characteristic curve (AUC), F1-score, and Matthews correlation coefficient (MCC). The confusion matrix was constructed before the calculations were performed for the six models. The ACC was used to assess the ability to correctly differentiate between positive and negative results [28]. The terms and equation are defined below:
True positive (TP) = the number of cases correctly identified as positive
False positive (FP) = the number of cases incorrectly identified as positive
True negative (TN) = the number of cases correctly identified as negative
False negative (FN) = the number of cases incorrectly identified as negative
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
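As a minimal sketch of these definitions (not the authors' code), the four confusion-matrix counts and the accuracy can be computed in Python as follows. The function names and the toy labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, and FN for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical true labels and predictions for eight recipes.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)            # 3 1 3 1
print(accuracy(tp, fp, tn, fn))  # 0.75
```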
The AUC was used to assess the classification performance of each model. AUC is the area under the receiver operating characteristic (ROC) curve. The x-axis of the ROC curve indicates the false positive rate (1 − specificity), and the y-axis indicates the true positive rate (sensitivity) [29].
\[
\text{False positive rate } (1 - \text{specificity}) = \frac{FP}{FP + TN}
\]
\[
\text{True positive rate (sensitivity)} = \frac{TP}{TP + FN}
\]
An AUC value close to 1 indicates better binary classification. The closer the ROC curve is to the upper-left corner of the graph, the higher the accuracy of the test, because in the upper-left corner, the false positive rate = 0 and the true positive rate = 1.
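The relationship between the ROC curve and the AUC can be illustrated with a small Python sketch. It assumes that each model outputs a score per recipe and that the area is obtained with the trapezoidal rule; the function names, scores, and labels are hypothetical and are not taken from the study.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # A case is predicted positive when its score is at or above the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = positive) and predicted scores for six recipes.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))  # 8/9, about 0.889
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, which is why the area equals 8/9.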
The F1-score (range, 0–1) is defined as the harmonic mean of precision and recall, which have a trade-off relationship.
\[
F_1\text{-score} = \frac{2TP}{2TP + FP + FN} = \frac{2\,(\text{precision} \times \text{recall})}{\text{precision} + \text{recall}}
\]
The minimum F1-score is reached for TP = 0, when all positive samples are misclassified. The maximum F1-score is reached for FP = FN = 0, when the classification is perfect.
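The equivalence of the two forms of the F1-score can be checked numerically with a short Python sketch. The function name and the counts are illustrative; returning 0.0 when TP = 0 is a convention assumed here to avoid division by zero.

```python
def f1_score(tp, fp, fn):
    """F1-score as the harmonic mean of precision and recall."""
    if tp == 0:
        # Assumed convention: no true positives gives the minimum score of 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.8, recall = 2/3.
tp, fp, fn = 8, 2, 4
harmonic_form = f1_score(tp, fp, fn)
count_form = 2 * tp / (2 * tp + fp + fn)
print(harmonic_form, count_form)  # both are 16/22, about 0.727
```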
MCC is a special case of the φ (phi) coefficient [30] for 2 × 2 confusion matrices.
\[
\text{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)}}
\]
An MCC close to +1 indicates near-perfect classification, with good values for all other confusion matrix metrics, and −1 indicates the worst prediction, where all negative samples are predicted as positive and vice versa [31].
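The range of the MCC can be illustrated with a minimal Python sketch. The function name and the counts are hypothetical; returning 0.0 when the denominator is zero is a commonly used convention that is assumed here and is not stated in the article.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient for a 2 x 2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc(5, 0, 5, 0))  # 1.0: every case classified correctly
print(mcc(0, 5, 0, 5))  # -1.0: every case classified incorrectly
print(mcc(3, 1, 3, 1))  # 0.5: partial agreement
```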
The SHAP algorithm was applied to each model to calculate the correlation coefficient and identify the importance of each explanatory variable and its impact on the prediction [22]. A correlation analysis was not successfully performed in the SVC model because the model exhibited low reproducibility between the feature analysis and correlation. Important features were extracted for each dietary style as follows based on the calculated results: the top ten features were extracted from each model, and features common to half (i.e., three) or more of the models were used as well-predicted features to summarize the characteristics of the obtained results. The applicability of these models was confirmed in a previous study [32]. Python was used for the statistical analyses.
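The selection rule described above (top ten features per model, kept when shared by three or more models) can be sketched in Python as follows. The importance rankings would come from SHAP in the study; here they are hypothetical placeholder lists, and the function name `well_predicted_features` is an assumption.

```python
from collections import Counter


def well_predicted_features(rankings, top_k=10, min_models=3):
    """Return features that appear in the top_k list of at least min_models models."""
    counts = Counter()
    for ranked in rankings.values():
        # Count each feature at most once per model.
        counts.update(set(ranked[:top_k]))
    return sorted(f for f, n in counts.items() if n >= min_models)


# Hypothetical rankings (most important first); feature names are placeholders.
rankings = {
    "RFC": ["feat_a", "feat_b", "feat_c", "feat_d"],
    "LR": ["feat_a", "feat_e", "feat_b"],
    "XGB": ["feat_b", "feat_a", "feat_f"],
    "LGBM": ["feat_c", "feat_e", "feat_f"],
}
selected = well_predicted_features(rankings)
print(selected)  # ['feat_a', 'feat_b']
```

Requiring agreement across several models is what makes the retained features less dependent on any single algorithm.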

3. Results

Table 2 presents the evaluation of the six machine learning models used to classify cooking recipes into three dietary styles. The confusion matrix results of each dietary style are presented in the Supplementary Materials (Figures S1–S3). Accuracy, AUC, and F1-score exceeded 0.8 for all dietary types and models. The model with the highest average among the four evaluation indices for the six models was identified as the best model. The top performing models for each dietary type were LGBM for the Japanese diet, RFC for the Chinese diet, and DNN for the Western diet.
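The rule for choosing the best model (highest average of the four indices) can be written as a short Python sketch. The scores below are invented placeholders and do not reproduce Table 2; the function and model names are hypothetical.

```python
def best_model(scores):
    """Return the model with the highest mean across its evaluation indices."""
    means = {model: sum(vals.values()) / len(vals) for model, vals in scores.items()}
    return max(means, key=means.get), means


# Placeholder values for illustration only (not results from the study).
scores = {
    "model_a": {"ACC": 0.90, "AUC": 0.95, "F1": 0.85, "MCC": 0.70},
    "model_b": {"ACC": 0.92, "AUC": 0.96, "F1": 0.88, "MCC": 0.76},
}
winner, means = best_model(scores)
print(winner, means)
```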
The ROC curves for all Japanese, Chinese, and Western dietary styles exhibited a trend toward the upper left, denoting high performance (Figures S4–S6). Similar trends were observed for the ROC curves of Japanese and Chinese dietary styles. For the Chinese dietary styles, the ROC curves of RFC, XGB, and LGBM were more prominently situated compared with those of other models, aligning with the trend of AUC scores.
Among the top ten features in the six models, five well-predicted features are highlighted in bold font for the Japanese diet (Table 3), Chinese diet (Table 4), and Western diet (Table 5). The three dietary styles exhibited positive correlations with specific seasonings: soy sauce, miso (fermented soy beans), and mirin (sweet cooking rice wine) in the Japanese diet; oyster sauce and doubanjiang (chili bean sauce) in the Chinese diet; and olive oil in the Western diet. Broths emerged as strong predictors for each dietary style: dashi (and its flavor) for the Japanese diet, chicken broth for the Chinese diet, and consommé for the Western diet. Certain foods also predicted dietary styles: starch for the Chinese diet and dairy products, tomato, and garlic for the Western diet. Among the five items that predicted each dietary style, iodine in the Japanese diet was the only nutrient.
Soy sauce in the Japanese diet appeared in five models, excluding the SVC model. In the Chinese diet, sesame oil, chicken broth, and oyster sauce were well-predicted features across all six models. For the Western diet, olive oil was present in all six models, whereas dairy products appeared in all models except the SVC model.

4. Discussion

This study developed a machine learning model to classify Japanese, Chinese, and Western dietary styles based on cooking recipe data, suggesting that seasonings and broths effectively differentiate between these dietary styles. To the best of our knowledge, this is the first study demonstrating the use of a machine learning model based on text features for identifying the three national dietary styles in Japan.
Six major dietary patterns, including Japanese and Western patterns, were identified in a systematic review analyzing 65 articles on national dietary patterns using the principal component procedure [13]. The Japanese pattern was characterized by higher intakes of mushrooms, seaweeds, potatoes, vegetables, pickles, pulses, seasonings, fruits, and fish and shellfish [13]. This study did not highlight these ingredients as the best predictive features in the Japanese dietary pattern. However, a notable finding in our study is that iodine in the Japanese diet was the only nutrient among the top five features of the three dietary styles. Iodine may reflect the use of seaweed and seafood in the Japanese diet [33]. The inclusion of seaweeds, fish, and shellfish in our results aligns with the findings in the review [13]. While the review mentioned seasoning as a characteristic of the Japanese diet, it did not provide detailed information on the type of seasoning [13]. The present study revealed that soy sauce was frequently presented as a well-predicted seasoning feature in the Japanese diet, making it easily associated with Japanese cuisine.
Interestingly, previous studies using dietary patterns did not identify a distinct Chinese dietary pattern [13]. The naming of each dietary pattern is usually based on the author’s perception during a principal component analysis [12]. The difficulty of distinguishing between Japanese and Chinese styles based on the author’s perception may be due to the similarity in ingredients and seasonings between the two cuisines. However, our study identified robust features such as sesame oil, chicken broth, and oyster sauce in the Chinese diet. These tastes and flavors contribute to the identification of the Chinese diet. Additionally, our study revealed starch as a feature in the Chinese diet; cornstarch (i.e., corn flour) is commonly used in Chinese cooking for thickening soup and for quick frying [34]. Recognizing the classification of Chinese diets is essential, particularly if these characteristics are associated with non-communicable diseases. In a Chinese meta-analysis, the traditional Chinese dietary pattern, including starchy foods (i.e., rice, wheat, and tubers), vegetables, and high-protein foods (i.e., pork), was associated with a lower risk of overweight/obesity [35]. Although Chinese dietary styles have been adapted in Japan, the presence of Chinese diets within Japanese food culture should be acknowledged.
In the Western diet, olive oil was present in all six models, while dairy products appeared in most models in this study, except the SVC model. A high intake of olive oil and moderate intake of dairy products are associated with the Mediterranean diet, known for reducing the risk of cardiovascular disease and cancer and enhancing cognitive health [36]. Notably, the well-predicted features in the Western diet in this study included ingredients such as dairy products, tomatoes, and garlic. These items might contribute to the foundational taste of the Western diet owing to their glutamic acid content [37].
Unlike in a previous review [4], this study did not highlight protein-sourced foods as significant features in the Western diet. This previous review investigated 13-year trends in dietary patterns among Japanese adults aged over 20 years and revealed an increasing trend in the “animal food and oil” pattern, characterized by higher consumption of red and processed meat, eggs, vegetable oil, and other vegetables across most generations [4]. However, the recipe database used in this study prioritized healthy diets, and hence, red meat (such as beef and processed meat) was not frequently featured in the recipes.
This study identified the best model for each dietary style among the six models based on accuracy, AUC, F1-score, and MCC. While the SVC model proved effective in predicting features for each dietary style, it lacked a correlation analysis owing to low reproducibility between the feature analysis and correlation. Additionally, the best model (DNN) for the Western diet did not include consommé. Implementing ensemble methods combining results from several models can enhance the predictive performance [38]. Therefore, it is important to assess the comprehensive results by utilizing not just one (e.g., SVC or DNN models) but several suitable models.
The strength of this study lies in the extraction of explicit knowledge using a machine learning model from the implicit knowledge inherent in nutrition specialists’ dietary style classifications. However, several notable limitations exist. First, the feasibility of other databases remains unclear as this study relied on only one company’s database [8]. In Japan, various types of Japanese, Chinese, and Western dietary styles exist other than those used in the present data. More data sources should be introduced to demonstrate the robustness of the findings in the future. In addition, the dietary style of recipes used for the training data was determined by only two registered dieticians. Second, while the present model can generally identify dietary style characteristics, some aspects of its generalizability might be limited because the considered recipes focused on health considerations determined by dieticians. Third, this study excluded various cooking recipe types such as Korean and ethnic recipes, as well as their combinations with Japanese, Chinese, and Western diets. The current model focused solely on classifying cooking recipes into three major dietary styles, presenting a challenge for future studies aiming to accommodate diverse dietary styles.

5. Conclusions

This study developed a machine learning model that classifies cooking recipes into Japanese, Chinese, and Western dietary styles using a recipe database, indicating that seasonings and broths can effectively aid in such classifications. This study also proposed a complementary tool to investigate the dietary patterns within the Japanese population alongside classical methods. The evidence-based classification of dietary styles complemented by the prediction model contributes to clarifying the relationship between dietary styles and health.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/foods13050667/s1, Figure S1: The confusion matrix for the Japanese dietary style, Figure S2: The confusion matrix for the Chinese dietary style, Figure S3: The confusion matrix for the Western dietary style, Figure S4: ROC curves for the Japanese dietary style, Figure S5: ROC curves for the Chinese dietary style, Figure S6: ROC curves for the Western dietary style, Table S1: Cooking recipes in Japanese, Chinese, and Western dietary styles.

Author Contributions

Research conception and design: M.Y., M.A. and N.N.; provision of data: K.H. and T.N.; statistical analysis of the data: M.A.; interpretation of the data and results: M.Y., M.A., K.H., T.N. and N.N.; and writing of the manuscript: M.Y. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank Keiichi Abe for setting up the study. The authors would also like to thank all dieticians who created the database for this study.

Conflicts of Interest

This research was conducted under the collaborative research agreement titled “Lifestyle Improvement Focusing on Japanese Diet” between the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN) and Oishi Kenko Inc. NIBIOHN did not receive any funds from Oishi Kenko Inc. Oishi Kenko Inc. provided recipe data to NIBIOHN for the analysis.

References

  1. Gabriel, A.S.; Ninomiya, K.; Uneyama, H. The role of the Japanese traditional diet in healthy and sustainable dietary patterns around the world. Nutrients 2018, 10, 173.
  2. Cang, V. Japan’s washoku as intangible heritage: The role of national food traditions in UNESCO’s cultural heritage scheme. Int. J. Cult. Prop. 2018, 25, 491–513.
  3. Watanabe, Z. The transition of the Japanese-Style diet. Will Japan’s food culture become the World’s new macrobiotic diet and general health food? In The 2006 Kikkoman Food Culture Seminar; Kikkoman Corporation: Tokyo, Japan, 2006; pp. 1–5. Available online: https://www.kikkoman.com/jp/kiifc/foodculture/pdf_13/e_002_006.pdf (accessed on 10 January 2024).
  4. Murakami, K.; Livingstone, M.B.E.; Sasaki, S. Thirteen-year trends in dietary patterns among Japanese adults in the national health and nutrition survey 2003–2015: Continuous westernization of the Japanese diet. Nutrients 2018, 10, 994.
  5. Suzuki, N.; Goto, Y.; Ota, H.; Kito, K.; Mano, F.; Joo, E.; Ikeda, K.; Inagaki, N.; Nakayama, T. Characteristics of the Japanese diet described in epidemiologic publications: A qualitative systematic review. J. Nutr. Sci. Vitaminol. 2018, 64, 129–137.
  6. Stanton, W.R.; Owens, J.D. Fermented Foods: Fermentations of the far east. In Encyclopedia of Food Science and Nutrition, 2nd ed.; Caballero, B.F.P., Toldra, F., Eds.; Academic Press: Cambridge, MA, USA, 2003; pp. 2344–2351.
  7. Hajeb, P.; Jinap, S. Umami taste components and their sources in Asian foods. Crit. Rev. Food Sci. Nutr. 2015, 55, 778–791.
  8. Ikeda, N.; Saito, E.; Kondo, N.; Inoue, M.; Ikeda, S.; Satoh, T.; Wada, K.; Stickley, A.; Katanoda, K.; Mizoue, T.; et al. What has made the population of Japan healthy? Lancet 2011, 378, 1094–1105.
  9. Matsushita, M.; Fujita, K.; Nonomura, N. Influence of diet and nutrition on prostate cancer. Int. J. Mol. Sci. 2020, 21, 1447.
  10. Shirota, M.; Watanabe, N.; Suzuki, M.; Kobori, M. Japanese-style diet and cardiovascular disease mortality: A systematic review and meta-analysis of prospective cohort studies. Nutrients 2022, 14, 2008.
  11. Matsuyama, S.; Shimazu, T.; Tomata, Y.; Zhang, S.; Abe, S.; Lu, Y.; Tsuji, I. Japanese diet and mortality, disability, and dementia: Evidence from the Ohsaki cohort Study. Nutrients 2022, 14, 2034.
  12. Zhao, J.; Li, Z.; Gao, Q.; Zhao, H.; Chen, S.; Huang, L.; Wang, W.; Wang, T. A review of statistical methods for dietary pattern analysis. Nutr. J. 2021, 20, 37.
  13. Murakami, K.; Shinozaki, N.; Fujiwara, A.; Yuan, X.; Hashimoto, A.; Fujihashi, H.; Wang, H.C.; Livingstone, M.B.E.; Sasaki, S. A systematic review of principal component analysis-derived dietary patterns in Japanese adults: Are Major dietary patterns reproducible within a country? Adv. Nutr. 2019, 10, 237–249.
  14. Morgenstern, J.D.; Rosella, L.C.; Costa, A.P.; de Souza, R.J.; Anderson, L.N. Perspective: Big data and machine learning could help advance nutritional epidemiology. Adv. Nutr. 2021, 12, 621–631.
  15. Oliveira Chaves, L.; Gomes Domingos, A.L.; Louzada Fernandes, D.; Ribeiro Cerqueira, F.; Siqueira-Batista, R.; Bressan, J. Applicability of machine learning techniques in food intake assessment: A systematic review. Crit. Rev. Food Sci. Nutr. 2023, 63, 902–919.
  16. Hearty, A.P.; Gibney, M.J. Analysis of meal patterns with the use of supervised data mining techniques--Artificial neural networks and decision trees. Am. J. Clin. Nutr. 2008, 88, 1632–1642.
  17. Easton, J.F.; Román Sicilia, H.; Stephens, C.R. Classification of diagnostic subcategories for obesity and diabetes based on eating patterns. Nutr. Diet. 2019, 76, 104–109.
  18. Yu, E.Y.W.; Wesselius, A.; Sinhart, C.; Wolk, A.; Stern, M.C.; Jiang, X.; Tang, L.; Marshall, J.; Kellen, E.; van den Brandt, P.; et al. A data mining approach to investigate food groups related to incidence of bladder cancer in the BLadder cancer Epidemiology and Nutritional Determinants International Study. Br. J. Nutr. 2020, 124, 611–619.
  19. Sasaki, S. For Working Group 1 of the Healthy Diet Research Committee of International Life Sciences Institute, Japan. What is the scientific definition of the Japanese diet from the viewpoint of nutrition and health? Nutr. Rev. 2020, 78 (Suppl. S2), 18–26.
  20. Oishi Kenko Inc. Oishi Kenko. Available online: https://oishi-kenko.com/ (accessed on 10 January 2024).
  21. Ministry of Education, Culture, Sports, Science, and Technology. Standard Tables of Food Composition in Japan, 7th Revised ed. 2015. Available online: https://www.mext.go.jp/en/policy/science_technology/policy/title01/detail01/sdetail01/sdetail01/1385122.htm (accessed on 10 January 2024).
  22. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems; NIPS; Long Beach, CA, USA, 2017.
  23. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  24. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  25. Chen, T.Q.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  26. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3149–3157.
  27. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  28. Baratloo, A.; Hosseini, M.; Negida, A.; El Ashal, G. Part 1: Simple definition and calculation of accuracy, sensitivity and specificity. Emergency 2015, 3, 48–49.
  29. Nahm, F.S. Receiver operating characteristic curve: Overview and practical use for clinicians. Korean J. Anesthesiol. 2022, 75, 25–36.
  30. Guilford, J.P. The minimal phi coefficient and the maximal phi. Educ. Psychol. Meas. 1965, 25, 3–8.
  31. Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (BBA) Protein Struct. 1975, 405, 442–451.
  32. Martin-Morales, A.; Yamamoto, M.; Inoue, M.; Vu, T.; Dawadi, R.; Araki, M. Predicting cardiovascular disease mortality: Leveraging machine learning for comprehensive assessment of health and nutrition variables. Nutrients 2023, 15, 3937.
  33. Kikuchi, Y.; Takebayashi, T.; Sasaki, S. Iodine concentration in current Japanese foods and beverages. Nihon Eiseigaku Zasshi (Jpn. J. Hyg.) 2008, 63, 724–734. (In Japanese)
  34. Zhang, N.; Ma, G. Nutritional characteristics and health effects of regional cuisines in China. J. Ethn. Foods 2020, 7, 7.
  35. Jiang, K.; Zhang, Z.; Fullington, L.A.; Huang, T.T.; Kaliszewski, C.; Wei, J.; Zhao, L.; Huang, S.; Ellithorpe, A.; Wu, S.; et al. Dietary patterns and obesity in Chinese adults: A systematic review and meta-analysis. Nutrients 2022, 14, 4911.
  36. Davis, C.; Bryan, J.; Hodgson, J.; Murphy, K. Definition of the Mediterranean diet; a literature review. Nutrients 2015, 7, 9139–9153.
  37. Klosse, P. Umami in wine. Res. Hosp. Manag. 2013, 2, 25–28.
  38. Sagi, O.; Rokach, L. Ensemble learning: A survey. WIREs Data Min. Knowl. 2018, 8, e1249.
Figure 1. Flow chart of data analyses. a The following six models were used: a random forest classifier, logistic regression, support vector classifier, extreme gradient boosting, light gradient boosting machine, and deep neural network. b The six models were assessed using four indices: accuracy, area under the receiver operating characteristic curve, F1-score, and Matthews correlation coefficient.
Table 1. Features of the database.
| Data | Components | Examples | n |
|---|---|---|---|
| Annotation data | Cooking type | Staple food, side dish, main dish | 12 |
| | Cooking genre | Korean food, ethnic food, others | 6 |
| | Finishing cooking method | Fry, bake, steam | 9 |
| | Main ingredients | Meat, vegetables, milk and dairy products | 26 |
| | Arrangement type | Calcium fortification, diets for morning sickness, dysphagia diet | 16 |
| | Main seasoning type | Consommé, sweetener (sugar, mirin, honey), miso, sauce (Worcestershire sauce), other seasoning, dashi | 15 |
| | Taste characteristics | Dashi flavor, sesame flavor, soy sauce taste, salty | 14 |
| | Texture | No stimulation to oral cavity, easy to swallow, thicken | 15 |
| | Temperature | Room temperature, hot, very cold | 5 |
| | Suitable time zone | Anytime, for lunch, for breakfast | 5 |
| | Estimated cooking time | Within 5 min, within 15 min, within one hour cooking | 7 |
| | Season | Throughout a year, spring, summer | 5 |
| | Easy point | Only toaster oven, easy cooking, few cooking steps | 25 |
| | Nutrition point | Salt-free, “Diet” in the title, “Healthy” in the title | 12 |
| | Smell characteristics | No protein smell, easy to detect the smell of the ingredients | 2 |
| | Situation | For lunch box, for party, easy one-item lunch | 7 |
| | Necessary cooking utensils | Wooden pestle, oven, food processor | 15 |
| | Infectious disease countermeasures | Avoid infections, need caution about listeria food poisoning, need caution about the growth of bacteria | 3 |
| | Suitable event | New year (Osechi: traditional Japanese diets), Christmas, Valentine’s Day | 6 |
| | Material | Include dairy products, include tomato, include garlic, include potato, include green onion | 80 |
| | Nutrition value | Include foods with no measurement of potassium, include caffeine, very low fat and/or energy percent | 34 |
| | Effects on the digestive system | Good for digestion, adjust the intestinal environment, less likely to generate intestinal gas | 2 |
| | Basic or arrangement | Basic, arrangement | 2 |
| | Trouble symptoms | Complementary food for nutrition supply, abdominal distension, less force required for arms or hands | 31 |
| | Cooking method | Including frying process | 1 |
| | Cooking difficulty | Beginner, intermediate, advanced | 4 |
| | Allergen-free | Allergen-free of pork, allergen-free of sesame, allergen-free of apple | 7 |
| Nutrients (unit/dish) | Energy | Energy (kcal) | 1 |
| | Protein | Protein (g) | 1 |
| | Amino acid | Amino acid composition (g) | 1 |
| | Fat | Fat (g), triacylglycerol (g) | 2 |
| | Fatty acid | Saturated fatty acid (g), monounsaturated fatty acid (g), polyunsaturated fatty acid (mg) | 3 |
| | Cholesterol | Cholesterol (g) | 1 |
| | Carbohydrate | Carbohydrate (g), available carbohydrate (g) | 2 |
| | Fiber | Total fiber (mg), soluble fiber (g), insoluble fiber (g) | 3 |
| | Mineral | Iodine (μg), sodium (mg), calcium (mg) | 13 |
| | Vitamin | Vitamin C (mg), gamma-tocopherol (mg), pantothenic acid (μg) | 21 |
| | Water | Water (g) | 1 |
| | Ash | Ash (g) | 1 |
| Ingredients | Cereals | Brown rice, rice cake (mochi), pasta | 76 |
| | Potatoes and starches | Sweet potato, potato, starch | 29 |
| | Sugars and sweeteners | Superfine sugar, honey, brown sugar | 16 |
| | Pulses | Green beans, green beans, soy beans | 42 |
| | Nuts and seeds | Walnuts, sesame, peanuts | 25 |
| | Vegetables | Purple onion, parsley, cabbage | 186 |
| | Fruits | Apple, banana, strawberry | 78 |
| | Mushrooms | Shiitake mushroom, enoki mushroom, eryngii mushroom | 20 |
| | Algae | Edible brown algae (hijiki), kelp, wakame seaweed | 30 |
| | Fish, mollusks, and crustaceans | Horse mackerel, mackerel, shrimp | 157 |
| | Meat | Pork, beef, chicken | 92 |
| | Eggs | Chicken eggs, quail eggs, silky eggs | 10 |
| | Milk and dairy products | Milk, yogurt, cheese | 28 |
| | Fats and oils | Olive oil, sesame oil, rapeseed oil | 14 |
| | Confectionaries | Donuts, jelly, cookies | 11 |
| | Beverages | Rice wine, whiskey, coffee | 31 |
| | Seasonings and spices | Pepper, mirin, doubanjiang, oyster sauce, chicken broth | 100 |
| | Prepared foods | Gyoza (frozen), fried squid (for frying, frozen), curry (beef, retort pouch) | 8 |
| | Original ingredients | MCT oil, bonito flake, protein powder | 178 |
Nutritional and ingredient data were referenced using the Standard Tables of Food Composition in Japan 2015 (Seventh Revised Edition).
Table 2. Assessments of the six machine learning models in terms of predicting dietary styles.
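| Dietary Style | Model | Accuracy | AUC | F1-Score | MCC |
|---|---|---|---|---|---|
| Japanese diet | RFC | 0.86 | 0.93 | 0.86 | 0.71 |
| | LR | 0.86 | 0.93 | 0.86 | 0.71 |
| | SVC | 0.86 | 0.93 | 0.86 | 0.72 |
| | XGB | 0.88 | 0.94 | 0.88 | 0.75 |
| | LGBM | 0.88 | 0.94 | 0.88 | 0.76 |
| | DNN | 0.86 | 0.94 | 0.86 | 0.72 |
| Chinese diet | RFC | 0.95 | 0.95 | 0.84 | 0.68 |
| | LR | 0.91 | 0.93 | 0.79 | 0.61 |
| | SVC | 0.93 | 0.93 | 0.81 | 0.63 |
| | XGB | 0.94 | 0.95 | 0.83 | 0.66 |
| | LGBM | 0.93 | 0.96 | 0.83 | 0.67 |
| | DNN | 0.89 | 0.91 | 0.77 | 0.56 |
| Western diet | RFC | 0.88 | 0.95 | 0.87 | 0.75 |
| | LR | 0.89 | 0.95 | 0.88 | 0.77 |
| | SVC | 0.89 | 0.95 | 0.88 | 0.77 |
| | XGB | 0.89 | 0.96 | 0.88 | 0.77 |
| | LGBM | 0.89 | 0.96 | 0.88 | 0.76 |
| | DNN | 0.90 | 0.95 | 0.89 | 0.78 |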
AUC, area under the curve; DNN, deep neural network; LGBM, light gradient boosting machine; LR, logistic regression; MCC, Matthews correlation coefficient; RFC, random forest classifier; SVC, support vector classifier; XGB, extreme gradient boosting.
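As an illustration of how the four indices in Table 2 are defined, the following is a minimal sketch in plain Python. It is not the authors' code: the labels and scores are invented toy data, the function names are hypothetical, and the F1-score shown is the plain binary (positive-class) form, since the averaging scheme used for the table is not stated here.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall for the positive class.
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    # AUC as the probability that a random positive is scored above a
    # random negative (ties count as one half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = recipe belongs to the dietary style.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))   # 0.75
print(f1_score(y_true, y_pred))   # 0.75
print(mcc(y_true, y_pred))        # 0.5
print(roc_auc(y_true, scores))    # 0.875
```

Accuracy and MCC can diverge when classes are imbalanced, which may be why the Chinese diet rows combine high accuracy with comparatively lower MCC values.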
Table 3. Top 10 among the 604 features in the six machine learning models in the Japanese dietary style.
| RFC feature | +/− b | LR feature | +/− | SVC feature | +/− | XGB feature | +/− | LGBM a feature | +/− | DNN feature | +/− |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Include dairy products | | Soy sauce taste | + | Anytime | N.A. | Soy sauce taste | + | Soy sauce taste | + | Soy sauce taste | + |
| Soy sauce taste | + | Chicken broth | | Chicken broth | N.A. | Olive oil | | Olive oil | | Chicken broth | |
| Olive oil | | Consommé | | Consommé | N.A. | Include dairy products | | Include dairy products | | Sweetener (sugar, mirin, honey) | + |
| Iodine | + | Olive oil | | Korean food | N.A. | Miso | + | Miso | + | No stimulation to oral cavity | + |
| Mirin | + | Dashi flavor | + | Sauce (Worcestershire sauce) | N.A. | Iodine | + | Mirin | + | Other seasoning | |
| Chicken broth | | Sweetener (sugar, mirin, honey) | + | Purple onion | N.A. | Pepper | | Dashi flavor | + | Include foods with no measurement of potassium | |
| Pepper | | Include foods with no measurement of potassium | | Oyster sauce | N.A. | Dashi flavor | + | Dashi | + | Consommé | |
| Include tomato | | Pepper | | Pepper | N.A. | Mirin | + | Pepper | | Soy sauce | + |
| Consommé | | Miso | + | Ethnic food | N.A. | Chicken broth | | Include tomato | | Miso | + |
| Dashi | + | Allergen-free of pork | + | Olive oil | N.A. | No stimulation to oral cavity | + | Iodine | + | Room temperature | + |
DNN, deep neural network; LGBM, light gradient boosting machine; LR, logistic regression; N.A., not available; RFC, random forest classifier; SVC, support vector classifier; XGB, extreme gradient boosting. a The best model for the Japanese diet was the LGBM. b +: positive correlation, −: negative correlation. The correlation coefficient was analyzed using Shapley additive explanations. Well-predicted features were marked in the original table.
Table 4. Top 10 among the 604 features in the six machine learning models in the Chinese dietary style.
| RFC a feature | +/− b | LR feature | +/− | SVC feature | +/− | XGB feature | +/− | LGBM feature | +/− | DNN feature | +/− |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sesame oil | + | Chicken broth | + | Chicken broth | N.A. | Sesame oil | + | Sesame oil | + | Chicken broth | + |
| Chicken broth | + | Oyster sauce | + | Oyster sauce | N.A. | Chicken broth | + | Chicken broth | + | Sesame oil | + |
| Oyster sauce | + | Sesame oil | + | Sesame oil | N.A. | Oyster sauce | + | Oyster sauce | + | Sesame flavor | + |
| Starch | + | Allergen-free of sesame | + | Diets for morning sickness | N.A. | Starch | + | Olive oil | | Oyster sauce | + |
| Gamma-tocopherol | + | Sesame flavor | + | For party | N.A. | Doubanjiang | + | Mirin | | Allergen-free of sesame | |
| Doubanjiang | + | Include potato | + | Include green onion | N.A. | Mirin | | Iodine | | For lunch | + |
| Allergen-free of sesame | | Include green onion | + | Doubanjiang | N.A. | Allergen-free of sesame | | Doubanjiang | + | Include potato | + |
| Fry | + | Fry | + | Sesame flavor | N.A. | Include dairy products | | Starch | + | Side dish | |
| Sodium | + | Sauce (Worcestershire sauce) | + | For breakfast | N.A. | Olive oil | | Pantothenic acid | + | Other seasoning | + |
| Mirin | | Throughout a year | + | Sauce (Worcestershire sauce) | N.A. | Iodine | | Easy cooking | + | Very low fat and/or energy percent | |
DNN, deep neural network; LGBM, light gradient boosting machine; LR, logistic regression; N.A., not available; RFC, random forest classifier; SVC, support vector classifier; XGB, extreme gradient boosting. a The best model for the Chinese diet was the RFC. b +: positive correlation, −: negative correlation. The correlation coefficient was analyzed using Shapley additive explanations. Well-predicted features were marked in the original table.
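The abstract states that features common to three or more models were treated as the best predictive features. The sketch below shows one way such a tally could be made. It is an assumption-laden illustration, not the authors' procedure: the function name and the `min_models` parameter are hypothetical, and the input uses only the three leading rows of Table 4 for brevity.

```python
from collections import Counter


def common_features(top_lists, min_models=3):
    """Return features that appear in the top list of at least
    `min_models` models, mapped to the number of models listing them."""
    counts = Counter()
    for features in top_lists.values():
        counts.update(set(features))  # count each feature once per model
    return {name: n for name, n in counts.items() if n >= min_models}


# First three rows of Table 4 (Chinese dietary style), one list per model.
top3_chinese = {
    "RFC": ["Sesame oil", "Chicken broth", "Oyster sauce"],
    "LR": ["Chicken broth", "Oyster sauce", "Sesame oil"],
    "SVC": ["Chicken broth", "Oyster sauce", "Sesame oil"],
    "XGB": ["Sesame oil", "Chicken broth", "Oyster sauce"],
    "LGBM": ["Sesame oil", "Chicken broth", "Oyster sauce"],
    "DNN": ["Chicken broth", "Sesame oil", "Sesame flavor"],
}

shared = common_features(top3_chinese)
for name, n in sorted(shared.items(), key=lambda item: (-item[1], item[0])):
    print(f"{name}: listed by {n} of {len(top3_chinese)} models")
```

On this subset, chicken broth and sesame oil are listed by all six models and oyster sauce by five, while sesame flavor (one model) falls below the threshold. Co-occurrence alone ignores the sign of each feature, so any selection used in the study may have applied further criteria.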
Table 5. Top 10 of the 604 features in the six machine learning models in the Western dietary style.
| RFC feature | +/− b | LR feature | +/− | SVC feature | +/− | XGB feature | +/− | LGBM feature | +/− | DNN a feature | +/− |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Include dairy products | + | Olive oil | + | Ethnic food | N.A. | Include dairy products | + | Olive oil | + | Olive oil | + |
| Olive oil | + | Include dairy products | + | Olive oil | N.A. | Olive oil | + | Include dairy products | + | Include dairy products | + |
| Soy sauce taste | | Include tomato | + | Consommé | N.A. | Soy sauce taste | | Vitamin C | + | Include tomato | + |
| Sesame oil | | Consommé | + | Within one hour cooking | N.A. | Sesame oil | | Soy sauce taste | | Soy sauce taste | |
| Milk and dairy products | + | Soy sauce taste | | Thicken | N.A. | Vitamin C | + | Sesame oil | | Polyunsaturated fatty acids | |
| Consommé | + | Low fat energy percent | | Include garlic | N.A. | Consommé | + | Consommé | + | Allergen-free of apple | |
| Soy sauce | | Milk and dairy products | + | Include tomato | N.A. | Include tomato | + | Include tomato | + | Salty | + |
| Gamma tocopherol | | Ethnic food | | Wooden pestle | N.A. | Rice wine | | Rice wine | | Very low fat and/or energy percent | |
| Rice wine | | Include garlic | + | Pasta | N.A. | Parsley | + | Parsley | + | Include garlic | + |
| Sesame flavor | | No stimulation to oral cavity | | Calcium fortification | N.A. | Miso | | Miso | | Caution about germ growth | + |
DNN, deep neural network; LGBM, light gradient boosting machine; LR, logistic regression; N.A., not available; RFC, random forest classifier; SVC, support vector classifier; XGB, extreme gradient boosting. a The best model for the Western diet was the DNN. b +: positive correlation, −: negative correlation. The correlation coefficient was analyzed using Shapley additive explanations. Well-predicted features were marked in the original table.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yamaguchi, M.; Araki, M.; Hamada, K.; Nojiri, T.; Nishi, N. Development of a Machine Learning Model for Classifying Cooking Recipes According to Dietary Styles. Foods 2024, 13, 667. https://doi.org/10.3390/foods13050667
