Article

Multispectral Food Classification and Caloric Estimation Using Convolutional Neural Networks

Department of Electrical and Electronic Engineering, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul 05029, Republic of Korea
Foods 2023, 12(17), 3212; https://doi.org/10.3390/foods12173212
Submission received: 25 July 2023 / Revised: 18 August 2023 / Accepted: 24 August 2023 / Published: 25 August 2023

Abstract
Continuous monitoring and recording of the type and caloric content of ingested foods with minimal user intervention is very useful in preventing metabolic diseases and obesity. In this paper, automatic recognition of food type and caloric content was achieved via the use of multispectral images. A method of fusing the RGB image with images captured in the ultraviolet, visible, and near-infrared regions at center wavelengths of 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, and 1020 nm was adopted to improve the accuracy. A convolutional neural network (CNN) was adopted to classify food items and estimate the caloric amounts. The CNN was trained using 10,909 images acquired from 101 types of food. The objective functions, including classification accuracy and mean absolute percentage error (MAPE), were investigated according to the number of wavelengths. The optimal combinations of wavelengths (including/excluding the RGB image) were determined by using a piecewise selection method. Validation tests were carried out on 3636 images of the food types that were used in training the CNN. As a result of the experiments, the accuracy of food classification increased from 88.9 to 97.1% and the MAPE decreased from 41.97 to 18.97% even when only one NIR image was added to the RGB image. The highest accuracy for food type classification was 99.81% when using 19 images, and the lowest MAPE for caloric content was 10.56% when using 14 images. These results demonstrate that the use of images captured at various wavelengths in the UV and NIR bands is very helpful for improving the accuracy of food classification and caloric estimation.

1. Introduction

Precise and continuous monitoring of the types and amounts of foods consumed is very helpful for the maintenance of good health. For health professionals, being aware of the nutritional content of ingested food plays an important role in the proper treatment of patients with weight-related diseases and gastrointestinal diseases, as well as those at high risk for metabolic diseases such as obesity [1]. For people with no health problems, monitoring the types, amounts, and nutritional content of food consumed is useful for maintaining that status. Monitoring of the types and amounts of foods eaten is often achieved via manual record-keeping methods that include food-frequency questionnaires [2], self-report diaries [3], and multimedia diaries [4]. Several user-friendly diet-related apps are now available on smartphones, in which image recognition schemes are in part adopted to classify the types of food. In such approaches, however, the accuracy is affected by user inattention and erroneous record-keeping, which often decreases their usefulness.
Several automatic food recognizers (AFRs) are available to continuously recognize the types and amounts of consumed food with minimal user intervention. Wearable sensing and digital signal-processing technologies are key factors in the implementation of AFRs, which are divided into several categories according to the adopted sensing method. In acoustic-based methods, classifying the types of food is achieved via chewing and swallowing sounds. The underlying principle is that chewing sounds vary depending on the physical characteristics of the food, including shape, hardness, and moisture content. In-ear microphones [5,6,7] and throat microphones [8,9] are typically used to acquire the sounds of food intake. Using a throat microphone and a hidden Markov model (HMM) classifier, recognition experiments on seven types of food achieved a recognition rate of 81.5∼90.1% [9]. Päfiler et al. performed recognition experiments on seven types of food using an in-ear microphone and reported a recognition rate of 66∼79% [6]. The performance achieved using acoustic signals has been limited, however, because it is difficult to discriminate various foods by using only acoustic cues.
A variety of sensors are used in non-acoustic methods. These include an imaging sensor [10], a visible light spectrometer, a conductive sensor [11], a surface electromyogram (sEMG) sensor attached to the frame of eyeglasses [12], and a near-infrared (NIR) spectrometer [13]. These methods have the advantage of distinguishing and sub-dividing different types of food while analyzing the principal constituents. Sensors that are inconvenient to wear, however, can be disadvantageous, and separate sampling of the food is required [11,13]. Ultrasonic Doppler shifts are also employed to classify the types of food [14]. The underlying principle of non-acoustic methods involves movements of the jaw during the chewing of food, as well as vibrations of the jaw caused by the crushing of food, both of which reflect the characteristics of food types. The accuracy of the ultrasonic Doppler method was 90.13% for six types of food [14].
Since types of food are easily distinguished according to their shape, texture, and color, visual cues have been used for the classification of food types and estimation of the food amount [10,15,16,17,18,19,20,21,22,23]. In a vision-based approach, the classification of food types can be formulated as a pattern recognition problem where segmentation, feature selection, and classification are sequentially carried out on food images. Due to recent advances in machine learning technology, artificial neural networks have been employed to classify food categories and to predict the caloric content of food [18,22,23]. Convolutional neural networks (CNNs) have been used to classify 15 food categories with an average classification accuracy of 82.5% and a correlation between the true and estimated calories of 0.81 [18]. When RGB images under visible lighting sources were used in previous vision-based approaches, recognition accuracy was degraded for visually similar foods. Moreover, RGB imaging cannot capture the food-specific responses to UV and NIR light, which limits its use in food analysis.
Multispectral analysis has been widely adopted in food analysis [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39]. The underlying principle is that individual ingredients in food have different absorption spectra. For example, infrared (IR) light is strongly absorbed by water compared with ultraviolet (UV) and visible (VIS) light. Therefore, differences in the absorption spectra between the VIS and IR bands are useful in estimating the amount of water contained in food. In previous studies, multispectral analysis of food was employed to quantify specific contents, such as oil, vinegar [24], water [27], sugar [28,29,30,31,32,33,34,35,36], and soluble protein [37]. The multispectral analysis involved the use of a spectrometer and a wide-band light source (such as a halogen lamp). Using these methods, an optimal set of wavelengths was chosen from the absorption spectra in the interest of maximizing the prediction accuracy for the ingredients of interest. A correlation of 0.8912 was obtained using four wavelengths out of a 280-bin absorption spectrum when predicting the sugar content of apples [28]. When hyperspectral imaging was adopted to predict the sugar content in wine, a maximum correlation of 0.93 was obtained when using partial least squares regression [32]. The usage of a spectrometer and a wide-band light source confers the ability to select the optimal wavelength in sharp detail. Problems associated with size, weight, and power consumption, however, could potentially cause difficulties in implementing wearable monitoring devices.
The multispectral approach has also been implemented by using a number of narrow-bandwidth light sources, such as light emitting diodes (LEDs), and a digital camera [24,26,27,31,35,39]. Compared with halogen lamps, the improvement gained when using an LED light source was verified for multispectral food analysis [31]. Experimental results obtained showed that use of an LED light source returned a slightly higher correlation than that of halogen lamps (0.78 vs. 0.77). The number of employed wavelengths ranged from 5 [39] to 28 for UV, VIS, and NIR bands [31]. Raju et al. [24] used the multispectral images from 10 LEDs with different wavelengths to detect dressing oil and vinegar on salad leaves, and reported an accuracy of 84.2% by using five LEDs. Previous studies were focused mainly on predicting the specific nutritional content of specific foods (e.g., water in beef [27], sugar in apples [28], sugar in sugarcane [29], sugar in peaches [30], soluble solids in pomegranates [33], sugar in potatoes [34], sugar in black tea [35], and soluble protein in oilseed rape leaves [37]). The caloric content of food is determined by the amounts of each ingredient, and it would be reasonable to expect accurate predictions when using multispectral analysis techniques.
In the present study, multi-wavelength imaging techniques were applied to classifying food items and estimating caloric content. Compared with conventional RGB image-based methods, the usefulness of NIR/UV images was experimentally verified for food classification and predictions of caloric content. The optimal number and combination of wavelengths were determined using a piecewise selection method so as to maximize the estimation accuracy.
This paper is organized as follows: In Section 2, the processing of data preparation and the properties of the employed food items are presented. The preliminary verification results and the overall procedure for food analysis are explained in Section 3. The experimental results and discussion of the results are presented in Section 4. Concluding remarks are provided in Section 5.

2. Data Acquisition

The list of the food items used in this study is presented in Table 1. The food items were selected to represent the various physical properties (liquid/soft/hard) of everyday foods and to reflect the naturally occurring balances between healthy and unhealthy foods. The caloric amount was obtained from existing nutrition fact tables for each food and food composition data released by the Ministry of Korea Food and Drug Safety (KFDA) [40]. It is noteworthy that a number of foods were nutritionally different but were difficult to distinguish visually (e.g., cider and water, coffee and coffee with sugar, tofu and milk pudding, milk soda and milk…). Such pairs of food items were good choices for verifying the feasibility of UV and NIR images in the recognition of the types and calories of food. In the case of liquid food, images were acquired by putting the same amount of food in the same container (cup) so that the shape of the cup and the amount of food were not used as a clue for food recognition. In a similar manner, in the case of non-liquid foods, plates of the same size and shape were used.
A custom-made image acquisition system was employed to obtain the multispectral images. A schematic of the image acquisition system appears in Figure 1, and a photograph of the system is shown in Figure 2. The light source was positioned 25 cm above the food tray. Four digital cameras faced the center of the food tray and were connected to the desktop PC via a universal serial bus (USB). The acquired image size was set at 640 × 480 pixels (H × V), and each pixel had a 16-bit resolution. Each camera was equipped with a motorized IR-cut filter to block visible light when the IR images were acquired. The light source consisted of a total of 20 types of LEDs emitting light at different wavelengths (385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020 nm, and white). The white LED was used to obtain the RGB images, which were split into three (R-G-B) channels. The light source of each wavelength was composed of 30 LEDs, with the exception of the white light source (10 LEDs). The LEDs of each wavelength were arranged in a circular shape at a specified position on the printed circuit board (PCB), as shown in Figure 3. Before the image of a specific wavelength was acquired, the center of the corresponding LED area was moved to the center of the food tray. Since the intensities of the LEDs differed according to wavelength, the driving current of the LEDs for each wavelength was adjusted to minimize the differences in light intensity across wavelengths. The LED panel moved back-and-forth and left-to-right using linear stages powered by stepping motors. Data augmentation was achieved not by image transformations but by capturing images from as many views as possible. Accordingly, the four cameras and a rotating table were employed. The angular resolution of the rotating table could be adjusted from 0.5° to 90°. The movement of the LED panel, rotation of the table, and the on/off switching of each of the LEDs were all controlled by a micro-controller (Atmega128A).
The control commands were predefined both in the control module of the acquisition system and in the host desktop PC. Hence, the acquisition task was achieved entirely by constructing a sequence of the individual commands. The acquisition code was written in Python (version 3.6.11). The communication between the desktop PC and the control module was achieved using Bluetooth technology. Image acquisition was carried out in a dark chamber (470 mm × 620 mm × 340 mm, W × D × H) where external light was blocked. The angular resolution of the rotating table was set at 10° (36 views per camera). The total acquisition time for each food was 2738 s, which corresponded to an acquisition time of 3.8 s per frame. Examples of images of bread, castella, and a chocolate bar captured under white light from various angles are shown in Figure 4.
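As an illustration of how such a command sequence might be scripted, the following Python sketch outlines the acquisition loop; the serial command strings (ROTATE, MOVE_PANEL, LED_ON/LED_OFF), the port name, and the camera indices are hypothetical placeholders and are not taken from the actual acquisition code.

```python
# Hypothetical sketch of the multispectral acquisition loop (not the actual acquisition code).
# Assumes a Bluetooth serial link to the micro-controller and four USB cameras.
import time

import cv2
import serial

WAVELENGTHS = [385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660,
               810, 850, 870, 890, 910, 950, 970, 1020, 0]   # 0 stands for the white LED
VIEWS_PER_CAMERA = 36                                         # 10-degree steps of the rotating table

ctrl = serial.Serial("/dev/rfcomm0", 115200, timeout=1)       # Bluetooth serial port (assumed name)
cameras = [cv2.VideoCapture(i) for i in range(4)]             # four USB cameras


def send(cmd: str, wait: float = 0.5) -> None:
    """Send one predefined command string to the control module (command format assumed)."""
    ctrl.write((cmd + "\n").encode())
    time.sleep(wait)                                          # crude wait for mechanical motion


for view in range(VIEWS_PER_CAMERA):
    send("ROTATE 10")                                         # advance the rotating table by 10 degrees
    for wl in WAVELENGTHS:
        send(f"MOVE_PANEL {wl}")                              # center the LED ring of this wavelength
        send(f"LED_ON {wl}")
        for cam_id, cam in enumerate(cameras):
            ok, frame = cam.read()
            if ok:
                cv2.imwrite(f"view{view:02d}_wl{wl:04d}_cam{cam_id}.png", frame)
        send(f"LED_OFF {wl}")
```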

3. Food Analysis

3.1. Preliminary Feasibility Tests for UV/NIR Images

The main objective of this study was to improve the accuracy of food classification and calorie estimation by using UV/NIR images. Prior to construction of the classification/estimation rules, the adequacy of UV/NIR images for this purpose was verified experimentally. The white LED used for capturing RGB images was experimentally measured as emitting light with wavelengths ranging from 430 to 640 nm. Hence, images acquired under a light source with a wavelength shorter than 430 nm were considered UV images, and images acquired under a light source with a wavelength longer than 640 nm were considered NIR images. For pairs of food items that could not be well distinguished visually under a visible light source, it was necessary to confirm whether the corresponding images acquired under UV or NIR light sources could be distinguished more clearly. Food images are visually distinguished according to shape, texture, and the distribution of brightness values (histogram). The shapes and textures of foods, however, are uniquely determined independent of the light source, whereas the distribution of brightness values is affected not only by the food type but also by the wavelength of the light source. Accordingly, the brightness distribution was used to measure the difference between two food images as affected by the wavelength of the light source. A histogram was obtained from the image in which the non-food portion was masked. In the present study, the Bhattacharyya distance was employed to measure the differences between two food images. The Bhattacharyya distance for two images $I_n$ and $I_m$ acquired under a light source of wavelength $\lambda_k$ is given by
$$D_B(I_n, I_m \mid \lambda_k) = -\log \left\{ \sum_{y \in Y} \sqrt{p(y \mid I_n, \lambda_k)\, p(y \mid I_m, \lambda_k)} \right\}$$
where $Y$ is the set of possible brightness values, and $p(y \mid I, \lambda)$ is the probability density function of the brightness value $y$ in image $I$ at a wavelength of $\lambda$. The Bhattacharyya distance represents a complete match at a value of 0 (minimum) and a complete mismatch at a value of 1 (maximum). Figure 6 presents an example of two different food images acquired under light sources of different wavelengths. Two food items, coke and sugar-free grape juice, looked similar under a visible light source, as shown in Figure 6a, whereas the differences between the two food images were apparent in the IR images at 810 nm (Figure 6b). For this image pair, the Bhattacharyya distances for $\lambda = 640$ nm and $\lambda = 810$ nm were 0.45 and 0.99, respectively. This example shows that the Bhattacharyya distance can be a good indicator of visual differences between two different food items.
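For concreteness, the histogram-based Bhattacharyya distance described above can be computed as in the following minimal NumPy sketch; the bin count and the masking convention are assumptions rather than details reported in the paper.

```python
import numpy as np


def bhattacharyya_distance(img_a, img_b, mask_a=None, mask_b=None,
                           bins=256, value_range=(0, 65536)):
    """Bhattacharyya distance between the brightness histograms of two food images.

    img_a, img_b : 2-D arrays of brightness values (16-bit assumed);
    mask_a, mask_b : boolean masks selecting the food pixels.
    Returns 0 for identical histograms and larger values for larger differences.
    """
    def histogram(img, mask):
        values = img[mask] if mask is not None else img.ravel()
        counts, _ = np.histogram(values, bins=bins, range=value_range)
        return counts / counts.sum()                 # normalize to a probability distribution

    p = histogram(img_a, mask_a)
    q = histogram(img_b, mask_b)
    bc = np.sum(np.sqrt(p * q))                      # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))                   # distance; guard against log(0)
```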
The usefulness of UV/NIR images in terms of food classification was verified by examining the proportion of food pairs that were visually similar under visible light but visually distinct under UV or NIR light. To this end, a cumulative distribution function was defined; it can be heuristically computed as follows:
$$F_C(T_V, T_{NV}) = \frac{\left| \left\{ (I_m, I_n) \;\middle|\; D_B^{(V)}(I_n, I_m) < T_V \ \text{and}\ D_B^{(NV)}(I_n, I_m) > T_{NV} \right\} \right|}{\left| \left\{ (I_m, I_n) \;\middle|\; D_B^{(V)}(I_n, I_m) < T_V \right\} \right|}$$
where $(I_m, I_n)$ denotes a pair of images of two different food items $m$ and $n$, and $|S|$ is the cardinality of the set $S$. In this study, the maximum $D_B$ value in each band (visible and non-visible) was chosen as the representative $D_B$ for the corresponding band; hence, $D_B^{(V)}$ and $D_B^{(NV)}$ are given by

$$D_B^{(V)}(I_n, I_m) = \max_{\lambda \in \Lambda_V} D_B(I_n, I_m \mid \lambda), \qquad D_B^{(NV)}(I_n, I_m) = \max_{\lambda \in \Lambda_{NV}} D_B(I_n, I_m \mid \lambda)$$

where $\Lambda_V$ and $\Lambda_{NV}$ are the sets of visible and non-visible wavelengths, respectively. $T_V$ and $T_{NV}$ are the thresholds of the Bhattacharyya distances for the visible and non-visible bands, respectively, and $F_C(T_V, T_{NV})$ represents the ratio of image pairs for which the Bhattacharyya distance from the VIS images is less than $T_V$ while the Bhattacharyya distance from the UV or IR images is greater than $T_{NV}$. In the case of $T_V < T_{NV}$, $F_C(T_V, T_{NV})$ represents the frequency of a relatively small Bhattacharyya distance (a low degree of discrimination) in the visible band but a high Bhattacharyya distance (a high degree of discrimination) in the non-visible band.
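The ratio $F_C$ can be computed directly once the band-maximum distances have been collected for every food pair, as in the following sketch (the array names are assumptions introduced for illustration):

```python
import numpy as np


def cumulative_ratio(d_vis, d_nonvis, t_v, t_nv):
    """F_C(T_V, T_NV): among food pairs whose visible-band distance D_B^(V) is below T_V,
    the fraction whose non-visible-band distance D_B^(NV) exceeds T_NV.

    d_vis, d_nonvis : 1-D arrays with one entry per food pair (band-maximum distances).
    """
    d_vis = np.asarray(d_vis)
    d_nonvis = np.asarray(d_nonvis)
    poorly_separated = d_vis < t_v                   # pairs hard to distinguish under VIS light
    if not poorly_separated.any():
        return 0.0
    return float((d_nonvis[poorly_separated] > t_nv).mean())
```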
The cumulative distributions for the various thresholds are plotted in Figure 5. Note that the visual difference between two images was not significant at a Bhattacharyya distance of 0.45, as shown in Figure 6; hence, the curves with $T_V \le 0.4$ correspond to the cumulative distributions obtained from food pairs that are not well distinguished under a visible light source. In the case of $T_V = 0.4$, the ratio of UV/NIR image pairs with a Bhattacharyya distance exceeding 0.5 (corresponding to well-distinguished pairs) was about 70%. Similar results were obtained for other $T_V$ values (e.g., for $T_V$ = 0.2, 0.3, and 0.4, $F_C(T_V, T_{NV} = T_V)$ = 89, 77, and 64%, respectively). This means that a significant number of food pairs that were not visually well distinguished under visible light sources were better distinguished under non-visible light sources. Such results indicate that the performance of food classification can be improved by using UV/NIR images that are complementary to VIS images.
In terms of caloric estimation, the usefulness of a specific wavelength image was determined by examining whether differences in the amount of calories between two food items were significantly correlated with differences between the two corresponding images. The underlying assumption is that if the difference between the two food images is large, their caloric difference will also be large and vice-versa. The caloric count was computed via the measured weight and the nutrition facts for each food. The Bhattacharyya distances were also adopted to measure the differences between the two food items. The caloric difference between the two food items n and m is given by the following absolute relative difference:
$$D_C(n, m) = \frac{|c_n - c_m|}{c_n + c_m}$$

where $c_n$ and $c_m$ are the calories of food items $n$ and $m$, respectively. The Pearson correlation for the images at the $k$-th wavelength is given by

$$\rho(\lambda_k) = \frac{\operatorname{cov}\!\left[ D_B(I \mid \lambda_k),\, D_C \right]}{\sigma_{D_B(I \mid \lambda_k)}\, \sigma_{D_C}}$$

where $\operatorname{cov}[x, y]$ denotes the covariance of $x$ and $y$, $\sigma_x$ is the standard deviation of $x$, and $D_B(I \mid \lambda_k)$ denotes the Bhattacharyya distance computed from the images acquired under a light source of wavelength $\lambda_k$.
The correlations across the wavelengths of each light source are presented in Figure 7. The maximum correlation was obtained at $\lambda = 870$ nm, and a second maximum also appeared in the NIR band ($\lambda = 850$ nm). The results indicate that the differences between the NIR images are moderately correlated with differences in caloric content, and that the images acquired under the NIR light sources are beneficial in terms of caloric estimation. The average correlations of the visible and non-visible bands were 0.636 and 0.633, respectively. The significance test also showed that there was no remarkable difference between the correlation values of the visible band and those of the non-visible band (p = 0.7). From such results, it can be reasonably assumed that VIS and UV/NIR images are equally useful in terms of caloric estimation.
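To make the computation explicit, the caloric difference $D_C$ and the per-wavelength correlation $\rho(\lambda_k)$ can be evaluated as in the sketch below; the 5 kcal floor for zero-calorie items follows the handling described in Section 3.3, and the data structures are assumptions introduced for illustration.

```python
import numpy as np


def caloric_relative_difference(c_n, c_m, floor=5.0):
    """Absolute relative caloric difference D_C(n, m); zero calories are replaced by a 5 kcal floor."""
    c_n, c_m = max(c_n, floor), max(c_m, floor)
    return abs(c_n - c_m) / (c_n + c_m)


def wavelength_correlation(db_by_pair, calories):
    """Pearson correlation between Bhattacharyya distances and caloric differences at one wavelength.

    db_by_pair : dict mapping a food pair (n, m) to its Bhattacharyya distance at this wavelength.
    calories   : dict mapping a food item to its caloric value (kcal).
    """
    pairs = sorted(db_by_pair)
    d_b = np.array([db_by_pair[p] for p in pairs])
    d_c = np.array([caloric_relative_difference(calories[n], calories[m]) for n, m in pairs])
    return float(np.corrcoef(d_b, d_c)[0, 1])
```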

3.2. Preprocessing

Although a highly stable current source was adopted to drive the LEDs, there was some variation in the intensity of the light from shot to shot. This caused unwanted changes in the acquired images and resulted in degradation of the estimation accuracy. A simple way to compensate for variations in the intensity of light sources is to adjust the intensity of the incident light so that the average intensity approximates that of the reference intensity. A typical scale factor is given by
$$\alpha_\omega^{(i)} = \frac{\mu_\omega^{(i)}}{\mu_\omega^{(ref)}}$$

where $\omega$ and $i$ are the indices of the wavelength and the shot, respectively, and $\mu_\omega^{(i)}$ and $\mu_\omega^{(ref)}$ represent the average and reference intensities, respectively. The reference intensity can be obtained by averaging a large number of light source images acquired at different times. Such an intensity normalization method is very simple and easy to implement.
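A minimal sketch of this normalization, assuming the scale factor defined above and a precomputed reference mean intensity for each wavelength, is:

```python
import numpy as np


def normalize_intensity(image, reference_mean):
    """Compensate for shot-to-shot variation in LED intensity.

    The scale factor alpha = mean(image) / reference_mean follows the definition above;
    dividing the image by alpha brings its average intensity close to the reference.
    """
    alpha = image.mean() / reference_mean
    return image.astype(np.float32) / alpha
```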

3.3. Food Analysis Using a Convolutional Neural Network

There are many machine learning schemes, such as random forest (RF), support vector regression (SVR), partial least squares regression (PLSR) and artificial neural networks (ANNs), that can be applied to recognition of the types, amounts, and nutritional content of food. Among them, the ANN-based approaches have an advantage wherein nonlinear relationships between the input (multispectral images) and the output (target values) can be taken into consideration in constructing the estimation rules. This leads to higher performance in terms of classification and regression. Accordingly, a supervised learning approach that employs convolutional neural networks (CNNs) was adopted in this study. The architecture of the CNN adopted in this study is shown in Figure 8, and was heuristically determined using a validation dataset (10% of the entire learning dataset). Note that the CNN architecture shown in Figure 8 was used to classify the types of food. The final output was replaced by a single output in the case of caloric estimation. The resultant architecture of the CNN was simpler than others previously proposed in image recognition tasks (e.g., ResNet-50, Inception-v3, Xception). This was due mainly to the smaller number of targets compared with that of other CNNs (101 vs. 1000).
We tested the performance in terms of food classification accuracy and estimation error for the target values according to different CNN input sizes (input image sizes). The results showed that a size of 64 × 64 yielded the highest performance for both classification and estimation. Accordingly, all images from the camera were reduced to 64 × 64 by interpolation. No cropping was carried out on the acquired images before resizing, and the entire 640 × 480 image was used.
The performance according to the hyperparameters of the CNN was also investigated using a validation dataset. This was performed separately for each task (food classification and caloric estimation). The resultant CNN was composed of two convolution/max pooling layers and a fully connected multi-layer perceptron (MLP) with a recognition output, as shown in Figure 8. The kernel sizes of the first and second convolution layers were 11 × 11 and 7 × 7, respectively, while the window sizes of both max pooling layers were 4 × 4. There were three fully connected layers in the employed CNN, corresponding to the input from the final convolution layer, the hidden layer, and the output layer. The numbers of nodes for these layers, also determined using the validation dataset, were 112, 128, and 18, respectively. Although the hyperparameters of each CNN were separately tuned for each task, the architecture of the CNN shown in Figure 8 yielded satisfactory performance for both image classification and caloric estimation.
A rectified linear unit (ReLU) was adopted as an activation function for all hidden layers. A soft-max function and linear combination function were employed for the output layer for classification and regression CNNs, respectively. Accordingly, the loss functions were given by the cross-entropy and the mean absolute percentage error (MAPE) in the cases of food classification and caloric estimation, respectively. A total of 1000 epochs resulted in a trained CNN with sufficient performance in terms of food recognition accuracy. It is noteworthy that the accuracy of classification/estimation was strongly affected by the mini-batch size. The experimental results showed that a mini-batch size of 32 gave the best performance for all cases. Since the MAPE is given by dividing the absolute error value by its ground truth, the loss value cannot be calculated when the given target value is zero. As shown in Table 1, there were some cases when the ground-truth caloric count was zero. Note that a value of zero in the nutrition facts does not necessarily mean that the amount of the nutritional content is zero. A value of zero actually means that the amount is less than its predefined minimum. In the present study, zero caloric values were replaced by the minimum, which was 5 (kcal) according to [40].
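A minimal Keras sketch of a CNN matching this description (11 × 11 and 7 × 7 kernels, 4 × 4 max pooling, fully connected layers of 112 and 128 nodes, softmax/cross-entropy for classification and linear/MAPE for caloric regression) is given below; the number of filters per convolution layer and the optimizer are not stated in the paper and are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn(n_wavelength_channels: int, n_classes: int = 101, classification: bool = True):
    """Sketch of the CNN described in the text; filter counts and optimizer are assumed."""
    inputs = tf.keras.Input(shape=(64, 64, n_wavelength_channels))
    x = layers.Conv2D(32, (11, 11), padding="same", activation="relu")(inputs)  # filter count assumed
    x = layers.MaxPooling2D((4, 4))(x)                                          # 64x64 -> 16x16
    x = layers.Conv2D(64, (7, 7), padding="same", activation="relu")(x)         # filter count assumed
    x = layers.MaxPooling2D((4, 4))(x)                                          # 16x16 -> 4x4
    x = layers.Flatten()(x)
    x = layers.Dense(112, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    if classification:
        outputs = layers.Dense(n_classes, activation="softmax")(x)   # 101 food categories
        loss = "sparse_categorical_crossentropy"
    else:
        outputs = layers.Dense(1, activation="linear")(x)            # caloric value (kcal)
        loss = "mean_absolute_percentage_error"                      # zero targets floored at 5 kcal
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss)
    return model


# Training with the reported mini-batch size of 32:
# model = build_cnn(n_wavelength_channels=5)
# model.fit(x_train, y_train, batch_size=32, epochs=1000, validation_split=0.1)
```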

3.4. Selection of the Wavelengths

Although the evaluation of wavelength combinations could be performed off-line, it was desirable to avoid the enormous computational cost of a brute-force grid search over all possible combinations. In the present study, a piecewise selection method was adopted to select the set of optimal wavelengths. Let $\Omega = \{\omega_0, \omega_1, \ldots, \omega_{N-1}\}$ be the set of the employed wavelengths, where $N$ is the total number of wavelengths. The set of wavelengths was gradually constructed by adding a wavelength to, and removing a wavelength from, the previously constructed set. The overall procedure is as follows (a schematic sketch is given after Step 3):
Step-1. Forward selection: Let $\Omega_F(i)$ be the wavelength set at the $i$-th forward step. All combinations $\Omega(i-1) \cup \{\omega\}$ with $\omega \in \bar{\Omega}(i-1) = \Omega \setminus \Omega(i-1)$ are evaluated to find the optimal wavelength $\omega_F^*$ that minimizes the given loss function; then, $\Omega_F(i) = \Omega(i-1) \cup \{\omega_F^*\}$ is constructed.
Step-2. Backward elimination: The element (wavelength) whose removal minimizes the loss function is removed from $\Omega_F(i)$. The wavelength set at the $i$-th backward step is then given by

$$\Omega_B(i) = \Omega_F(i) \setminus \{\omega_B^*\}$$

where

$$\omega_B^* = \arg\min_{\omega \in \Omega_F(i)} L\left(\Omega_F(i) \setminus \{\omega\}\right)$$

and $L(S)$ is the loss for a wavelength set $S$, given by the final loss of the learned CNN.
Step-3. Final forward selection: The set $\Omega(i)$ is built by a forward selection step wherein the optimal wavelength $\omega^*$ is chosen from the set $\Omega \setminus \Omega_B(i)$ so as to minimize the loss function, i.e., $\Omega(i) = \Omega_B(i) \cup \{\omega^*\}$.

Steps 1∼3 were iteratively performed until $\Omega(i) = \Omega$. The final set of optimal wavelengths is the set with the minimum loss over all iterations:

$$\Omega^* = \arg\min_{\Omega(i)} L\big(\Omega(i)\big)$$
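The following Python sketch illustrates the piecewise (forward/backward) selection procedure described above; `evaluate_loss` stands for training and validating a CNN on a given wavelength subset and returning its final loss, and is a placeholder rather than a function from the paper.

```python
def piecewise_selection(all_wavelengths, evaluate_loss):
    """Schematic sketch of Steps 1-3; evaluate_loss(subset) is a placeholder for CNN training."""
    universe = set(all_wavelengths)
    current = set()
    history = []                                     # (wavelength set, loss) after each iteration

    while current != universe:
        # Step 1: forward selection -- add the wavelength that most reduces the loss.
        best_add = min(universe - current, key=lambda w: evaluate_loss(current | {w}))
        forward = current | {best_add}

        # Step 2: backward elimination -- drop the wavelength whose removal gives the lowest loss.
        best_drop = min(forward, key=lambda w: evaluate_loss(forward - {w}))
        backward = forward - {best_drop}

        # Step 3: final forward selection from the wavelengths not yet in the backward set.
        best_final = min(universe - backward, key=lambda w: evaluate_loss(backward | {w}))
        current = backward | {best_final}
        history.append((frozenset(current), evaluate_loss(current)))

    # The final set is the one with the minimum loss over all iterations.
    return min(history, key=lambda item: item[1])[0]
```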
A large number of wavelengths in $\Omega^*$ implies an increased number of LEDs and shots, which results in a larger device, higher power consumption, and longer acquisition times. Hence, the number of wavelengths should also be considered when building the estimation rules. The three-channel (RGB) image can be obtained from a single shot under white light and can be regarded as a representative image of the visible light region. In the present study, food analysis was therefore performed by combining the RGB image with UV and NIR images to reduce the number of shots, and the results were compared with those obtained by combining images from the full set of wavelengths.

4. Experimental Results

4.1. Accuracy for Food Item Classification

The food classification accuracy according to the number of input images is presented in Figure 9. The number of input images equals the number of images actually taken by the camera, each at a different wavelength. Note that although the RGB image was separated into three individual channels that were input to the CNN, it was counted as one image because it was captured in a single shot. The results indicate that the accuracy increased rapidly while the number of images was less than five. Even when only one image was used, the classification accuracy was higher than that of a previous CNN-based food recognition method [18]. Such a high level of accuracy was due mainly to the use of a large training dataset that included images acquired from various directions. A maximum accuracy of 99.81% was obtained when images acquired from 11 different wavelength light sources were used. However, the accuracy increased only until the number of images reached five, and no significant increase was observed beyond that point. This was confirmed by the fact that the correlation coefficient between the number of images and the classification accuracy was 0.919 for up to five images, and decreased to 0.587 when the number of images ranged from 6 to 19. Such results indicate that a sufficient recognition rate can be obtained by using five images at different wavelengths. In fact, the recognition rate obtained from five images was 99.12%, which was not significantly lower than the maximum recognition rate of 99.81%.
The results discussed thus far were obtained without the RGB image. Figure 9 also presents the food classification accuracy when single-wavelength images were added to the RGB image. The increase in the recognition rate was more rapid than when the RGB image was not used. A maximum recognition rate of 99.83% was obtained when almost all images (19 out of 20) were used. The largest increase in classification accuracy was achieved when a single wavelength image was added to the RGB image (e.g., the accuracy increased from 88.86 to 97.08% when an image at a wavelength of 890 nm was added to the RGB image). The accuracy increased gradually until three images had been added and remained almost constant once the number exceeded five. When food classification was performed with only one type of image (as in previous image-based food classification methods), correct recognition rates of 88.86% and 87.9% were achieved by using the RGB image and a single-wavelength image, respectively. This indicates that an RGB image is a slightly better choice for food classification when only one type of image is used.
As noted in the previous section, achieving a high level of accuracy with a small number of images is paramount. From this point of view, it is meaningful to examine how the correct recognition rate changes for each food item when recognition is performed by adding only one type of NIR or UV image to the RGB image. The experimental results for two input images (including the RGB image) showed that the highest classification accuracy was achieved when an image at a wavelength of 890 nm was added to the RGB image. Accordingly, the change in the recognition rate for each food item was examined when adding only one NIR image at a wavelength of 890 nm to the RGB image. Among the 101 food items, 62 items showed an improved recognition rate after adding only one type of NIR image, and 35 items showed the same recognition rate. A decrease in the recognition rate was observed for only four food items, and the level of decrease was generally small (<5%). The food items that showed a significant improvement in the recognition rate after adding one type of NIR image are listed in Table 2. It is noteworthy that, for most of these food items, there exist other food items that are not easily distinguished under visible light. For example, the food pairs coffee and coffee with sugar, as well as caffelatte and caffelatte with sugar, were almost visually identical. These food items were often recognized as other foods that appeared almost identical under visible light. For grape soda, a recognition rate of 0% was obtained when only an RGB image captured under visible light was used; in this case, all grape soda images were recognized as sugar-free grape soda, which is visually identical to grape soda. When adding an image acquired under an 890 nm wavelength light source, a recognition rate of 44% was obtained. Consequently, it is apparent that NIR/UV images improve the accuracy of image-based food classification.

4.2. Accuracy for Caloric Estimation

The results for caloric estimation are presented in Figure 10, in which the MAPEs are plotted against the number of images actually taken. Without the RGB image, the minimum MAPE was 10.49% when a total of 10 different wavelength images were employed. Interestingly, the MAPE decreased until the number of input images reached 10 ($R^2 = 0.9317$), but increased after the number of input images exceeded 11 ($R^2 = 0.6530$). This result was likely due to the limitations of the piecewise algorithm adopted in the selection of the wavelengths and to overfitting caused by an excessive increase in the number of input images. When three wavelength images were employed, the MAPE was 18.97%, which was significantly lower than when using the RGB image alone (41.97%). Such a result is also remarkably better than those of previous CNN-based caloric estimation schemes (27.93% [19]), even though the number of food items adopted was relatively large (101 vs. 15). The selected wavelengths in the case of three input images were 385, 560, and 970 nm, indicating that one image from each of the UV, VIS, and NIR bands was selected. This implies that not only the number of input images but also the selected wavelengths play an important role in the accuracy of caloric estimation.
When single-wavelength images were added to the RGB image, a minimum MAPE of 10.56% was achieved. A total of 14 input images was needed to obtain the minimum MAPE, which meant that more images were required than when the RGB image was excluded (10 images). Overall, the MAPE decreased with an increasing number of input images ($R^2 = 0.5935$). However, the correlation values before and after the minimum point (14) were −0.6847 and −0.4919, respectively, indicating that there was no strong relationship between the number of input images and the MAPE once the number of input images exceeded 14. The Pearson correlation between the ground-truth and estimated caloric counts was also notably larger than that of previous CNN-based estimation schemes [18,19] (0.975 vs. 0.806) when a total of 14 UV/VIS/NIR images was employed in the caloric estimation. This was due mainly to the usage of the UV/NIR images in this study. Even for visually similar foods with different caloric counts, their UV/NIR images were often clearly distinguishable. Adding more images obtained from light sources with various wavelengths in the UV and NIR bands progressively improved the process of classifying images according to the caloric count.
Figure 11 compares the cumulative distribution functions of the MAPE when using only the RGB image and when adding images acquired from single-wavelength light sources. The superiority of using additional images along with the RGB image is confirmed by this figure. For example, about 50% of all test food images showed a MAPE value of less than 30% when only the RGB image was used. When one image was added to the RGB image, 50% of all test food images showed a MAPE value of less than 12%, and this was further reduced to 5% when five images were added to the RGB image.
It was also important to examine the accuracy of the caloric estimation for each food item when adding only one type of NIR/UV image to the RGB image. An image acquired at a wavelength of 970 nm yielded the highest MAPE reduction and was chosen as the addition to the RGB image. Compared with the use of only the RGB image, the numbers of food items with decreased, increased, and unchanged MAPE values were 77, 20, and 1, respectively (out of 98 food items with valid caloric values). This indicates that the accuracy of caloric estimation for most food items (>78%) can be improved by including only one type of NIR image. This is also confirmed by the fact that most of the MAPE reduction (74.8%) was achieved when only one NIR image (at 970 nm) was added to the RGB image, as shown in Figure 11. Table 3 lists the food items with large differences in MAPE between using only the RGB image and using the RGB + 970 nm images. The estimated caloric values of the food items presented in Table 3 deviated from the ground truth by more than 50% when only the RGB image was used. The results show that the MAPE values for these food items were reduced by more than half when calories were estimated after adding one type of NIR image to the RGB image.
In conclusion, the use of UV/NIR images in addition to RGB images increases the accuracy of caloric estimation. Even with a small number of UV/NIR images, the performance in terms of caloric estimation was significantly improved compared with conventional RGB-based estimation.

4.3. Analysis of the Selected Wavelengths

The wavelengths selected for food classification are listed in Table 4 for each number of images. The selection rate for each wavelength is shown in Figure 12. The results are presented for two cases: inclusion and exclusion of the RGB image. When the RGB image was excluded, the ratio of selected wavelengths corresponding to the NIR band was 46.84%, whereas 41.05% of the selected wavelengths corresponded to the visible band. The UV band was selected at a relatively low rate (12.11%), partially because the number of employed UV light sources was smaller than the numbers of VIS and NIR light sources; hence, a light source in the UV band was less likely to be selected. The chance of selecting a wavelength in the NIR band increased to 60% in cases where a relatively small number of wavelengths was allowed (≤5). Considering that the number of wavelengths belonging to the NIR band was slightly smaller than the number of VIS wavelengths (8 vs. 9), the higher selection rate of wavelengths in the NIR band indicates that the NIR images are more useful for food classification than the VIS images. This seems to be due in part to the nature of the NIR images, from which the distribution of water content in food can be approximately obtained.
When the RGB image was included, the ratio of selected wavelengths in the NIR band further increased to 50.53%. This is because the RGB image already includes three visible-light channels (R, G, and B), so the NIR band images are more likely to be selected. Whether the RGB image was included or excluded, the most frequently selected wavelengths were 385, 870, 890, and 970 nm. This also indicates that the wavelengths corresponding to the non-visible bands were more frequently selected for food classification.
The wavelength selection results for caloric estimation are shown in Figure 13 and in Table 5. When excluding the RGB images, the wavelengths in the NIR region were selected with a frequency of 31.58%, whereas wavelengths in the visible region were selected with a frequency of 51.05%. Even when RGB images were included, the wavelengths in the visible region were selected more frequently compared with those in the NIR band. This result was somewhat different from that for food classification. Wavelengths of 385, 430, 560, and 970 nm were frequently selected when excluding the RGB images, which indicates that the wavelengths corresponding to the visible band were also frequently selected. On the other hand, the wavelengths of 405, 430, 950, and 1020 nm were selected with relatively high frequency when the RGB images were included.

5. Conclusions

RGB images are mostly used in image-based food analysis. The proposed approach assumes that the type of food and its caloric content can be partially determined by the morphological characteristics and wavelength distributions of food images. Multi-spectral analysis techniques using NIR and UV light along with VIS light have been employed in the analysis of various foods, with results approximating those of conventional chemical analysis techniques. The present study was motivated by such a multi-spectral approach, and the procedures for food classification and caloric estimation adopted a multi-spectral analysis technique. Automated equipment capable of acquiring images at up to 20 wavelengths was devised, and approximately 15,000 images were acquired per wavelength from 101 types of food. The types of food and caloric content were estimated using a CNN, which allowed the features relevant to estimating the target variables to be derived automatically from the images, so that a separate feature extraction step was unnecessary. The experimental results showed that the performance in terms of accuracy for food item classification and caloric estimation was notably superior to previous methods. This was due mainly to the usage of various light sources in the UV/VIS/NIR bands, unlike conventional methods which use only RGB images. It would be interesting to connect the multi-spectral imaging techniques with the quantification of ingredients such as proteins, fats, carbohydrates, sugars, and sodium. Future work will focus on this issue.

Funding

This work was funded by the Korean Evaluation Institute of Industrial Technology (KEIT) under the Industrial Embedded System Technology Development (R&D) Program (20016341).

Data Availability Statement

The data used to support the findings of this study can be made available by the corresponding author upon request.

Acknowledgments

The author is grateful to the members of the bio-signal processing laboratory at Konkuk University for participating in several experiments.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Moayyedi, P. The epidemiology of obesity and gastrointestinal and other diseases: An overview. Dig. Dis. Sci. 2008, 9, 2293–2299. [Google Scholar] [CrossRef]
  2. Prentice, A.M.; Black, A.E.; Murgatroyd, P.R.; Goldberg, G.R.; Coward, W.A. Metabolism or appetite: Questions of energy balance with particular reference to obesity. J. Hum. Nutr. Diet. 1989, 2, 95–104. [Google Scholar] [CrossRef]
  3. De Castro, J.M. Methodology, correlational analysis, and interpretation of diet diary records of the food and fluid intake of free-living humans. Appetite 1994, 2, 179–192. [Google Scholar] [CrossRef]
  4. Kaczkowski, C.H.; Jones, P.J.H.; Feng, J.; Bayley, H.S. Four-day multimedia diet records underestimate energy needs in middle-aged and elderly women as determined by doubly-labeled water. J. Nutr. 2000, 4, 802–805. [Google Scholar] [CrossRef] [PubMed]
  5. Nishimura, J.; Kuroda, T. Eating habits monitoring using wireless wearable in-ear microphone. In Proceedings of the International Symposium on Wireless Pervasive Communication, Santorini, Greece, 7–9 May 2008; pp. 130–133. [Google Scholar]
  6. Päfiler, S.; Wolff, M.; Fischer, W.-J. Food intake monitoring: An acoustical approach to automated food intake activity detection and classification of consumed food. Physiol. Meas. 2012, 33, 1073–1093. Available online: http://stacks.iop.org/0967.3334/33/1073 (accessed on 28 November 2022).
  7. Shuzo, M.; Komori, S.; Takashima, T.; Lopez, G.; Tatsuta, S.; Yanagimoto, S.; Warisawa, S.; Delaunay, J.-J.; Yamada, I. Wearable eating habit sensing system using internal body sound. J. Adv. Mech. Des. Syst. Manuf. 2010, 1, 158–166. [Google Scholar] [CrossRef]
  8. Alshurafa, N.; Kalantarian, H.; Pourhomayoun, M.; Liu, J.; Sarin, S.; Sarrafzadeh, M. Recognition of nutrition-intake using time-frequency decomposition in a wearable necklace using a piezoelectric sensor. IEEE Sens. J. 2015, 7, 3909–3916. [Google Scholar] [CrossRef]
  9. Bi, Y.; Lv, M.; Song, C.; Xu, W.; Guan, N.; Yi, W. Autodietary: A wearable acoustic sensor system for food intake recognition in daily life. IEEE Sens. J. 2016, 3, 806–816. [Google Scholar] [CrossRef]
  10. Weiss, R.; Stumbo, P.J.; Divakaran, A. Automatic food documentation and volume computation using digital imaging and electronic transmission. J. Am. Diet. Assoc. 2010, 1, 42–44. [Google Scholar] [CrossRef]
  11. Lester, J.; Tan, D.; Patel, S. Automatic classification of daily fluid intake. In Proceedings of the IEEE 4th International Conference on Pervasive Computing Technologies for Healthcare, Munich, Germany, 22–25 March 2010; pp. 1–8. [Google Scholar]
  12. Zhang, R.; Amft, O. Monitoring chewing and eating in free-living using smart eyeglasses. IEEE J. Biomed. Health Inform. 2018, 1, 23–32. [Google Scholar] [CrossRef]
  13. Thong, Y.J.; Nguyen, T.; Zhang, Q.; Karunanithi, M.; Yu, L. Prediction food nutrition facts using pocket-size near-infrared sensor. In Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Seogwipo, Republic of Korea, 11–15 July 2017; pp. 11–15. [Google Scholar]
  14. Lee, K.-S. Joint Audio-ultrasonic food recognition using MMI-based decision fusion. IEEE J. Biomed. Health Inform. 2020, 5, 1477–1489. [Google Scholar] [CrossRef]
  15. Sun, M.; Liu, Q.; Schmidt, K.; Yang, J.; Yao, N.; Fernstrom, J.D.; Fernstrom, M.H.; DeLany, J.P.; Sclabassi, R.J. Determination of food portion size by image processing. In Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 21–24 August 2008; pp. 871–874. [Google Scholar]
  16. Zhu, F.; Bosch, M.; Woo, I.; Kim, S.Y.; Boushey, C.J.; Ebert, D.S.; Delp, E.J. The use of mobile devices in aiding dietary assessment and evaluation. IEEE J. Sel. Top. Signal Process. 2010, 4, 756–766. [Google Scholar] [PubMed]
  17. Pouladzadeh, P.; Shirmohammadi, S.; Al-Maghrabi, R. Measuring calorie and nutrition from food image. IEEE Trans. Instrum. Meas. 2014, 8, 1947–1956. [Google Scholar] [CrossRef]
  18. Ege, T.; Yanai, K. Simultaneous estimation of food categories and calories with multi-task CNN. In Proceedings of the 15th International Conference on Machine Vision Applications, Nagoya, Japan, 8–12 May 2017; pp. 198–201. [Google Scholar]
  19. Ege, T.; Ando, Y.; Tanno, R.; Shimoda, W.; Yanai, K. Image-based estimation of real food size for accurate food calorie estimation. In Proceedings of the IEEE conference on Multimeda Information Processing and Retrieval, San Jose, CA, USA, 28–30 May 2019; pp. 274–279. [Google Scholar]
  20. Lee, K.-S. Automatic estimation of food intake amount using visual and ultrasonic signals. Electronics 2021, 10, 2153. [Google Scholar] [CrossRef]
  21. Dehais, J.; Anthimopoulos, M.; Shevchik, S.; Mougiakakou, S. Two-view 3D reconstruction for food volume estimation. IEEE Trans. Multimed. 2017, 5, 1090–1099. [Google Scholar] [CrossRef]
  22. Lubura, J.; Pezo, L.; Sandu, M.A.; Voronova, V.; Donsì, F.; Šic Žlabur, J.; Ribić, B.; Peter, A.; Šurić, J.; Brandić, I.; et al. Food Recognition and Food Waste Estimation Using Convolutional Neural Network. Electronics 2022, 11, 3746. [Google Scholar] [CrossRef]
  23. Dai, Y.; Park, S.; Lee, K. Utilizing Mask R-CNN for Solid-Volume Food Instance Segmentation and Calorie Estimation. Appl. Sci. 2022, 12, 10938. [Google Scholar] [CrossRef]
  24. Raju, V.B.; Sazonov, E. Detection of oil-containing dressing on salad leaves using multispectral imaging. IEEE Access 2020, 8, 86196–86206. [Google Scholar] [CrossRef]
  25. Sugiyama, J. Visualization of sugar content in the flesh of a melon by near-infrared imaging. J. Agric. Food Chem. 1999, 47, 2715–2718. [Google Scholar] [CrossRef]
  26. Ropodi, A.I.; Pavlidis, D.E.; Mohareb, F.; Pangaou, E.Z.; Nychas, E. Multispectral image analysis approach to detect adulteration of beef and pork in raw meats. Food Res. Int. 2015, 67, 12–18. [Google Scholar] [CrossRef]
  27. Liu, J.; Cao, Y.; Wang, Q.; Pan, W.; Ma, F.; Liu, C.; Chen, W.; Yang, J.; Zheng, L. Rapid and non-destructive identification of water-injected beef samples using multispectral imaging analysis. Food Chem. 2016, 190, 938–943. [Google Scholar] [CrossRef]
  28. Tang, C.; He, H.; Li, E.; Li, H. Multispectral imaging for predicting sugar content of Fuji apples. Opt. Laser Technol. 2018, 106, 280–285. [Google Scholar]
  29. Nawi, N.M.; Chen, G.; Jensen, T.; Mehdizadeh, S.A. Prediction and classification of sugar content of sugarcane based on skin scanning using visible and shortwave near infrared. Biosyst. Eng. 2013, 115, 154–161. [Google Scholar] [CrossRef]
  30. Morishita, Y.; Omachi, T.; Asano, K.; Ohtera, Y.; Yamada, H. Study on non-destructive measurement of sugar content of peach fruit utilizing photonic crystal-type NIR spectroscopic camera. In Proceedings of the International Workshop on Emerging ICT, Sendai, Japan, 31 October–2 November 2016. [Google Scholar]
  31. Fu, X.; Wang, X.; Rao, X. An LED-based spectrally tuneable light source for visible and near-infrared spectroscopy analysis: A case study for sugar content estimation of citrus. Biosyst. Eng. 2017, 163, 87–93. [Google Scholar] [CrossRef]
  32. Gomes, V.M.; Fernandes, A.M.; Faia, A.; Melo-Pinto, P. Comparison of different approaches for the prediction of sugar content in new vintage of whole Port wine grape berries using hyperspectral imaging. Comput. Electron. Agric. 2017, 140, 244–254. [Google Scholar]
  33. Khodabakhshian, R.; Emadi, B.; Khojastehpour, M.; Golzarian, M.R.; Sazgarnia, A. Development of a multispectral imaging system for online quality assessment of pomegranate fruit. Int. J. Food Prop. 2017, 20, 107–118. [Google Scholar]
  34. Rady, A.M.; Guyer, D.E.; Watson, N.J. Near-infrared spectroscopy and hyperspectral imaging for sugar content evaluation in potatoes over multiple growing seasons. Food Anal. Methods 2021, 14, 581–595. [Google Scholar]
  35. Wickramasinghe, W.A.N.D.; Ekanayake, E.M.S.L.N.; Wijedasa, M.A.C.S.; Wijesinghe, A.D.; Madhujith, T.; Ekanayake, M.P.B.; Godaliyadda, G.M.R.I.; Herath, H.M.V.R. Validation of multispectral imaging for the detection of sugar adulteration in black tea. In Proceedings of the 10th International Conference on Information and Automation for Sustainability, Padukka, Sri Lanka, 11–13 August 2021. [Google Scholar]
  36. Wu, D.; He, Y. Study on for soluble solids contents measurement of grape juice beverage based on Vis/NIRS and chemomtrics. Proc. SPIE 2007, 6788, 639–647. [Google Scholar]
  37. Zhang, C.; Liu, F.; Kong, W.; He, Y. Application of visible and near-infrared hyperspectral imaging to determine soluble protein content in oilseed rape leaves. Sensors 2015, 15, 16576–16588. [Google Scholar] [CrossRef]
  38. Ahn, D.; Choi, J.-Y.; Kim, H.-C.; Cho, J.-S.; Moon, K.-D. Estimating the composition of food nutrients from hyperspectral signals based on deep neural networks. Sensors 2019, 19, 1560. [Google Scholar] [CrossRef]
  39. Chungcharoen, T.; Donis-Gonzalez, I.; Phetpan, K.; Udompetnikul, V.; Sirisomboon, P.; Suwalak, R. Machine learning-based prediction of nutritional status in oil palm leaves using proximal multispectral images. Comput. Electron. Agric. 2022, 198, 107019. [Google Scholar] [CrossRef]
  40. Food Nutrient Database. The Ministry of Korea Food and Drug Safety (KFDA). Available online: https://various.foodsafetykorea.go.kr/nutrient/nuiIntro/nui/intro.do (accessed on 28 November 2022).
Figure 1. Schematic of the image acquisition system.
Figure 2. Photograph of the image acquisition system.
Figure 3. Photograph of LED in the panel.
Figure 4. Examples of images ((Top): bread, (Middle): castella, (Bottom): chocolate bar) acquired from various directions.
Figure 5. The cumulative distributions of the distances obtained from non-visible images and those from visible images.
Figure 6. (a) Images acquired under a 640 nm light source. (b) Images acquired under a 810 nm light source. The food items are coke (top) and sugar-free grape juice (bottom).
Figure 7. Correlations for each of the wavelength images between the caloric differences and the Bhattacharyya distances.
Figure 8. Architecture of the proposed CNN for the classification of food type, where $N_w$ is the number of input images and $N_t$ is the number of targets (101 for food classification and 1 for caloric estimation).
Figure 9. The classification accuracies of food items according to the number of light sources when the RGB image is included or excluded.
Figure 10. The mean absolute percentage errors (MAPEs) of caloric content (kcal) according to the number of wavelengths, when the RGB image was either included or excluded.
Figure 11. The cumulative distribution of MAPE according to the input of CNN (RGB image only, RGB + 1-images, and RGB + 5-images).
Figure 12. The selection ratio of each wavelength for food classification when the RGB image is included or excluded.
Figure 13. The selection ratio of each wavelength for calorie estimation when the RGB image is included or excluded.
Table 1. Dataset properties per food item.
Food Item | Weight (g) | Calorie (kcal) | Food Item | Weight (g) | Calorie (kcal)
apple juice | 180.5 | N/A | pork (steamed) | 119.3 | 441.41
almond milk | 175.5 | 41.57 | potato chips | 23.5 | 130.82
banana | 143.6 | 127.80 | potato chips (onion flavor) | 23.5 | 133.95
banana milk | 174.6 | 110.27 | sports drink (blue) | 170 | 17.00
chocolate bar (high protein) | 35 | 167.00 | chocolate bar (with fruits) | 40 | 170.00
beef steak | 68.1 | 319.39 | milk pudding | 140 | 189.41
beef steak with sauce | 79 | 330.29 | ramen (Korean-style noodles) | 308 | 280.00
black noodles | 127.4 | 170.00 | rice (steamed) | 172.3 | 258.45
black noodles with oil | 132.4 | N/A | rice cake | 119.3 | 262.46
black tea | 168 | 52.68 | rice cake and honey | 127.9 | 288.60
bread | 47.8 | 129.54 | rice juice | 173.8 | 106.21
bread and butter | 54.8 | 182.04 | rice (steamed, low-calorie) | 164.6 | 171.18
castella | 89.9 | 287.68 | multi-grain rice | 175.3 | 258.08
cherryade | 168 | 79.06 | rice noodles | 278 | 140.00
chicken breast | 100.6 | 109.00 | cracker | 41.5 | 217.88
chicken noodles | 70 | 255.00 | salad1 (lettuce and cucumber) | 96.8 | 24.20
black chocolate | 40.37 | 222.04 | salad1 with olive oil | 106 | 37.69
milk chocolate | 41 | 228.43 | salad2 (cabbage and carrot) | 69.1 | 17.28
chocolate milk | 180.1 | 122.62 | salad2 with fruit dressing | 79 | 28.04
cider | 166 | 70.55 | almond cereal (served with milk) | 191.7 | 217.36
clam chowder | 160 | 90.00 | corn cereal (served with milk) | 192 | 205.19
coffee | 167 | 18.56 | soybean milk | 171.9 | 85.95
coffee with sugar (10%) | 167 | 55.74 | spaghetti | 250 | 373.73
coffee with sugar (20%) | 167 | 92.92 | kiwi soda (sugar-free) | 166 | 2.34
coffee with sugar (30%) | 167 | 130.11 | tofu | 138.6 | 62.37
coke | 166 | 76.36 | cherry tomato | 200 | 36.00
corn milk | 166.6 | 97.18 | tomato juice | 176.8 | 59.80
corn soup | 160 | 85.00 | cherry tomato and syrup | 210 | 61.90
cup noodle | 262.5 | 120.00 | fruit soda | 169 | 27.04
rice with tuna and pepper | 305 | 418.15 | vinegar | 168 | 20.16
diet coke | 166 | 0.00 | pure water | 166 | 0.00
chocolate bar | 50 | 249.00 | watermelon juice | 177.7 | 79.97
roasted duck | 117.2 | 360.98 | grape soda | 170.9 | 92.43
orange soda | 173.6 | 33.33 | grape soda (sugar-free) | 170.9 | 0.00
orange soda (sugar-free) | 166.2 | 2.77 | fried potato | 110.5 | 331.50
fried potato and powder | 120 | 364.92 | yogurt | 179 | 114.56
sports drink | 177.1 | 47.23 | yogurt and sugar | 144.6 | 106.04
ginger tea | 178.3 | 96.79 | milk soda | 167 | 86.84
honey tea | 183.9 | 126.69 | salt crackers | 41.3 | 218.89
caffelatte | 171.6 | 79.13 | onion soup | 160 | 83.00
caffelatte with sugar (10%) | 171.6 | 115.66 | orange juice | 182.6 | 82.17
caffelatte with sugar (20%) | 171.6 | 152.19 | peach (cut) | 142 | 55.38
caffelatte with sugar (30%) | 171.6 | 188.72 | pear juice | 181.5 | 90.02
mango candy | 36.4 | 91.00 | peach and syrup | 192 | 124.80
mango jelly | 58.6 | 212.43 | peanuts | 37.1 | 217.96
milk | 171 | 94.50 | peanuts and salt | 37.3 | 218.21
sweet milk | 171 | N/A | milk tea | 167 | 63.46
green soda | 174.5 | 84.55 | pizza (beef) | 85.5 | 212.08
pizza (seafood) | 60 | 148.83 | pizza (potato) | 72.3 | 179.34
pizza (combination) | 70.9 | 175.87 | plain yogurt | 143.7 | 109.89
sports drink (white) | 175.8 | 43.95 | | |
Mean (all items): weight 141.17 g, calorie 139.27 kcal
Standard deviation (all items): weight 60.69 g, calorie 101.36 kcal
Table 2. Comparison of recognition rates (%) for food items that showed a large difference between using only the RGB image and using the RGB image with one additional (NIR) image.
Food Items | RGB Image | RGB Image + Image at 890 nm
black noodles with oil | 58.33 | 100
coffee with sugar (10%) | 16.67 | 100
orange soda | 47.22 | 100
caffelatte with sugar (30%) | 25.00 | 97.22
peach (cut) | 55.56 | 97.22
milk pudding | 44.44 | 100
rice cake | 44.44 | 86.11
rice cake and honey | 52.78 | 100
salad1 with olive oil | 47.22 | 94.44
salad2 (cabbage and carrot) | 47.22 | 94.44
grape soda | 0.00 | 44.44
Table 3. Comparison of MAPEs (%) in caloric content for food items showing large differences between using the RGB image only and using the RGB image plus one (NIR) image.
Food Items | RGB Image | RGB Image + Image at 970 nm
salad1 (lettuce and cucumber) | 94.46 | 20.85
salad2 (cabbage and carrot) | 85.23 | 5.37
rice with tuna and pepper | 51.68 | 6.27
orange soda (sugar-free) | 63.06 | 18.19
salad2 with fruit dressing | 62.11 | 15.41
cherry tomato | 58.87 | 21.38
fried potato and powder | 55.06 | 25.41
green soda | 52.99 | 17.53
salad1 with olive oil | 62.65 | 28.58
rice cake and honey | 62.12 | 9.52
rice cake | 61.94 | 30.09
Table 4. Selected wavelengths for food classification. Top: without RGB image. Bottom: with RGB image.
No. of images | Selected wavelengths (nm)
1 | 870
2 | 660, 950
3 | 660, 950, 970
4 | 590, 660, 950, 970
5 | 490, 660, 890, 950, 970
6 | 405, 490, 660, 890, 950, 970
7 | 385, 405, 490, 560, 660, 890, 970
8 | 385, 490, 560, 660, 870, 890, 970, 1020
9 | 385, 405, 490, 560, 850, 870, 890, 970, 1020
10 | 385, 490, 560, 645, 810, 850, 870, 890, 970, 1020
11 | 385, 490, 510, 560, 625, 645, 810, 850, 870, 890, 970
12 | 385, 430, 490, 510, 560, 625, 810, 850, 870, 890, 910, 970
13 | 385, 405, 430, 490, 510, 560, 625, 850, 870, 890, 910, 950, 970
14 | 385, 405, 430, 470, 490, 510, 560, 625, 850, 870, 890, 910, 950, 970
15 | 385, 405, 430, 470, 490, 510, 560, 625, 645, 850, 870, 890, 910, 950, 970
16 | 385, 405, 430, 470, 490, 510, 625, 645, 660, 850, 870, 890, 910, 950, 970, 1020
17 | 385, 405, 430, 470, 490, 510, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020
18 | 385, 405, 430, 470, 490, 510, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020
19 | 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020
No. of images | Selected wavelengths (nm)
1 | RGB
2 | RGB, 890
3 | RGB, 870, 970
4 | RGB, 385, 870, 970
5 | RGB, 385, 870, 890, 970
6 | RGB, 385, 625, 870, 890, 970
7 | RGB, 385, 625, 810, 870, 890, 970
8 | RGB, 385, 625, 810, 870, 890, 910, 970
9 | RGB, 385, 430, 625, 870, 890, 910, 950, 970
10 | RGB, 385, 430, 490, 625, 870, 890, 910, 950, 970
11 | RGB, 385, 430, 490, 625, 645, 870, 890, 910, 950, 970
12 | RGB, 385, 405, 430, 625, 645, 870, 890, 910, 950, 970, 1020
13 | RGB, 385, 405, 430, 510, 625, 645, 870, 890, 910, 950, 970, 1020
14 | RGB, 385, 405, 430, 510, 560, 625, 645, 870, 890, 910, 950, 970, 1020
15 | RGB, 385, 405, 430, 490, 510, 560, 625, 645, 870, 890, 910, 950, 970, 1020
16 | RGB, 385, 405, 430, 470, 490, 510, 560, 625, 645, 870, 890, 910, 950, 970, 1020
17 | RGB, 385, 405, 430, 470, 490, 510, 560, 625, 645, 810, 870, 890, 910, 950, 970, 1020
18 | RGB, 385, 405, 430, 470, 490, 510, 560, 625, 645, 810, 850, 870, 890, 910, 950, 970, 1020
19 | RGB, 385, 405, 430, 470, 490, 510, 560, 590, 625, 660, 810, 850, 870, 890, 910, 950, 970, 1020
20 | RGB, 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020
Table 5. Selected wavelengths for caloric estimation. Top: without RGB image. Bottom: with RGB image.
No. of images | Selected wavelengths (nm)
1 | 430
2 | 385, 970
3 | 385, 560, 970
4 | 385, 430, 560, 970
5 | 385, 405, 430, 560, 970
6 | 385, 405, 430, 560, 660, 970
7 | 385, 405, 430, 510, 560, 660, 970
8 | 385, 405, 430, 510, 560, 625, 660, 970
9 | 385, 405, 430, 510, 560, 625, 660, 910, 970
10 | 385, 405, 430, 470, 510, 560, 660, 850, 910, 970
11 | 385, 405, 430, 470, 510, 560, 645, 660, 850, 910, 970
12 | 385, 405, 430, 470, 510, 560, 625, 645, 660, 850, 910, 970
13 | 385, 405, 430, 470, 510, 560, 590, 625, 660, 850, 890, 910, 970
14 | 385, 405, 430, 470, 490, 510, 560, 625, 660, 810, 850, 890, 910, 970
15 | 385, 405, 470, 490, 510, 560, 625, 645, 660, 810, 850, 890, 910, 950, 970
16 | 385, 405, 430, 470, 490, 510, 560, 625, 645, 660, 810, 850, 890, 910, 950, 970
17 | 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 890, 910, 950, 970
18 | 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970
19 | 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020
No. of images | Selected wavelengths (nm)
1 | RGB
2 | RGB, 970
3 | RGB, 405, 1020
4 | RGB, 405, 510, 1020
5 | RGB, 405, 510, 950, 1020
6 | RGB, 405, 430, 510, 950, 1020
7 | RGB, 405, 430, 510, 660, 950, 1020
8 | RGB, 405, 430, 470, 510, 660, 950, 1020
9 | RGB, 430, 470, 510, 590, 660, 950, 970, 1020
10 | RGB, 385, 430, 470, 590, 660, 810, 950, 970, 1020
11 | RGB, 385, 405, 430, 470, 590, 660, 870, 950, 970, 1020
12 | RGB, 385, 405, 430, 470, 560, 590, 660, 870, 950, 970, 1020
13 | RGB, 385, 405, 430, 470, 560, 590, 660, 810, 870, 950, 970, 1020
14 | RGB, 385, 405, 430, 470, 560, 590, 625, 810, 850, 870, 950, 970, 1020
15 | RGB, 385, 405, 430, 470, 490, 560, 590, 625, 810, 850, 870, 950, 970, 1020
16 | RGB, 385, 405, 430, 470, 490, 560, 625, 660, 810, 850, 870, 910, 950, 970, 1020
17 | RGB, 385, 405, 430, 470, 490, 560, 590, 625, 660, 810, 850, 870, 910, 950, 970, 1020
18 | RGB, 385, 405, 430, 470, 490, 510, 560, 590, 625, 810, 850, 870, 890, 910, 950, 970, 1020
19 | RGB, 385, 405, 430, 470, 490, 510, 560, 590, 625, 660, 810, 850, 870, 890, 910, 950, 970, 1020
20 | RGB, 385, 405, 430, 470, 490, 510, 560, 590, 625, 645, 660, 810, 850, 870, 890, 910, 950, 970, 1020
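Tables 4 and 5 list the wavelength subset chosen for each subset size. The sketch below illustrates a generic greedy forward (piecewise) selection loop of the kind typically used to produce such tables; `evaluate_subset` is a hypothetical stand-in for retraining and validating the CNN on a candidate wavelength combination, and this loop is not a restatement of the paper's exact selection procedure.

```python
from typing import Callable, List, Sequence

def greedy_forward_selection(
    candidates: Sequence[str],
    evaluate_subset: Callable[[List[str]], float],  # higher score = better (e.g., accuracy or -MAPE)
    max_size: int,
) -> List[List[str]]:
    """Grow the wavelength subset one element at a time, keeping the best addition at each step."""
    selected: List[str] = []
    history: List[List[str]] = []
    for _ in range(max_size):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Try each remaining wavelength and keep the one that scores best when added.
        best = max(remaining, key=lambda c: evaluate_subset(selected + [c]))
        selected.append(best)
        history.append(list(selected))
    return history

# Usage sketch: candidates could be ["RGB", "385", "405", ..., "1020"]; evaluate_subset would
# train/validate the CNN of Figure 8 on that combination and return its validation score.
```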
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
