Article

A Comparative Study of Engraved-Digit Data Augmentation by Generative Adversarial Networks

School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea
*
Author to whom correspondence should be addressed.
Sustainability 2022, 14(19), 12479; https://doi.org/10.3390/su141912479
Submission received: 31 August 2022 / Revised: 22 September 2022 / Accepted: 27 September 2022 / Published: 30 September 2022
(This article belongs to the Special Issue Frontiers in Sustainable Information and Communications Technology)

Abstract

When an efficient information retrieval (IR) system retrieves information from images with engraved digits, as found on medicines, creams, ointments, and gels in squeeze tubes, the system needs to be trained on a large dataset. One application of such a system is to automatically retrieve the expiry date to ascertain the efficacy of the medicine. For expiry dates expressed in engraved digits, it is difficult to collect the digit images. In our study, we evaluated the augmentation performance for a limited, engraved-digit dataset using various generative adversarial networks (GANs). Our study contributes to the choice of an effective GAN for engraved-digit image data augmentation. We conclude that the Wasserstein GAN with a gradient norm penalty (WGAN-GP) is a suitable data augmentation technique to address the challenge of producing a large, realistic, but synthetic dataset. Our results show that the stability of WGAN-GP aids in the production of high-quality data with an average Fréchet inception distance (FID) value of 1.5298 across images of 10 digits (0–9) that are nearly indistinguishable from our original dataset.

1. Introduction

As machine learning and big data engineering have grown in popularity, information retrieval (IR) from various sources has become a subject of discussion. In general, an efficient IR system based on machine learning techniques requires a large collection of data sources because it learns by training with a large amount of data. To date, several issues have been raised in the early stages of applying recommender systems in the healthcare industry to assist health practitioners and users in making efficient and accurate health-related decisions [1]. We are particularly concerned about the implications of insufficient datasets used to train some of these models. One application of such systems involves the retrieval of expiry dates found on daily-life products administered to the human body. It is important to help people notice the expiry date, as it conveys additional information about the product and helps them manage their health. We introduce the generative adversarial network (GAN) as a way to use information technology to support people's continued healthy lives. The disabled, the elderly, and patients with vision loss, in particular, have difficulty checking these dates because the digits are small and sometimes blurry. Expiry dates are often expressed as engraved digits, as shown in Figure 1. To automatically retrieve the engraved digits in expiry dates, we need a large dataset of engraved digits to train the IR system. We collected image (photo) data from medicines, consumables, cosmetic products, and tube-type ointments to recognize the expiration dates expressed in engraved digits. However, the classification performance was poor because of the small amount of data. Our dataset contained images showing both the properties of digits and the unusual shadows created by the engraved shapes, and these properties differentiate it from the MNIST dataset [2].
The popularity of generative adversarial networks (GANs) is increasing nowadays, and they are used to generate synthetic datasets that are close to, and almost indistinguishable from, the real data [3]. A GAN is a type of machine learning technique in which two models are trained simultaneously: the generator, G, and the discriminator, D. G is trained to generate fake data samples, and D is trained to distinguish between fake and real data [3,4]. These two models are trained together in an adversarial zero-sum game until the discriminator model is tricked about half of the time, implying that the generator model generates satisfactory outputs [3,4]. When D effectively distinguishes between real and fake samples, it is rewarded or requires no changes to its model parameters, whereas G is penalized with significant updates to its model parameters. D determines whether a batch of samples produced by G is authentic or fake by comparing it with actual samples from the original datasets. To improve its ability to distinguish between genuine and fake samples in the following round, D is updated. More crucially, G is also updated based on whether D is successfully tricked by the generated samples [3,4]. Therefore, the fundamental building blocks of GANs include the generator, G, the discriminator, D (or critic, C), and the associated loss functions. The G model is used to generate new plausible examples from the problem domain, and the D model is used to classify examples from the domain as real or fake. The loss function helps to evaluate the model performance [3,5]. It measures the accuracy of the model in terms of predicting the expected outcome and, fundamentally, it improves the stability of the trained GAN model.
The GAN architecture is shown in Figure 2. First, we sample the noise, z, using a normal or uniform distribution. With z as an input, we use a generator, G, to create an image x (x = G(z)) after performing multiple transposed convolutions to upsample z. The discriminator processes the real images (training samples) and generated images separately. It distinguishes whether the input image is real or generated. The output D(x) is the probability that the input x is real. If the input is real, D(x) should equal 1. If it is generated, it should be zero. Through this process, the discriminator identifies the features that distinguish real images from generated images. We train the generator to create images that may be interpreted as real by the discriminator. We train both networks in alternating steps and lock them into a fierce competition to improve themselves. Eventually, the discriminator detects a small difference between the real and generated images, and the generator generates images that cannot be distinguished by the discriminator. The GAN model repeats this process until the loss converges to zero, and the distributions are approximately equal.
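To make the alternating training procedure concrete, the following is a minimal PyTorch sketch of one training step, assuming fully connected generator and discriminator networks and 128 × 128 grayscale images flattened to vectors; the architectures and hyperparameters shown here are illustrative assumptions and are not the exact models evaluated in this study.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the alternating GAN update described above (not the
# exact architectures used in this study). G maps noise z to a fake image;
# D outputs the probability that its input image is real.
latent_dim, img_pixels = 100, 128 * 128

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_pixels), nn.Tanh())
D = nn.Sequential(nn.Linear(img_pixels, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):                 # real_images: (batch, 128*128) in [-1, 1]
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Discriminator step: push D(x) -> 1 for real and D(G(z)) -> 0 for fake.
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()
    d_loss = bce(D(real_images), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator step: push D(G(z)) -> 1, i.e., try to fool the discriminator.
    z = torch.randn(batch, latent_dim)
    g_loss = bce(D(G(z)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```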
There are several application domains of GANs, and significant progress has been made in areas such as image super-resolution, art creation, speech synthesis, image-to-image translation, video prediction, and 3D object generation [3,4,7,8,9,10]. GANs often operate on image data and leverage convolutional neural networks (CNNs) as generator and discriminator models, and the performance of these neural networks often improves with the amount of available data. Data augmentation is a technique to artificially generate data. Data augmentation using GANs helps to generate more plausible training datasets from existing data [7,8,9,10]. This helps to develop better models by improving the model characteristics and providing a regularizing effect, which reduces the generalization error.
In addition to GANs, restricted Boltzmann machines (RBM), generative stochastic networks (GSN), deep belief networks (DBN), deep Boltzmann machines (DBM), and variational autoencoders (VAE) are used as generative models in deep learning [4,11]. Unlike classic augmentation algorithms, GANs can handle invariances represented by simple transformations and strengthen weak points in the learned decision boundaries [7,12]. In addition, GANs have exhibited tremendous computation speed and quality of results compared with other methods [7]. GANs generate new training data, which lead to better classification performance [4,7]. To select the best-performing model for data augmentation, the output from each model is evaluated by human visual inspection [13,14,15,16] and by calculating the Fréchet inception distance (FID). The FID is a metric used with GANs to compute the Wasserstein-2 (Fréchet) distance between the feature vectors of real and generated images [14,17,18,19]. With the FID, it is possible to identify output images whose diversity is close to that of the original images. A very low score indicates that the output and original images have nearly identical statistics [18]. The FID is given by Equation (1):
FID = ‖μ_x − μ_y‖² + Tr(Σ_x + Σ_y − 2(Σ_x Σ_y)^{1/2})    (1)
where x and y are the real and generated samples, respectively, i.e., the activations from the pretrained Inception-v3 model [20]; μ_x and μ_y are the feature-wise means of the real and generated images, respectively; Tr is the trace of the matrix; and Σ_x and Σ_y are the covariance matrices of the feature vectors [18,19,20,21]. In [19], the FID was empirically demonstrated to be a viable evaluation metric because of its robustness to mode dropping and to the choice of encoding network.
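As an illustration of Equation (1), the following is a small NumPy/SciPy sketch that computes the FID from precomputed Inception-v3 activations; the function name and the assumption that the activations have already been extracted are ours, not part of the original study.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(real_feats, fake_feats):
    """FID between two sets of Inception-v3 activations, following Equation (1).

    real_feats, fake_feats: arrays of shape (n_samples, feature_dim), e.g.
    2048-dimensional pool3 activations of a pretrained Inception-v3 network.
    """
    mu_x, mu_y = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_x = np.cov(real_feats, rowvar=False)
    sigma_y = np.cov(fake_feats, rowvar=False)

    # Matrix square root of the covariance product; discard tiny imaginary
    # parts introduced by numerical error.
    covmean = sqrtm(sigma_x @ sigma_y)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(np.sum((mu_x - mu_y) ** 2)
                 + np.trace(sigma_x + sigma_y - 2.0 * covmean))
```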
The inception score (IS) is another metric used to evaluate image quality and diversity. The IS rewards GANs that generate clear, confidently classified images with a high diversity across the different classes in ImageNet [13,22]. Outside of ImageNet, the IS is most commonly applied to generative models trained on CIFAR-10 [22], because CIFAR-10 is significantly smaller and more manageable for training than ImageNet. However, applying the IS to generative models trained on datasets other than ImageNet can yield misleading results [13,23]. The IS is high for clear images that are well classified into specified object types [23,24]. Because the classes of our generated images are not present in the classifier's training data, the IS is not ideal for our dataset.
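For completeness, a minimal sketch of the IS computation is shown below, assuming the softmax class probabilities from a pretrained classifier are already available; the function name is illustrative.

```python
import numpy as np

def inception_score(class_probs, eps=1e-12):
    """Inception score from classifier outputs for generated images.

    class_probs: array of shape (n_images, n_classes) containing softmax
    probabilities p(y|x) from a pretrained classifier (e.g., Inception-v3).
    IS = exp( E_x[ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal
    class distribution over all generated images.
    """
    p_y = class_probs.mean(axis=0, keepdims=True)                     # marginal p(y)
    kl = class_probs * (np.log(class_probs + eps) - np.log(p_y + eps))
    return float(np.exp(kl.sum(axis=1).mean()))
```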
In this study, we leveraged GAN models to produce more datasets to provide accurate results for the automatic recognition of expiry dates in photos, which requires a large amount of learning data. We implemented and evaluated state-of-the-art GAN models, such as WGAN [5], WGAN-GP [21], WGAN-DIV [25], MMGAN [26], NSGAN [26], LSGAN [27], DRAGAN [28], ACGAN [29], DCGAN [30], EBGAN [31], VAE [32], and BEGAN [33], to augment the data for our dataset. We concluded that the Wasserstein GAN with a gradient norm penalty (WGAN-GP) is a suitable data augmentation technique for our dataset. Our results show that the stability of WGAN-GP aids in the production of not only high-quality data but also images of a variety of styles. The images have an average Fréchet inception distance (FID) value of 1.5298 across 10 digits (0–9) that are nearly indistinguishable from our original dataset.
The remainder of this paper is structured as follows. The background section introduces GANs for data augmentation and describes two metrics used to evaluate GAN performance: the FID and the inception score (IS). In the section on relevant literature, we introduce recent studies on data augmentation using GANs and state-of-the-art GAN models. In the section Data Augmentation for Engraved-Digit Images, we establish the method of analysis and discuss the evaluation metrics that are used. In the Evaluation and Analysis section, we provide brief details of the WGAN-GP model, describe its hyperparameters, and present loss plots to demonstrate the optimal results. Finally, we summarize our study in the Conclusions section.

2. Relevant Literature

2.1. Data Augmentation by GAN

Researchers have proposed and applied several GAN models to supplement datasets that are insufficient for classification tasks. These datasets vary across different application domains, and the GAN models that are the most suitable for the augmentation process are often adopted. The augmented datasets are applied to a classifier for validation, and a set of benchmark results for the datasets is presented, which may then be used to further characterize and validate the datasets. Our work targets a small dataset of digit images engraved on medicines, ointments, and other forms of squeeze tubes, where the images are usually distorted and blurry. We evaluated state-of-the-art GAN models to determine the most preferred model for the data augmentation task.
Wang et al. [34] integrated a GAN into an all-encompassing information-retrieval framework. Their approach demonstrated that GAN-based information retrieval systems are promising, but more work is required. The idea of data augmentation across several domains and simple models aided by large datasets can be extremely helpful in improving the performance of object detection applications [5,10,35]. In [36], the authors evaluated the effectiveness of two augmentation methods for CNN-based MODI script handwritten character recognition using the same CNN architecture. They confirmed that the on-the-fly data augmentation technique was more accurate than an offline approach. This strategy enabled the network to view a fresh collection of data each time, boosting the effectiveness of the system. Using the standard MNIST handwritten digit dataset in [37], the authors conducted an experimental evaluation of the advantages of data augmentation for convolutional backpropagation-trained neural networks, convolutional support vector machines, and convolutional extreme learning machine classifiers. Their work showed that in cases where reasonable data transforms are available, augmentation in the data space offers an advantage for enhancing performance and reducing overfitting. The authors investigated the advantages of training a machine learning classifier with examples that were artificially manufactured to supplement the data. In addition, in [38], the authors demonstrated the method of training a deep neural network (DNN) for optical character recognition (OCR) on historical manuscripts with a small amount of data. They examined various methods for data augmentation of palimpsests and evaluated the impact of various techniques on the performance of the DNN. In [8], the authors demonstrated the effective enhancement of conventional vanilla classifiers by a data augmentation generative adversarial network (DAGAN). Additionally, they demonstrated the usage of a DAGAN to improve few-shot learning systems, such as matching networks. Their approach was evaluated on the Omniglot [39], EMNIST [40], and VGG-face datasets [41].
In recent times, GANs have been extensively used in medical applications. In [10], the authors presented a literature review on the application of GANs in ophthalmology image domains to discuss key contributions and suggested probable future research trajectories. Because medical records can contain sensitive and personal data of patients, training GANs with these original datasets may be regulated, and a need for synthetic datasets for research in this domain may arise [42]. Other applications of GANs in medicine are discussed in [43,44].

2.2. GAN Models for IR

The Wasserstein GAN with gradient penalty, WGAN-GP [21], was proposed as an alternative to weight clipping: it penalizes the norm of the gradient of the critic with respect to its input. This model has made progress towards the stable training of GANs with almost no hyperparameter tuning, unlike WGAN [5], which suffers from training instability and sometimes generates poor samples or fails to converge. WGAN-GP adds a gradient penalty to the WGAN discriminator loss as an alternative method for enforcing the Lipschitz constraint, which was previously performed by weight clipping. This penalty avoids the bias toward simple functions that weight clipping induces in the discriminator. Additionally, the reformulation of the discriminator with a gradient penalty term makes batch normalization unnecessary. The Wasserstein divergence, WGANDIV [25], a symmetric divergence, has been proved to faithfully approximate the corresponding Wasserstein distance through optimization. It has been demonstrated to be stable under various settings, including progressive growing training, and has exhibited superior results when compared with state-of-the-art methods, both quantitatively and qualitatively.
In NSGAN [26], two models are trained simultaneously: G, which captures the data distribution, and D, which estimates the probability that a sample is from the training data rather than from G. The objective of the training process for G is to maximize the probability that D makes a mistake. The generator loss is the only difference when compared with the MMGAN [26]. In both the NSGAN and MMGAN, the output of D can be interpreted as a probability. However, the output of D in LSGAN [27] is unbounded unless it passes through an activation function; a sigmoid activation function has also been implemented. LSGAN tackles the vanishing gradient problem associated with GANs by replacing the cross-entropy loss function with the least-squares (L2) loss function. In [27], the authors claimed that the L2 loss function penalizes samples that are identified by the discriminator as real but located outside the decision boundary. Hence, the generated visuals are meant to closely resemble real data, and the training process is also stabilized. The output of D in DRAGAN [28] can likewise be interpreted as a probability, similar to the D models in MMGAN and NSGAN. Like WGAN-GP, DRAGAN applies a gradient penalty to obtain an improved training objective based on the optimal performance of D and G, but it appears to be less stable. Its gradient penalty is applied only close to the real data manifold, whereas WGAN-GP evaluates the gradient at points on a random line between a real sample and a randomly generated fake sample.
The auxiliary classifier GAN (AC-GAN) [29] is an extension of the conditional GAN that changes the discriminator to predict the class label of a given image rather than receive it as an input. It stabilizes the training process and allows the generation of large high-quality images while learning a representation in the latent space that is independent of the class label. A deep convolutional GAN (DCGAN) is a generative adversarial network with convolutional neural networks as the generator and discriminator [30]. The DCGAN was proposed after evaluating a set of constraints on the architectural topology of convolutional GANs that allowed them to be trained in a stable manner in most settings. The trained discriminators were used for image classification tasks, showing competitive performance with other unsupervised algorithms.
To model the discriminator, D(x), as an energy function that assigns low energies to regions close to the data manifold and higher energies to other regions, a family of GANs called EBGANs has been proposed [31]. The discriminator uses an autoencoder: the encoder extracts latent features from the input image, and the decoder subsequently reconstructs the image. Instead of a probability value as in the original GAN, the discriminator outputs the reconstruction mean square error (MSE) between the input image and the reconstructed image. A VAE [32] encodes an input into a latent vector of a given dimension, z, reparametrizes z using its mean and standard deviation, and then reconstructs the image from the reparametrized z. BEGAN [33] optimizes a lower bound of the Wasserstein distance between the autoencoder loss distributions on the original and generated data, using an autoencoder as a discriminator. To maintain the discriminator and generator in equilibrium, the authors added an additional hyperparameter γ ∈ [0, 1].
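As a rough illustration of the autoencoder-based discriminator described above, the following PyTorch sketch returns the per-sample reconstruction MSE as the energy; the layer sizes are illustrative assumptions and do not reproduce the EBGAN/BEGAN architectures evaluated in this study.

```python
import torch
import torch.nn as nn

# Minimal sketch of an autoencoder discriminator in the spirit of EBGAN/BEGAN:
# the "energy" assigned to an image is its reconstruction error (low energy
# near the data manifold, high energy elsewhere). Layer sizes are illustrative.
class AutoencoderDiscriminator(nn.Module):
    def __init__(self, img_pixels=128 * 128, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(img_pixels, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, img_pixels), nn.Tanh())

    def forward(self, x):                    # x: (batch, img_pixels) in [-1, 1]
        recon = self.decoder(self.encoder(x))
        # Per-sample reconstruction MSE plays the role of the discriminator output.
        return ((recon - x) ** 2).mean(dim=1)
```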

3. Data Augmentation for Engraved-Digit Images

The objective of this study is to compare state-of-the-art GAN models and identify the models that are suitable for data augmentation for our specific type of dataset, by evaluating the generated images, the variety of the images produced, and the performance of each GAN model according to the FID score after training. The GAN models were trained with optimally tuned hyperparameters for the same number of epochs.

3.1. Engraved-Digit Image Dataset

As shown in Figure 3, our dataset has the properties of digits combined with the unusual shadows created by the engraved shapes in the images. The images were collected from medicines, gels, and tube-type ointments. Classification models perform poorly when insufficient or blurry data are used for model training. We trained these limited datasets on state-of-the-art GAN models to produce high-quality grayscale fake images. The images were separated into classes of 0–9. We collected approximately a hundred images per digit, which were selectively trained one class at a time to evaluate the diversity of the images produced by each GAN. The data were pre-processed for standardization: we reshaped and converted all input images to grayscale images of size 128 × 128 × 1 with pixel values scaled to the range [−1, 1], so that all images share the same color format.
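A minimal sketch of this preprocessing step is shown below, using PIL and NumPy; the function name and file-path handling are illustrative assumptions rather than the exact pipeline used in this study.

```python
from PIL import Image
import numpy as np

def preprocess(image_path):
    """Standardize one engraved-digit photo as described above:
    grayscale, 128 x 128 x 1, pixel values scaled to [-1, 1].
    The file-path handling here is illustrative."""
    img = Image.open(image_path).convert("L")              # grayscale
    img = img.resize((128, 128), Image.BILINEAR)           # 128 x 128
    arr = np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return arr.reshape(128, 128, 1)
```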

3.2. Data Augmentation for Our Dataset by GANs

We trained our dataset using the following state-of-the-art GAN models: MMGAN, NSGAN, LSGAN, ACGAN, DCGAN, WGAN, WGAN-GP, WGANDIV, DRAGAN, BEGAN, EBGAN, and VAE. We implemented these models and examined the images produced by each GAN. In addition, we closely analyzed the graph showing the relationship between the generator and discriminator loss plots of each GAN.

3.2.1. A Classification of GANs

As shown in Table 1, we classified the GAN models into four groups. Group 1 is based on the way D, also known as the critic C, maximizes its objective to differentiate between real and fake images. Group 2 comprises models in which D estimates the probability that a sample originates from the training data rather than from G; the training procedure for G maximizes the probability of D making a mistake. Group 3 is based on D's main task, i.e., image classification. The basic architecture of the GANs in Groups 1, 2, and 3 is shown in Figure 2. In contrast, the GANs in Group 4 have an autoencoder in their architecture. The VAE architecture comprises an encoder and a decoder trained to minimize the reconstruction loss, with the input being encoded as a distribution over the latent space. Figure 4 shows the embedding of autoencoders in the EBGAN and BEGAN architectures. Both GANs can be represented using the same architecture; however, each calculates the reconstruction loss differently: EBGAN uses the mean square error (MSE), whereas BEGAN uses the Wasserstein distance.

3.2.2. Digit Image Generation by GANs

The 0–9-digit images generated with our dataset by the GANs are shown in Table 2, along with some selected outputs. The number under each image is the epoch in which the image was generated. We selected the best and most stable images produced during 50,000 epochs of execution. We trained each class of our datasets (digits 0–9) for 50,000 epochs with sample images saved every 1000 epochs.

3.2.3. FIDs and ISs

We calculated the FID [10,17], which measures the statistical difference between the features of our original data and generated images, as shown in Table 3. FID addresses the flaw of IS, in which the statistics of the original and generated samples are not compared [20]. A perfect score of 0.0 indicates that the two sets of images are identical, whereas lower scores indicate that the two groups of images are more similar or have more in common statistically.
Table 4 lists the IS values calculated for each GAN's output images. The average IS calculated across all GANs in Table 4 is approximately 1.0. The IS measures the realism of a GAN output by using a pretrained deep learning neural network model to predict the class probabilities for each generated image, as opposed to the FID, which is calculated by comparing the statistics of the generated samples with those of the original samples. The IS usually ranges from 1 to the number of classes recognized by the pretrained network, with higher scores being better [23]. In our case, the IS is very low because the generated images belong to classes that are not present in the training data of the classifier. Hence, the IS is not ideal for our dataset.

4. Evaluation and Analysis

We evaluated the images by visually inspecting the quality of the images in Table 2 and the average FID score of the GAN model, as shown in the last column of Table 3.
In Table 2, the BEGAN and VAE outputs are not considered suitable for data augmentation, even though their FID values are the lowest at 1.0945 and 0.484, respectively. This is because the images are of poor quality and have little or no diversity. The DRAGAN model is similar to WGAN-GP, but its generated images are not of good visual quality compared with the output images from WGAN-GP and WGANDIV. Considering our evaluation metrics, the WGAN-GP and WGANDIV outputs are the most preferred, although WGANDIV slightly outperforms WGAN-GP in terms of the FID score. The corresponding average FID scores of WGAN-GP and WGANDIV are 1.5298 and 1.4933, respectively, which corroborates their image quality; the low FID values translate to smaller distances between the fake and original data distributions. Nevertheless, we prefer WGAN-GP because of the noticeable quality of its images across the digits 0–9. Although each GAN displays similar behavior in its loss plots across the digits 0–9, Figure 5, Figure 6 and Figure 7 show the loss plots of WGAN-GP, WGANDIV, and BEGAN for the digit-3 images, and Figure 8, Figure 9 and Figure 10 show those for the digit-0 images, respectively. We selected only digits 3 and 0 for illustration, but the WGAN-GP plots show consistency and stability for all digit images in the range 0–9.
Unlike BEGAN in Figure 7 and Figure 10, the WGAN-GP plots in Figure 5 and Figure 8 show an increase in the discriminator losses (D_Loss) and a decrease in the generator losses (G_Loss). In all WGAN-GP plots in our experiment, it is evident that, whereas the discriminator attempts to maximize its ability to distinguish real data from fake data, the generator attempts to minimize the distance between the generated and real data.
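Loss curves such as those in Figures 5–10 can be produced from per-epoch loss values with a simple matplotlib sketch like the following; the function and its arguments are illustrative and assume the G_Loss and D_Loss values have been recorded during training.

```python
import matplotlib.pyplot as plt

def plot_losses(g_losses, d_losses, title="WGAN-GP (digit 3)"):
    """Plot generator and critic losses recorded once per epoch,
    similar in spirit to the G_Loss/D_Loss curves in Figures 5-10."""
    epochs = range(1, len(g_losses) + 1)
    plt.figure(figsize=(8, 4))
    plt.plot(epochs, g_losses, label="G_Loss")
    plt.plot(epochs, d_losses, label="D_Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title(title)
    plt.legend()
    plt.tight_layout()
    plt.show()
```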
The performance of a GAN model strongly depends on the dataset, and models are evaluated based on the quality of images generated in the context of a specific targeted domain [19]. After execution, WGAN-GP on our dataset achieves an average FID score of 1.5298. This score and the generated images reflect the high quality of the data required for augmentation. Hence, we conclude that WGAN-GP is an appropriate model for replenishing our dataset. In Table 5 [19], WGAN-GP shows good performance in terms of the FID score across several other datasets. We compared this result with the FID score of only one class of our dataset: digit 2. The FID scores obtained in a large-scale hyperparameter search for the FASHION (60,000 training images), CIFAR-10 (6000 images/class), and CELEBA (202,599 face images) datasets are displayed [19]. WGAN-GP achieves FID scores of 24.5, 55.8, 30.0, and 0.761 for the FASHION, CIFAR-10, CELEBA, and ENGRAVED DIGITS datasets, respectively.
The WGAN-GP we implemented is shown in detail in Figure 11 and Algorithm 1. During the implementation, the discriminator (critic) network, C, was first trained on a batch of real data, x, and then on a batch of data generated from the noise, z, via the generator, G. This was required to provide random weighted averages between real and generated image samples, which are needed for the gradient norm penalty. The discriminator's loss function was arranged so that it estimates the Wasserstein distance with a gradient penalty. The gradient penalty was computed for a batch of these interpolated (weighted real/fake) samples to encourage the critic to be 1-Lipschitz continuous. The performance of Algorithm 1 depends on the number of iterations for θ to converge, n_θ, the number of critic iterations per generator iteration, n_critic, and the batch size, m: O(n_θ · n_critic · m).
Algorithm 1. WGAN-GP with our set of parameters [21]: λ = 10, n_critic = 5, α = 0.0002, β₁ = 0, β₂ = 0.9.
Require: the gradient penalty coefficient λ, the number of critic iterations per generator iteration n_critic, the batch size m, and the Adam hyperparameters α, β₁, β₂.
Require: initial critic parameters w₀, initial generator parameters θ₀.
1: while θ has not converged do
2:   for t = 1, …, n_critic do
3:     for i = 1, …, m do
4:       Sample real data x ∼ P_r, latent variable z ∼ p(z), and a random number ε ∼ U[0, 1]
5:       x̃ ← G_θ(z)
6:       x̂ ← εx + (1 − ε)x̃
7:       L(i) ← D_w(x̃) − D_w(x) + λ(‖∇_{x̂} D_w(x̂)‖₂ − 1)²
8:     end for
9:     w ← Adam(∇_w (1/m) Σ_{i=1..m} L(i), w, α, β₁, β₂)
10:   end for
11:   Sample a batch of latent variables {z(i)}_{i=1..m} ∼ p(z)
12:   θ ← Adam(∇_θ (1/m) Σ_{i=1..m} −D_w(G_θ(z(i))), θ, α, β₁, β₂)
13: end while
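A compact PyTorch sketch of one WGAN-GP update following Algorithm 1 is given below; the `critic` and `generator` modules are placeholders assumed to be defined elsewhere, and, for brevity, the same real batch is reused across the critic iterations. This is an illustrative sketch under those assumptions, not the authors' exact implementation.

```python
import torch

# Sketch of one WGAN-GP update following Algorithm 1 (lambda = 10,
# n_critic = 5, Adam(alpha = 2e-4, beta1 = 0, beta2 = 0.9)). `critic` and
# `generator` stand for the networks in Figure 11; `real` is assumed to be a
# tensor of shape (batch, 1, 128, 128).
LAMBDA, N_CRITIC, LATENT_DIM = 10.0, 5, 100

def gradient_penalty(critic, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)    # epsilon ~ U[0, 1]
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)   # interpolated samples
    grad = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    return ((grad.view(grad.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

def wgan_gp_step(critic, generator, opt_c, opt_g, real):
    # Critic iterations (the same real batch is reused here for brevity).
    for _ in range(N_CRITIC):
        z = torch.randn(real.size(0), LATENT_DIM, device=real.device)
        fake = generator(z).detach()
        c_loss = critic(fake).mean() - critic(real).mean() \
                 + LAMBDA * gradient_penalty(critic, real, fake)
        opt_c.zero_grad(); c_loss.backward(); opt_c.step()

    # Generator iteration: minimize -D_w(G_theta(z)).
    z = torch.randn(real.size(0), LATENT_DIM, device=real.device)
    g_loss = -critic(generator(z)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return c_loss.item(), g_loss.item()
```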

5. Conclusions

In this study, we investigated data augmentation for engraved-digit image datasets of expiry dates. The available digit images were limited, fuzzy, and distorted, and a larger dataset was required to improve the identification of expiry dates on consumables and cosmetic products. We evaluated the state-of-the-art GAN models MMGAN, NSGAN, LSGAN, ACGAN, DCGAN, WGAN, WGAN-GP, WGANDIV, DRAGAN, BEGAN, EBGAN, and VAE by visually inspecting the results and calculating the FID values for each GAN. WGAN-GP and WGANDIV show stability and are suitable for the data augmentation task; however, we consider WGAN-GP to be the preferred GAN owing to the quality of its output images after visual inspection. The consistency and stability of its G_Loss and D_Loss plots over the digits 0–9 are also satisfactory.
Our future research will focus on the automatic recognition of engraved expiry-date digit images. After augmenting our limited dataset to an abundant one with WGAN-GP, we intend to build a recognition model and train it with these synthetic datasets to a high degree of confidence. We conjecture that the image quality and diversity from WGAN-GP will contribute to our model's stability. The new recognition model will not only be tailored to engraved-digit image recognition but will also serve as a benchmark for models that require similar datasets to be trained to high performance.

Author Contributions

A.A. and I.Y.J. conceived and designed the experiments; A.A. performed the experiments; A.A. and I.Y.J. analyzed the data; A.A. wrote the paper and I.Y.J. re-organized and corrected the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. 2021R1F1A1064345) and by the BK21 FOUR project funded by the Ministry of Education, Korea (No. 4199990113966).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tran, T.N.; Felfernig, A.; Trattner, C.; Holzinger, A. Recommender systems in the healthcare domain: State-of-the-art and research issues. J. Intell. Inf. Syst. 2021, 57, 171–201. [Google Scholar] [CrossRef]
  2. Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 2012, 29, 141–142. [Google Scholar] [CrossRef]
  3. Chollet, F. Deep Learning with Python; Simon and Schuster: New York City, NY, USA, 2021. [Google Scholar]
  4. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2020, 63, 139–144. [Google Scholar]
  5. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  6. Medium.com. Available online: https://jonathan-hui.medium.com/gan-energy-based-gan-ebgan-boundary-equilibrium-gan-began-4662cceb7824 (accessed on 27 August 2022).
  7. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  8. Antoniou, A.; Storkey, A.; Edwards, H. Data augmentation generative adversarial networks. arXiv 2017, arXiv:1711.04340. [Google Scholar]
  9. Iqbal, A.; Sharif, M.; Yasmin, M.; Raza, M.; Aftab, S. Generative adversarial networks and its applications in the biomedical image segmentation: A comprehensive survey. Int. J. Multimed. Inf. Retr. 2022, 11, 333–368. [Google Scholar] [CrossRef]
  10. You, A.; Kim, J.K.; Ryu, I.H.; Yoo, T.K. Application of generative adversarial networks (GAN) for ophthalmology image domains: A survey. Eye Vis. 2022, 9, 6. [Google Scholar] [CrossRef]
  11. Bengio, Y.; Laufer, E.; Alain, G.; Yosinski, J. Deep generative stochastic networks trainable by backprop. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014. [Google Scholar]
  12. Wenzel, M. Generative Adversarial Networks and Other Generative Models. arXiv 2022, arXiv:2207.03887. [Google Scholar]
  13. Borji, A. Pros and cons of gan evaluation measures. Comput. Vis. Image Underst. 2019, 179, 41–65. [Google Scholar] [CrossRef]
  14. Shmelkov, K.; Schmid, C.; Alahari, K. How good is my GAN? In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  15. Esteban, C.; Hyland, S.L.; Rätsch, G. Real-valued (medical) time series generation with recurrent conditional gans. arXiv 2017, arXiv:1706.02633. [Google Scholar]
  16. Denton, E.L.; Chintala, S.; Fergus, R. Deep generative image models using a laplacian pyramid of adversarial networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
  17. Zhu, X.; Vondrick, C.; Fowlkes, C.C.; Ramanan, D. Do we need more training data? Int. J. Comput. Vis. 2016, 119, 76–92. [Google Scholar] [CrossRef]
  18. Dowson, D.C.; Landau, B. The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 1982, 12, 450–455. [Google Scholar] [CrossRef]
  19. Lucic, M.; Kurach, K.; Michalski, M.; Gelly, S.; Bousquet, O. Are gans created equal? A large-scale study. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar] [CrossRef]
  20. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  21. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  22. Krizhevsky, A.; Vinod, N.; Geoffrey, H. The CIFAR-10 Dataset 2014. Available online: http://www.cs.toronto.edu/kriz/cifar (accessed on 6 August 2022).
  23. Barratt, S.; Sharma, R. A note on the inception score. arXiv 2018, arXiv:1801.01973. [Google Scholar]
  24. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  25. Wu, J.; Huang, Z.; Thoma, J.; Acharya, D.; Van Gool, L. Wasserstein divergence for gans. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  26. Goodfellow, I. Nips 2016 tutorial: Generative adversarial networks. arXiv 2016, arXiv:1701.00160. [Google Scholar]
  27. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul, S.S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  28. Kodali, N.; Abernethy, J.; Hays, J.; Kira, Z. On convergence and stability of gans. arXiv 2017, arXiv:1705.07215. [Google Scholar]
  29. Odena, A.; Olah, C.; Shlens, J. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  30. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  31. Zhao, J.; Mathieu, M.; LeCun, Y. Energy-based generative adversarial network. arXiv 2016, arXiv:1609.03126. [Google Scholar]
  32. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  33. Berthelot, D.; Schumm, T.; Metz, L. Began: Boundary equilibrium generative adversarial networks. arXiv 2017, arXiv:1703.10717. [Google Scholar]
  34. Wang, J.; Yu, L.; Zhang, W.; Gong, Y.; Xu, Y.; Wang, B.; Zhang, P.; Zhang, D.I. A minimax game for unifying generative and discriminative information retrieval models. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017. [Google Scholar]
  35. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90. [Google Scholar] [CrossRef]
  36. Joseph, S.; George, J. Data augmentation for handwritten character recognition of MODI script using deep learning method. In Proceedings of the International Conference on Information and Communication Technology for Intelligent Systems, Singapore, 15 May 2020. [Google Scholar]
  37. Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding data augmentation for classification: When to warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November 2016. [Google Scholar]
  38. Starynska, A.; Easton, R.L., Jr.; Messinger, D. Methods of data augmentation for palimpsest character recognition with deep neural network. In Proceedings of the 4th International Workshop on Historical Document Imaging and Processing, Kyoto, Japan, 10 November 2017. [Google Scholar]
  39. Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338. [Google Scholar] [CrossRef]
  40. Cohen, G.; Afshar, S.; Tapson, J.; Van, S.A. EMNIST: Extending MNIST to handwritten letters. In Proceedings of the International Joint Conference on Neural Networks (IJCNN) IEEE, Anchorage, AK, USA, 14 May 2017. [Google Scholar]
  41. Parkhi, O.M.; Andrea, V.; Andrew, Z. Deep face recognition. In Proceedings of the British Machine Vision Conference, Swansea, UK, 7–11 September 2015. [Google Scholar]
  42. Tanaka, F.H.; Aranha, C. Data augmentation using GANs. arXiv 2019, arXiv:1904.09135. [Google Scholar]
  43. Wickramaratne, S.D.; Mahmud, M.S. Conditional-GAN based data augmentation for deep learning task classifier improvement using fNIRS data. Front. Big Data 2021, 4, 659146. [Google Scholar] [CrossRef]
  44. Wei, K.; Li, T.; Huang, F.; Chen, J.; He, Z. Cancer classification with data augmentation based on generative adversarial networks. Front. Comput. Sci. 2022, 16, 162601. [Google Scholar] [CrossRef]
Figure 1. Expiry dates on medicines, creams, ointments, and gels in squeeze tubes.
Figure 2. GAN architecture. Adapted from ref. [6].
Figure 3. (a) The best selection of engraved digits cropped from the expiry dates in our source datasets (as seen in Figure 1), and (b) more samples of digit-2 images.
Figure 4. BEGAN/EBGAN architecture.
Figure 5. WGAN-GP (digit 3) G_Loss/D_Loss, and the image samples generated at the epochs under the images.
Figure 6. WGANDIV (digit 3) G_Loss/D_Loss, and the image samples generated at the epochs under the images.
Figure 7. BEGAN (digit 3) G_Loss/D_Loss, and the image samples generated at the epochs under the images.
Figure 8. WGAN-GP (digit 0) G_Loss/D_Loss, and the image samples generated at the epochs under the images.
Figure 9. WGANDIV (digit 0) G_Loss/D_Loss, and the image samples generated at the epochs under the images.
Figure 10. BEGAN (digit 0) G_Loss/D_Loss, and the image samples generated at the epochs under the images.
Figure 11. Architecture of a WGAN-GP.
Table 1. GAN classification.

Group 1: WGAN, WGANDIV, WGAN-GP, DRAGAN
Group 2: LSGAN, MMGAN, NSGAN
Group 3: ACGAN, DCGAN
Group 4: BEGAN, EBGAN, VAE
Table 2. The images generated by the GANs for digits 0–9 (generated image samples omitted); the number in each cell is the epoch at which the selected image was generated.

GANs | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
WGAN | 48,000 | 38,000 | 42,000 | 49,000 | 49,000 | 47,000 | 49,000 | 49,000 | 49,000 | 49,000
WGAN_DIV | 49,000 | 42,000 | 43,000 | 48,000 | 41,000 | 50,000 | 26,000 | 43,000 | 46,000 | 28,000
WGAN_GP | 46,000 | 48,000 | 34,000 | 38,000 | 48,000 | 45,000 | 47,000 | 44,000 | 46,000 | 42,000
LSGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
MMGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
NSGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
DRAGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
ACGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
DCGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
BEGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
EBGAN | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000 | 50,000
VAE | 50,000 | 22,000 | 32,000 | 49,000 | 49,000 | 49,000 | 50,000 | 49,000 | 49,000 | 49,000
Table 3. FID values for the evaluated GANs.

GANs | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Average
WGAN | 3.985 | 5.415 | 8.268 | 5.617 | 6.083 | 8.292 | 3.231 | 2.890 | 5.590 | 6.202 | 5.5573
WGAN_DIV | 0.874 | 1.445 | 1.008 | 1.550 | 1.869 | 0.952 | 1.072 | 2.014 | 1.742 | 2.407 | 1.4933
WGAN_GP | 1.288 | 1.442 | 0.761 | 1.451 | 2.210 | 1.089 | 1.001 | 2.492 | 1.704 | 1.860 | 1.5298
LSGAN | 5.029 | 4.407 | 4.253 | 4.389 | 4.812 | 5.535 | 4.087 | 4.734 | 4.821 | 5.224 | 4.7291
MMGAN | 30.134 | 4.743 | 5.389 | 13.582 | 6.267 | 6.769 | 6.683 | 4.630 | 6.440 | 6.769 | 9.1406
NSGAN | 7.374 | 7.655 | 7.292 | 4.794 | 8.671 | 6.769 | 6.713 | 8.452 | 6.713 | 6.918 | 7.1351
DRAGAN | 5.012 | 3.355 | 3.982 | 3.161 | 4.583 | 4.133 | 4.823 | 5.256 | 3.588 | 8.404 | 4.6297
ACGAN | 7.169 | 6.073 | 6.086 | 6.148 | 6.631 | 7.145 | 6.297 | 7.457 | 8.055 | 8.448 | 6.9509
DCGAN | 3.392 | 3.849 | 3.126 | 3.511 | 4.525 | 3.386 | 3.106 | 3.751 | 4.013 | 4.044 | 3.670
BEGAN | 1.267 | 0.712 | 0.296 | 0.340 | 0.491 | 2.143 | 0.957 | 1.478 | 1.016 | 2.245 | 1.0945
EBGAN | 2.718 | 2.681 | 1.402 | 3.476 | 0.545 | 3.597 | 2.940 | 2.608 | 3.532 | 4.932 | 2.843
VAE | 0.796 | 0.309 | 0.114 | 0.861 | 0.353 | 0.437 | 0.432 | 0.770 | 0.722 | 0.81 | 0.484
Table 4. IS values for the evaluated GANs.

GANs | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
WGAN | 1.0000001 | 1.0000002 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0000002
WGAN_DIV | 1.0 | 1.0 | 1.0 | 1.0 | 1.0000001 | 1.0 | 1.0000002 | 1.0 | 1.0 | 1.0
WGAN_GP | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
LSGAN | 1.0 | 1.0000002 | 1.0 | 1.0 | 1.0000001 | 1.0000001 | 1.0000001 | 1.0000002 | 1.0000002 | 1.0
MMGAN | 1.0000002 | 1.0000007 | 1.0 | 1.0000002 | 1.0000002 | 1.0000002 | 1.0000004 | 1.0 | 1.0000002 | 1.0000002
NSGAN | 1.0000002 | 1.0000002 | 1.0000001 | 1.0 | 1.0000002 | 1.0000002 | 1.0000002 | 1.0 | 1.0000001 | 1.0000005
DRAGAN | 1.0 | 1.0 | 1.0 | 1.0000002 | 1.0 | 1.0000001 | 1.0000002 | 1.0 | 1.0000001 | 1.0000006
ACGAN | 1.0 | 1.0 | 1.0 | 1.0000001 | 1.0000002 | 1.0000002 | 1.0000001 | 1.0000001 | 1.0000002 | 1.0000002
DCGAN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0000001 | 1.0
BEGAN | 1.0 | 1.0000002 | 1.0 | 1.0000001 | 1.0000002 | 1.0000001 | 1.0000002 | 1.0000002 | 1.0000001 | 1.0000002
EBGAN | 1.0 | 1.0 | 1.0 | 1.0000001 | 1.0000001 | 1.0000002 | 1.0000001 | 1.0 | 1.0 | 1.0
VAE | 1.0 | 1.0000002 | 1.0000002 | 1.0000002 | 1.0000002 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
Table 5. FID of WGAN-GP (digit 2) compared with other datasets.

GANs | MNIST | FASHION | CIFAR | CELEBA | ENGRAVED DIGITS
WGAN | 6.7 ± 0.4 | 21.5 ± 1.6 | 55.2 ± 2.3 | 41.3 ± 2.0 | 8.32 ± 0.1
WGAN_GP | 20.3 ± 5.0 | 24.5 ± 2.1 | 55.8 ± 0.9 | 30.0 ± 1.0 | 0.761 ± 0.5
LSGAN | 7.8 ± 0.6 * | 30.7 ± 2.2 | 87.1 ± 47.5 | 53.9 ± 2.8 * | 4.25
MMGAN | 9.8 ± 0.9 | 29.6 ± 1.6 | 72.7 ± 3.6 | 65.6 ± 4.2 | 6.42 ± 1.0
NSGAN | 6.8 ± 0.5 | 26.5 ± 1.6 | 58.5 ± 1.9 | 55.0 ± 3.3 | 6.39 ± 0.9
DRAGAN | 7.6 ± 0.4 | 27.7 ± 1.2 | 69.8 ± 2.0 | 42.3 ± 3.0 | 10.75 ± 0.1
BEGAN | 13.1 ± 1.0 | 22.9 ± 0.9 | 71.4 ± 1.6 | 38.9 ± 0.9 | 0.32
VAE | 23.8 ± 0.6 | 58.7 ± 1.2 | 155.7 ± 11.6 | 85.7 ± 3.8 | 0.13 ± 0.1
The asterisk (*) indicates the presence of significant outlier runs, usually severe mode collapses or training failures.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
