Article

Image Translation by Ad CycleGAN for COVID-19 X-Ray Images: A New Approach for Controllable GAN

1 Information Retrieval and Knowledge Management Laboratory, York University, Toronto, ON M3J 1P3, Canada
2 National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
* Author to whom correspondence should be addressed.
Sensors 2022, 22(24), 9628; https://doi.org/10.3390/s22249628
Submission received: 31 October 2022 / Revised: 1 December 2022 / Accepted: 5 December 2022 / Published: 8 December 2022

Abstract

We propose a new generative model, the adaptive cycle-consistent generative adversarial network (Ad CycleGAN), to perform image translation between normal and COVID-19 positive chest X-ray images. An independent pre-trained criterion is added to the conventional Cycle GAN architecture to exert adaptive control on image translation. The performance of Ad CycleGAN is compared with that of the Cycle GAN without the external criterion. The quality of the synthetic images is evaluated by quantitative metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Universal Image Quality Index (UIQI), Visual Information Fidelity (VIF), Frechet Inception Distance (FID), and translation accuracy. The experimental results indicate that the synthetic images generated either by the Cycle GAN or by the Ad CycleGAN have lower MSE and RMSE, and higher scores in PSNR, UIQI, and VIF, in homogeneous image translation (i.e., Y → Y) than in heterogeneous image translation (i.e., X → Y). The synthetic images generated by Ad CycleGAN through heterogeneous image translation have a significantly higher FID score than those generated by Cycle GAN (p < 0.01). The image translation accuracy of Ad CycleGAN is higher than that of Cycle GAN when normal images are converted to COVID-19 positive images (p < 0.01). Therefore, we conclude that the Ad CycleGAN with the independent criterion can improve the accuracy of GAN image translation. The new architecture offers more control over image synthesis and can help address the common class imbalance issue in machine learning methods and artificial intelligence applications with medical images.

1. Introduction

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by the novel coronavirus SARS-CoV-2. The most common clinical manifestation of COVID-19 infection is a specific type of pneumonia which rapidly leads to severe acute respiratory infection symptoms and may even directly develop into acute respiratory distress syndrome (ARDS) [1]. Diagnostic methods for COVID-19 include new medical technologies from various domains. Although the gold standard for confirmation is the real-time reverse-transcriptase polymerase chain reaction (RT-PCR), the test sensitivity is about 96.0%, and its performance is affected by the disease prevalence in the given population [2]. The diagnosis, therefore, combines the RT-PCR test result with various clinically accessible methods such as contact history, physical examination, and radiographic imaging. The radiological diagnostic methods include imaging using computed tomography (CT), chest X-ray (CXR), and lung ultrasound (LUS), etc. While not a primary step, they still play important roles in confirming and staging positive cases. During the pandemic, many research groups have collected relevant medical images to develop new artificial intelligence (AI) technologies for automated COVID-19 screening and diagnosis [3], particularly applying deep neural networks (DNNs) for detecting image patterns consistent with the disease [4]. However, the effectiveness and generalizability of these methods are adversely impacted by the lack of a sufficiently large number of adequately labeled COVID-19 image examples to build a balanced training set. A DNN trained on an imbalanced dataset, where positive cases occupy only 5% to 6% of the total image samples, will reach a performance threshold earlier than its theoretical capacity determined by the architecture [5]. A study published in 2020 revealed that the seemingly high-performing DNN models for COVID-19 detection in CXR images are vulnerable to adversarial attacks [6]. Another challenge is that specific medical image patterns differ from general-purpose images such as those in the ImageNet dataset. When transfer learning from DNN models trained on ImageNet is used to fine-tune a new model for radiography images, the pretrained feature extractors usually cannot effectively capture the medically significant patterns through the complex architecture and instead develop meaningless feature combinations for the final decision. All these factors contribute to the vulnerability of the current DNN technology for COVID-19 image pattern recognition and detection.
Image translation is a common image synthesis task supported by generative adversarial networks (GAN) [7] and dual learning [8]. The objective of image translation is to learn the mapping between two image domains by dual learning with GAN. The current benchmark methods for image translation are Pix2Pix for paired image learning [9] and the cycle-consistent generative adversarial network (Cycle GAN) for unpaired image learning [10]. Image translation is widely used in medical image applications for cross-modality image synthesis with multiple purposes, such as image registration and data augmentation for improving model generalization capacity in image detection, classification, and segmentation. Popular methods are based on the Cycle GAN architecture for image translation between medical images acquired by different technologies [11,12,13]. Radiologic images all appear as gray-scale images, but the different imaging mechanisms result in the anatomical and pathological patterns being exhibited differently across computed tomography (CT), magnetic resonance (MR), and positron emission tomography (PET) images. With a deep neural network (DNN) architecture, a GAN can learn the detailed mapping between two medical imaging pattern domains. With an optimized GAN, the digital images acquired by different methods (i.e., CT, MR, or PET) can be effectively converted to one another. This capability helps radiologists interpret clinical findings without asking patients to undergo every type of examination. Furthermore, the new technology helps to lower the radiation dose of the examination to protect patients while keeping the best diagnostic performance.
The most common application of medical image translation is conversion between CT and MR images. For example, Fu J. et al. introduced the sCTcycleGAN to convert MR images to CT images [14]. Lee J.H. et al. [15] and Hu N. et al. [16] used cGAN models to convert CT images to MR images to acquire more detailed information. Conversely, Nie D. et al. [17] and Emami H. et al. [18] used serialized GAN models to convert MR images to CT images. Another type of image translation converts PET images to CT images, such as the study by Hu Z. et al. using a WGAN model to perform attenuation correction and convert PET to pseudo-CT images [19]. Bazangani F. et al. introduced an E-GAN for translating 3D FDG-PET images to MR images [20]. The main purpose of these image translation applications is to convert radiologic images from a complex format to a relatively simple format, such as from PET to MR and from MR to CT, because the latter are easier to interpret with empirical medical expertise. However, one interesting image translation task is seldom addressed: translating images from the normal domain to a specific disease domain. This idea is intuitive because a medical image with some morbid abnormality can be interpreted as: normal patterns + disease patterns. This type of application can make great contributions for rare or newly discovered diseases, when images containing the disease patterns are difficult to acquire while images with the corresponding normal structures are accessible. A typical example is the COVID-19 radiological images acquired at the beginning of the pandemic. This idea became the guiding principle of our experimental design.
The main contribution of this study is to introduce an external criterion to the current state-of-the-art GAN architecture (Cycle GAN) for image translation, which ensures that the generated synthetic images belong to both the correct image domain and the correct diagnosis class. This design is easy to extend to other medical or non-medical data synthesis applications. The rest of this paper is organized as follows. In Section 2 Materials and Methods, we present the rationale for the cycle-consistent adversarial network (Cycle GAN) and the restrictions that are addressed by the new adaptive Cycle GAN (Ad CycleGAN) architecture. The pseudo code for the Ad CycleGAN optimization and the quantitative evaluation metrics are also presented in this section. In Section 3 Experiments, we present the Ad CycleGAN experiment based on the open-source COVID-19 image dataset and its performance compared with the conventional Cycle GAN. In Section 4 Discussion and Section 5 Conclusions, we combine the findings of the experiments in Section 3 with the Ad CycleGAN design discussed in Section 2 to summarize the merits of the new Ad CycleGAN and explore its potential applications in biomedicine.

2. Materials and Methods

2.1. Cycle-Consistent Adversarial Network and Its Restriction

Cycle-consistent adversarial network, or Cycle GAN, is the state-of-the-art conditional generative adversarial network (CGAN) for unpaired image-to-image translation. A typical Cycle GAN uses two generators and two discriminators to learn the mapping between two distributions by optimizing a composite objective and reaching a state of adversarial equilibrium. During optimization, the objective of the Cycle GAN has three components: adversarial loss, cycle consistency loss, and identity loss. The adversarial loss follows the original GAN design to measure the difference between the generated images and the target images. To map between two image distributions X and Y with Cycle GAN, we use two pairs of generators and discriminators. The first pair, G and $D_Y$, aims to adversarially generate and distinguish real and generated images belonging to domain Y, i.e., $\min_G \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y)$. The optimization objective is written as:
$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$    (1)
where the data distributions of X and Y are denoted as $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$. On the other hand, the second pair, F and $D_X$, aims to adversarially generate and distinguish real and generated images belonging to domain X, i.e., $\min_F \max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X)$. The corresponding objective is written as:
$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$    (2)
The total adversarial loss during a single iteration is the sum of the losses from Equations (1) and (2), i.e., $\mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X)$. The adversarial optimization can theoretically learn mappings G and F whose outputs are distributed identically to the target domains. However, this ideal outcome is unrealistic in two ways. First, even in the ideal situation, the generator networks can map an input image from the source domain to a random image in the target domain, which is not our desired outcome. Second, we need to guarantee that the generated images have valid shapes and keep other uncommon elements within a reasonable scope of the real images. We therefore add cycle-consistency losses to translate the images back to their original domains, i.e., $x \rightarrow G(x) \rightarrow F(G(x)) \approx x$ (forward cycle consistency) and $y \rightarrow F(y) \rightarrow G(F(y)) \approx y$ (backward cycle consistency). The total cycle-consistency loss is written as:
$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1]$    (3)
Furthermore, an identity loss is added to measure how close the generated image is to the real image itself when the real image passes through the corresponding Cycle GAN generator, i.e., $x \rightarrow F(x) \approx x$ and $y \rightarrow G(y) \approx y$. Adding the identity loss to the total generator loss helps to preserve the original color. The identity loss can be expressed as:
$\mathcal{L}_{iden}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(x) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(y) - y \|_1]$    (4)
The full generator loss of the Cycle GAN is written as the summation of the above three loss functions:
$\mathcal{L}_{total} = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F) + \sigma \mathcal{L}_{iden}(G, F)$    (5)
where λ and σ are the parameters that respectively adjust the importance of the cycle-consistency loss and the identity loss during model optimization. Thus, the objective of the whole model optimization is to solve:
$G^*, F^* = \arg \min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$    (6)
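To make the composite objective concrete, the sketch below shows how the generator-side terms of Equations (1)–(5) could be computed in TensorFlow. This is a minimal sketch, not the exact implementation used in this study: the function name, the generator and discriminator objects, and the weights lam and sigma are illustrative assumptions.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def cyclegan_generator_loss(G, F, D_X, D_Y, real_x, real_y,
                            lam=10.0, sigma=5.0):
    """Generator-side Cycle GAN objective corresponding to Eqs. (1)-(5).
    G maps X -> Y and F maps Y -> X; lam and sigma are placeholder weights."""
    fake_y = G(real_x, training=True)            # G(x), should look like domain Y
    fake_x = F(real_y, training=True)            # F(y), should look like domain X

    # Adversarial terms: only the generator-dependent parts of Eqs. (1) and (2)
    adv = bce(tf.ones_like(D_Y(fake_y)), D_Y(fake_y)) + \
          bce(tf.ones_like(D_X(fake_x)), D_X(fake_x))

    # Cycle consistency (Eq. 3): F(G(x)) ~ x and G(F(y)) ~ y
    cyc = tf.reduce_mean(tf.abs(F(fake_y, training=True) - real_x)) + \
          tf.reduce_mean(tf.abs(G(fake_x, training=True) - real_y))

    # Identity mapping (Eq. 4): F(x) ~ x and G(y) ~ y
    iden = tf.reduce_mean(tf.abs(F(real_x, training=True) - real_x)) + \
           tf.reduce_mean(tf.abs(G(real_y, training=True) - real_y))

    return adv + lam * cyc + sigma * iden        # Eq. (5)
```

The discriminators are optimized separately against the same generated samples, which corresponds to the max part of Equation (6).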
In our experiments, we find that although the Cycle GAN produces visually plausible synthetic images, it cannot guarantee that the synthetic images will be classified into the correct category by an independently optimized DNN model. In general, the medical images synthesized by Cycle GAN have three drawbacks. First, the generator cannot produce images with high complexity due to the imbalanced information of the two image domains. Second, translation from a domain with rich information to a domain with relatively poor information (e.g., translating MR images to CT images) is likely to cause ambiguous mapping, which means there can be multiple alternatives in the target domain corresponding to the identical input in the source domain. Third, the Cycle GAN is easily diverted by improper constraints due to its unpaired image translation setting. The randomly encoded latent information not only distracts the model from the ideal image translation but also makes it sensitive to disturbances and variations in the input samples [12].
To solve the above problems, we propose the Adaptive Cycle GAN model, or Ad CycleGAN, with an external criterion to reduce the negative influence of random noise during Cycle GAN optimization. We believe this new design can effectively improve both the quality of the synthetic images and their accuracy with respect to the target domain through the image translation process.

2.2. External Criterion in Cycle GAN Optimization

The term criterion originates from the concept of the critic introduced by Arjovsky M. et al. for the optimization of their Wasserstein GAN (WGAN) model [21]. Unlike the original GAN by Goodfellow I. et al. [7], which uses the discriminator to estimate the likelihood that the synthetic images produced by the generator are real, the WGAN uses the discriminator as a critic to evaluate the quality of the generated images against the real ones. We extend this idea by adding a pre-trained DNN as an independent criterion to evaluate the generated images from multiple aspects beyond simply classifying them as real or fake. The errors from different criteria can be aggregated into the criterion loss as a new component of the total generator loss. In the Ad CycleGAN model, we introduce two loss terms: the cycle criterion loss and the identity criterion loss. Both are estimated by a pretrained residual network as the likelihood that the synthetic images belong to the correct category. The joint criterion loss is written as:
$\mathcal{L}_c = \mathcal{L}_{c_{cycle}} + \mathcal{L}_{c_{identity}}$    (7)
where the cycle criterion loss measures the similarity of $x \sim F(G(x))$ and of $y \sim G(F(y))$, and the identity criterion loss measures the similarity of $x \sim F(x)$ and $y \sim G(y)$. In other words, like the cycle loss and the identity loss, the cycle criterion loss $\mathcal{L}_{c_{cycle}}$ quantitatively measures whether the back-translated images are still classified as the original class, and the identity criterion loss $\mathcal{L}_{c_{identity}}$ quantitatively measures whether the trained generators, given a real observed sample, produce images that remain consistent with the same class.
In addition, when the GAN training reaches an adversarial equilibrium, the criterion loss can periodically add an extra oscillation momentum to the stable condition to push the generator to learn more details. The new $\mathcal{L}_c$ term can be considered a regularization method that prevents the saturated status of the GAN optimization, because it makes the GAN training controllable to a certain degree. However, we need to add an empirical decay factor to the criterion loss term to control its side effect of breaking the adversarial equilibrium, which would otherwise force the GAN model to relearn the loss patterns over more iterations.
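As a minimal sketch of how the joint criterion loss in Equation (7) could be computed, the snippet below scores the cycled and identity-mapped images with a frozen pre-trained classifier. The function name, the criterion_model argument, and the domain label convention (0 for domain X, 1 for domain Y) are assumptions for illustration, not the exact implementation.

```python
import tensorflow as tf

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def criterion_loss(criterion_model, cycled_x, cycled_y, same_x, same_y,
                   label_x=0, label_y=1):
    """Joint criterion loss L_c = L_c_cycle + L_c_identity (Eq. 7).
    criterion_model is a frozen pre-trained classifier returning logits;
    label_x / label_y are the assumed class indices of domains X and Y."""
    n = tf.shape(cycled_x)[0]
    lx, ly = tf.fill([n], label_x), tf.fill([n], label_y)

    # Cycle criterion: back-translated images should still belong to their original class
    l_cycle = scce(lx, criterion_model(cycled_x, training=False)) + \
              scce(ly, criterion_model(cycled_y, training=False))

    # Identity criterion: identity-mapped real images should keep their class
    l_identity = scce(lx, criterion_model(same_x, training=False)) + \
                 scce(ly, criterion_model(same_y, training=False))

    return l_cycle + l_identity
```

Because the criterion is frozen, its gradients flow only into the generators, steering them toward images that the classifier assigns to the intended class.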

2.3. Ad CycleGAN Architecture

The Ad CycleGAN consists of two pairs of generators and discriminators that learn the mapping between the image domains, and a pre-trained independent criterion that ensures the generated images contain the key discriminative patterns of the two image domains. The total loss function of the generators in the Ad CycleGAN consists of four parts:
  • Adversarial loss: $\mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X)$
  • Cycle consistency loss: $\mathcal{L}_{cyc}(G, F)$
  • Identity loss: $\mathcal{L}_{iden}(G, F)$
  • Criterion loss: $\mathcal{L}_c = \mathcal{L}_{c_{cycle}} + \mathcal{L}_{c_{identity}}$
Thus, the total generator loss in Equation (5) is revised as:
$\mathcal{L}_{total} = [\mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X)] + \lambda \mathcal{L}_{cyc}(G, F) + \sigma \mathcal{L}_{iden}(G, F) + \kappa (\varphi \mathcal{L}_c)$    (8)
In the experiment for the COVID-19 X-ray image synthesis, the generators follow the U-Net architecture [22] with skip connections to reduce the input feature size from 64 × 64 to 1 × 1 and then restore it to 64 × 64. The discriminators follow the PatchGAN architecture [9] with an output of a 4 × 4 × 1 feature map (given the low resolution of our dataset) to determine whether the images are real or fake. We use binary cross entropy as the objective function for the discriminator loss and for the adversarial loss terms of the generators. The cycle consistency loss and the identity loss use the mean absolute error (MAE) as the objective. The external criterion is a residual network with three residual modules. Each residual module has three convolutional layers with a skip connection from the first convolutional layer to the third one to ensure gradient flow during backpropagation. The residual modules are connected by batch normalization and max pooling layers to progressively reduce the tensor size. The criterion uses the sparse categorical cross entropy function as its loss objective and is optimized by the adaptive moment estimation (Adam) algorithm with an initial learning rate of 1 × 10^-4 for 100 epochs. During the GAN optimization, the pretrained criterion evaluates the input images and produces output logits that can be combined with the other loss terms. The criterion loss terms are measured by the sparse categorical cross entropy, the same function with which the pretrained criterion was optimized. Some studies recommend using unbounded smooth loss functions, such as the Wasserstein loss or mean squared error (MSE), to optimize GAN models [21,23]. Empirically, the choice of loss functions mainly depends on the components of the total loss objective. If all errors can be measured on similar scales, using unbounded loss functions is straightforward and easier for the overall GAN optimization. However, if the GAN architecture consists of many components, as in this case, using hyperparameters to adjust the importance of the different terms or to determine the frequency of loss injection into the total loss provides a more flexible option for GAN optimization, as described in Equation (8). The Ad CycleGAN architecture is illustrated in Figure 1, and the pseudo code of the optimization algorithm for the Ad CycleGAN model is presented in Algorithm 1.
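As a rough sketch of the external criterion described above, the snippet below builds a small residual classifier in Keras with three residual modules, batch normalization and max pooling between modules, Adam optimization, and a sparse categorical cross entropy objective. The layer widths and input shape are illustrative assumptions rather than the precise configuration used in the experiments.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_module(x, filters):
    """Three convolutional layers with a skip connection from the first to the third."""
    h1 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    h2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(h1)
    h3 = layers.Conv2D(filters, 3, padding="same")(h2)
    return layers.Activation("relu")(layers.Add()([h1, h3]))   # skip: first -> third

def build_criterion(input_shape=(64, 64, 3), n_classes=2, filters=(32, 64, 128)):
    """Pre-trained criterion: a small residual classifier returning class logits."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for f in filters:                      # three residual modules
        x = residual_module(x, f)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)       # halve the spatial size between modules
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(n_classes)(x)       # logits for the two image domains
    model = Model(inp, out, name="criterion")
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

After training on the labeled normal and COVID-19 positive images, the returned model is frozen and used only for inference inside the GAN optimization loop.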
Algorithm 1. Ad CycleGAN Optimization
1:  for number of epochs do
2:      for number of batches do
3:          Sample minibatch $\{x_i\}_{i=1}^{m} \subset X$
4:          Sample minibatch $\{y_j\}_{j=1}^{m} \subset Y$
5:          Generate m synthetic samples $G(x)$ and $F(y)$
6:              synthetic Y: $x \rightarrow G(x)$
7:              synthetic X: $y \rightarrow F(y)$
8:          Compute the adversarial loss
9:              $\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$
10:             $\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$
11:         Generate m cycle samples $F(G(x))$ and $G(F(y))$
12:             cycle X: $G(x) \rightarrow F(G(x))$
13:             cycle Y: $F(y) \rightarrow G(F(y))$
14:         Compute the cycle loss
15:             $\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1]$
16:         Generate m identity samples $F(x)$ and $G(y)$
17:             identity X: $x \rightarrow F(x)$
18:             identity Y: $y \rightarrow G(y)$
19:         Compute the identity loss
20:             $\mathcal{L}_{iden}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(x) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(y) - y \|_1]$
21:         Compute the criterion loss for the cycle samples: $\mathcal{L}_{c_{cycle}}$
22:         Compute the criterion loss for the identity samples: $\mathcal{L}_{c_{identity}}$
23:         Compute the total generator loss
24:             $\mathcal{L}_{total} = [\mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X)] + \lambda \mathcal{L}_{cyc}(G, F) + \sigma \mathcal{L}_{iden}(G, F) + \kappa (\varphi \mathcal{L}_c)$
25:         Update the discriminators $D_X$ and $D_Y$
26:             $\max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X)$
27:             $\max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y)$
28:         Update the generators $G, F$
29:             $\min_{G, F} \mathcal{L}(G, F, D_X, D_Y)$
30:     end for
31: end for
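A compressed, self-contained sketch of one optimization step following Algorithm 1 is shown below, assuming a TensorFlow implementation. The function signature, the domain label convention (0 for X, 1 for Y), and the use of a simple step counter to inject the criterion term every kappa steps (Equation (8)) are illustrative assumptions rather than the exact code used in this study.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def l1(a, b):
    """Mean absolute error used for the cycle and identity terms."""
    return tf.reduce_mean(tf.abs(a - b))

def train_step(step, real_x, real_y, G, F, D_X, D_Y, criterion,
               gen_opt, dx_opt, dy_opt,
               lam=80.0, sigma=60.0, phi=0.1, kappa=20):
    """One Ad CycleGAN update roughly following Algorithm 1.
    G: X -> Y, F: Y -> X; `criterion` is the frozen pretrained classifier.
    The term phi * L_c is injected into the generator loss every `kappa` steps."""
    with tf.GradientTape(persistent=True) as tape:
        fake_y, fake_x = G(real_x, training=True), F(real_y, training=True)
        cycled_x, cycled_y = F(fake_y, training=True), G(fake_x, training=True)
        same_x, same_y = F(real_x, training=True), G(real_y, training=True)

        # Generator-side adversarial, cycle, and identity terms (Eqs. 1-4)
        adv = bce(tf.ones_like(D_Y(fake_y)), D_Y(fake_y)) + \
              bce(tf.ones_like(D_X(fake_x)), D_X(fake_x))
        cyc = l1(cycled_x, real_x) + l1(cycled_y, real_y)
        iden = l1(same_x, real_x) + l1(same_y, real_y)

        # Criterion loss on cycled and identity images (Eq. 7); classes 0 = X, 1 = Y assumed
        n = tf.shape(real_x)[0]
        lx, ly = tf.fill([n], 0), tf.fill([n], 1)
        l_c = scce(lx, criterion(cycled_x, training=False)) + \
              scce(ly, criterion(cycled_y, training=False)) + \
              scce(lx, criterion(same_x, training=False)) + \
              scce(ly, criterion(same_y, training=False))
        inject = tf.cast(step % kappa == 0, tf.float32)
        gen_loss = adv + lam * cyc + sigma * iden + inject * phi * l_c   # Eq. (8)

        # Discriminator losses: real samples labeled 1, generated samples labeled 0
        dx_loss = bce(tf.ones_like(D_X(real_x)), D_X(real_x)) + \
                  bce(tf.zeros_like(D_X(fake_x)), D_X(fake_x))
        dy_loss = bce(tf.ones_like(D_Y(real_y)), D_Y(real_y)) + \
                  bce(tf.zeros_like(D_Y(fake_y)), D_Y(fake_y))

    g_vars = G.trainable_variables + F.trainable_variables
    gen_opt.apply_gradients(zip(tape.gradient(gen_loss, g_vars), g_vars))
    dx_opt.apply_gradients(zip(tape.gradient(dx_loss, D_X.trainable_variables),
                               D_X.trainable_variables))
    dy_opt.apply_gradients(zip(tape.gradient(dy_loss, D_Y.trainable_variables),
                               D_Y.trainable_variables))
    del tape
    return gen_loss, dx_loss, dy_loss
```

In the experiments reported below, such a step would be repeated over mini-batches for 600 epochs with Adam optimizers for the generators and discriminators.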

2.4. Evaluation Metrics for Translated Images

The performance evaluation of GAN networks is usually subjective and remains an open problem [24]. Our objective is to generate synthetic medical images with good fidelity and diversity. We need both to measure the quality of the images and to ensure that the generated images belong to the correct category, i.e., carry the diagnostically significant patterns. The latter task can be measured by the classification accuracy of the synthetic images. There are generally two types of methods to measure the quality of synthetic images: subjective evaluation and objective evaluation. Subjective evaluation requires human expertise; it is time consuming and difficult to replicate. Therefore, we apply objective metrics to compare the synthetic images with the real images, under the assumption that high-quality synthetic images have a higher degree of similarity to the real images. The quantitative evaluation metrics for our experiments include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Universal Image Quality Index (UIQI), and Visual Information Fidelity (VIF).
MSE, RMSE, and PSNR measure the pixel difference between the synthetic images and the real images. MSE is the accumulated mean squared error of two images, and RMSE is the accumulated root mean squared error of the two images. PSNR is a measure of image quality [25] based on the pixel difference between the synthetic image and the real image. UIQI summarizes the attributes of human vision [26]: the synthetic images and the real images are compared in three aspects, namely luminance, contrast, and structure. VIF is another measure based on human visual perception. It quantifies image fidelity as the difference between the information extracted from the real image and the information lost in the synthetic image, using natural scene statistics (NSS), a human visual system (HVS) model, and an image distortion model. For comparison, synthetic images with low MSE and RMSE and with high scores in PSNR, UIQI, and VIF are considered to be of better quality. In addition, we use the Frechet Inception Distance (FID), a commonly accepted metric to compare the quality of images synthesized by different generative models. FID was proposed by Heusel, M. et al. in 2017 to calculate the distance between feature vectors computed for real and generated images [27]. It reflects how similar the two image groups are in terms of statistics on computer vision features of the raw images, calculated using a pretrained classifier. A low FID score indicates that the two groups of images are similar or have similar statistics.
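For reference, the pixel-level metrics can be computed directly with NumPy as in the sketch below; PSNR assumes 8-bit images with a peak value of 255, and the function names are our own. UIQI, VIF, and FID require their respective reference implementations and are omitted here.

```python
import numpy as np

def mse(real, synthetic):
    """Mean squared pixel error between two images of identical shape."""
    real = np.asarray(real, dtype=np.float64)
    synthetic = np.asarray(synthetic, dtype=np.float64)
    return np.mean((real - synthetic) ** 2)

def rmse(real, synthetic):
    """Root mean squared pixel error."""
    return np.sqrt(mse(real, synthetic))

def psnr(real, synthetic, peak=255.0):
    """Peak signal-to-noise ratio in decibels; higher means closer to the real image."""
    err = mse(real, synthetic)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)
```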
In the next section, we present the experiments that respectively use Cycle GAN and Ad CycleGAN to perform image translation between normal CXR images and COVID-19 positive CXR images, and the comparisons of the quality of the synthetic images with the above quantitative metrics.

3. Experiments

3.1. Material and Methods

In our experiments, we respectively implemented the Cycle GAN and the Ad CycleGAN to perform image translation between normal and COVID-19 positive CXR images. According to clinical observations, COVID-19 positive cases show characteristic bilateral or unilateral multiple mottling and ground-glass opacity patterns on CXR and CT images [1], and these patterns have been successfully captured by multiple DNN models [3,4,5]. Based on the above discussion, the image translation between normal and COVID-19 positive images can be formulated as learning the mapping that adds or removes such diagnostically significant patterns between the normal and the COVID-19 image domains. We use an image dataset consisting of 219 COVID-19 positive images and 1064 normal CXR images from the Kaggle COVID-19 Radiography dataset (https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database, accessed on 29 August 2022) for the experiments. Figure 2 illustrates some example images in the dataset.
Given the hardware constraints, the images are resized to 64 × 64 with 3 channels as the input dimensions. Because we have only 219 real COVID-19 X-ray images, 50 of them are randomly selected and withheld for testing, and the remaining 169 real images are duplicated 6 times to match the 1014 normal X-ray images used for model optimization. The Cycle GAN and Ad CycleGAN models are each optimized for 600 epochs on the Google Colab platform with GPU. The average runtime is about 58 s per epoch with a mini-batch size of 64.
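As an illustration of this data preparation, the sketch below resizes images to 64 × 64 × 3, withholds 50 COVID-19 images for testing, and replicates the remaining 169 images six times to roughly balance the 1014 normal training images. The helper names, the [-1, 1] intensity scaling, and the directory handling are assumptions for illustration.

```python
import random
import numpy as np
import tensorflow as tf

def load_images(paths, size=(64, 64)):
    """Read, resize to 64 x 64 x 3, and scale images to [-1, 1] (common for GAN training)."""
    imgs = []
    for p in paths:
        img = tf.io.decode_image(tf.io.read_file(p), channels=3, expand_animations=False)
        img = tf.image.resize(img, size)
        imgs.append(img.numpy() / 127.5 - 1.0)
    return np.stack(imgs)

def prepare_covid_split(covid_paths, n_test=50, n_copies=6, seed=42):
    """Withhold n_test COVID-19 images for testing; duplicate the rest n_copies times.
    covid_paths is a list of image file paths (shuffled in place)."""
    random.Random(seed).shuffle(covid_paths)
    test_paths, train_paths = covid_paths[:n_test], covid_paths[n_test:]
    train = load_images(train_paths)
    train = np.concatenate([train] * n_copies, axis=0)   # 169 * 6 = 1014 samples
    return train, load_images(test_paths)
```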

3.2. Results and Interpretation

As shown in Equation (8), we respectively optimized the Cycle GAN and the Ad CycleGAN with similar parameter configurations: λ = 80.0, σ = 60.0, φ = 0.1, and κ = 20. This means that in the optimization of Ad CycleGAN, the criterion loss is added to the total generator loss every 20 steps. The models are optimized by the Adam optimizer with an initial learning rate of 2 × 10^-4 for 600 epochs. The mini-batch size is 64. The synthetic CXR images generated by the Cycle GAN and by the Ad CycleGAN are shown in Figure 3 and Figure 4, respectively. Note that both Cycle GAN and Ad CycleGAN can perform heterogeneous translation, in which normal images are translated to COVID-19 positive images and vice versa (i.e., X → Y or Y → X); they can also perform homogeneous translation, which converts input images within the same domain (i.e., X → X or Y → Y). The homogeneous translation is considered image augmentation by GAN.
The quantitative measures of the quality of the synthetic images are listed in Table 1; the FID score and the classification accuracy of the synthetic images with respect to the target category are listed in Table 2.
From Figure 3 and Figure 4, we observe that both Cycle GAN and Ad CycleGAN can synthesize high-quality COVID-19 X-ray images with good visual fidelity and diversity through image translation. Another finding is that both Cycle GAN and Ad CycleGAN can not only perform image translation but also convert input images aligned on the sagittal axis to synthetic images aligned on the coronal axis. The quantitative metrics indicate that the synthetic images generated either by the Cycle GAN or by the Ad CycleGAN have lower MSE and RMSE, and higher scores in PSNR, UIQI, and VIF, through the image augmentation process (i.e., Y → Y) than through the image translation process (i.e., X → Y). This implies that the GANs cannot produce equally high-quality synthetic images through heterogeneous translation, probably due to insufficient training samples.
The synthetic images generated by Ad CycleGAN through heterogeneous image translation (i.e., X → Y) have a significantly higher FID score than those generated by Cycle GAN (p < 0.01). However, Cycle GAN generates comparable or even slightly better images through the homogeneous translation or augmentation process (i.e., Y → Y). The image translation accuracy of Ad CycleGAN is higher than that of Cycle GAN when normal images are converted to COVID-19 positive images (p < 0.01). However, both Ad CycleGAN and Cycle GAN can perfectly perform homogeneous translation with an accuracy of 100%, i.e., augmenting the image diversity within the COVID-19 positive image domain. This implies that the independent criterion in the Ad CycleGAN can improve the accuracy of heterogeneous image translation.
In our literature review, similar research on COVID-19 CXR image synthesis mainly uses accuracy as the main metric of GAN performance. For example, Motamed S. et al. used GAN-based data augmentation to improve COVID-19 CXR image classification accuracy from 0.81 to 0.84 [28]. Morís D.I. et al. used CycleGAN to improve COVID-19 CXR image screening accuracy to about 0.90 [29]. Neither study used a GAN with an effective controllable mechanism to guarantee that the synthetic images fall into the COVID-19 positive category.
When observing the loss changes through the Ad CycleGAN optimization process in Figure 5, we find the influence of the independent periodic criterion loss (added to the total loss every 20 steps) on the overall GAN optimization. The criterion loss affects both image translation directions at the beginning of the GAN optimization, within the first 200 epochs, and then becomes stable afterwards. However, the total loss of discriminator Y drops approximately during the last 50 epochs. This implies that the external criterion exerts more impact on the synthesized COVID-19 positive images, which explains why the image translation by Ad CycleGAN has higher accuracy than that by Cycle GAN.

4. Discussion

We present a study of Ad CycleGAN for image translation between normal chest X-ray images and COVID-19 positive chest X-ray images. The experiment compared the performance of the new Ad CycleGAN with the conventional Cycle GAN using a series of quantitative metrics of image quality, the FID score, and the image translation accuracy. The results indicate that Ad CycleGAN generates synthetic COVID-19 images with higher accuracy than Cycle GAN. When performing heterogeneous image translation, the Ad CycleGAN can generate synthetic COVID-19 positive images with higher FID score and accuracy. Therefore, we conclude that the adaptive external criterion design of the Ad CycleGAN can effectively control the image category during image translation. The Ad CycleGAN can be considered a new type of conditional GAN that extends the control over the synthetic image domain in GAN image synthesis and translation.
The proposed Ad CycleGAN follows the GAN optimization strategy that originated from the Wasserstein GAN (WGAN) [21], where the objective is not to estimate the probability that the synthetic images are real, but to "rate" the synthetic images as an objective critic. Under this framework, the GAN optimization process can be combined with the opinions of multiple critics from different aspects; thus, the zero-sum adversarial game proposed by Goodfellow et al. [7] is changed into a multi-domain task. Furthermore, the Ad CycleGAN does not need to encode the labels into the training data, which simplifies the computation of the GAN optimization.
The most significant merit of the Ad CycleGAN is that it improves the image translation accuracy with respect to the target image category. In our literature review, most applications use GANs for image data augmentation, but there is no guarantee that the generated images fall into the correct category. Therefore, the introduction of the pre-trained independent criterion is a unique contribution of the Ad CycleGAN to the GAN architecture.
In addition, Ad CycleGAN can perform both image augmentation and image translation. Image augmentation means that the input real images belong to the same category as the expected synthetic outputs, e.g., from normal images to normal images with acceptable diversity, or from disease-positive images to disease-positive images with acceptable diversity. Currently, most GAN studies on medical images focus on image augmentation. The GAN models generate multiple types of synthetic samples, including synthetic images for direct DNN optimization, image masks for improving image segmentation, and image feature maps for medical diagnosis and decision making. The applications of image translation are mainly for converting images from one format to another, such as from MR images to CT images, or from ultrasound images to CT images. However, the available applications have not explored the task of converting images from the normal or healthy domain to a specific disease domain, which is crucial in medical research. Our experiment shows that the new Ad CycleGAN performs heterogeneous image translation (i.e., X → Y) with higher accuracy than the original Cycle GAN model.
The Ad CycleGAN architecture provides the flexibility to add more external critics to control multiple aspects of GAN-based image synthesis. As Khaldi Y. et al. stressed the importance of image color control by GAN [30] and Creswell A. et al. emphasized proper image domain mapping for GAN image generation [31], the trade-off between model complexity and controllability is one of the main considerations for GAN-based image generative models.

5. Conclusions

The findings of the experiments indicate that the newly proposed Ad CycleGAN can perform accurate medical image translation. We hope that this unique feature helps address the common class imbalance issue, because medical images containing rare or new disease information are both difficult to acquire and expensive to annotate by experts. The successful applications of GAN for COVID-19 pattern detection and segmentation show the feasibility of widely using DNNs for computer-assisted disease diagnosis and public health management. This work also provides a new approach for rapid deployment of AI solutions on portable and wearable devices for COVID-19 or other public health challenges.
Future work on Ad CycleGAN will focus on two aspects. First, we will further improve the optimization objective function to ensure more control over the optimization process and to minimize the side effects of the external criterion on synthesizing high-quality images, such as reducing the occurrence of artifacts in the synthetic images. Additional external criteria can also be added to the objective to further control the characteristics of the generated images toward the target domain. Second, we can develop a more sophisticated GAN architecture to extend the Ad CycleGAN design to more tasks. For example, the Ad CycleGAN can be combined with the Pix2Pix architecture to precisely allocate the location of the synthetic patterns. The application of the Ad CycleGAN can therefore be extended to medical image segmentation. In conclusion, GAN provides a promising solution for the data-hungry nature of deep neural networks. The new Ad CycleGAN provides more authentic images to augment the training of DNNs with high performance and robustness. We believe this new technology will promote DNN-related technologies for medical diagnosis and decision-making, and it will ultimately help to enhance high-quality healthcare delivery.

Author Contributions

Programming, experiment, and writing—original draft preparation, Z.L.; supervision, J.X.H.; writing—review and editing, S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Canada NSERC CREATE award in ADERSIM. Antani was supported by the Intramural Research Program of the National Library of Medicine, part of the National Institutes of Health.

Informed Consent Statement

Not applicable.

Data Availability Statement

The COVID-19 CXR image dataset in this study is from an open source COVID-19 Radiography Database on Kaggle, which is accessible at: https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database (accessed on 29 August 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Ad CycleGAN: adaptive cycle-consistent generative adversarial network
AI: artificial intelligence
ARDS: acute respiratory distress syndrome
CGAN: conditional GAN
COVID-19: Coronavirus disease 2019
Cycle GAN: cycle-consistent generative adversarial network
CT: computed tomography
CXR: chest X-ray
DNN: deep neural network
FID: Frechet inception distance
GAN: generative adversarial network
HVS: human visual system
LUS: lung ultrasound
MR: magnetic resonance
MSE: mean squared error
NSS: natural scene statistics
PET: positron emission tomography
PSNR: peak signal-to-noise ratio
RMSE: root mean squared error
RT-PCR: real-time reverse-transcriptase polymerase chain reaction
UIQI: universal image quality index
VIF: visual information fidelity
WGAN: Wasserstein GAN

References

  1. Chen, N.; Zhou, M.; Dong, X.; Qu, J.; Gong, F.; Han, Y.; Qiu, Y.; Wang, J.; Liu, Y.; Wei, Y.; et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study. Lancet 2020, 395, 507–513. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Deeks, J.J.; Dinnes, J.; Takwoingi, Y.; Davenport, C.; Spijker, R.; Taylor-Phillips, S.; Adriano, A.; Beese, S.; Dretzke, J.; di Ruffano, L.F.; et al. Antibody tests for identification of current and past infection with SARS-CoV-2. Cochrane Database Syst. Rev. 2020, 6, CD013652. [Google Scholar] [CrossRef]
  3. Shen, D.; Gao, Y.; Munoz-Barrutia, A.; Debuc, D.C.; Percannella, G. Special Issue on Imaging-Based Diagnosis of COVID-19. IEEE Trans. Med. Imaging 2020, 39, 2569–2571. [Google Scholar] [CrossRef]
  4. Sufian, A.; Ghosh, A.; Sadiq, A.S.; Smarandache, F. A Survey on Deep Transfer Learning and Edge Computing for Mitigating the COVID-19 Pandemic. J. Syst. Archit. 2020, 108, 101830. [Google Scholar] [CrossRef]
  5. Minaee, S.; Kafieh, R.; Sonka, M.; Yazdani, S.; Jamalipour Soufi, G. Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image. Anal. 2020, 65, 101794. [Google Scholar] [CrossRef]
  6. Hirano, H.; Koga, K.; Takemoto, K. Vulnerability of deep neural networks for detecting COVID-19 cases from chest X-ray images to universal adversarial attacks. PLoS ONE 2020, 15, e0243963. [Google Scholar] [CrossRef]
  7. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 139–144. [Google Scholar] [CrossRef]
  8. He, D.; Xia, Y.; Qin, T.; Wang, L.; Yu, N.; Liu, T.; Ma, W.T. Dual learning for machine translation. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 820–828. [Google Scholar]
  9. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar] [CrossRef] [Green Version]
  10. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar] [CrossRef] [Green Version]
  11. Cai, J.; Zhang, Z.; Cui, L.; Zheng, Y.; Yang, L. Towards cross-modal organ translation and segmentation: A cycle- and shape-consistent generative adversarial network. Med. Image Anal. 2019, 52, 174–184. [Google Scholar] [CrossRef] [PubMed]
  12. Li, Y.; Tang, S.; Zhang, R.; Zhang, Y.; Li, J.; Yan, S. Asymmetric GAN for Unpaired Image-to-Image Translation. IEEE Trans. Image Process. 2019, 28, 5881–5896. [Google Scholar] [CrossRef] [Green Version]
  13. Siddiquee, M.M.R.; Zhou, Z.; Tajbakhsh, N.; Feng, R.; Gotway, M.B.; Bengio, Y.; Liang, J. Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 191–200. [Google Scholar] [CrossRef] [Green Version]
  14. Fu, J.; Singhrao, K.; Cao, M.; Yu, V.; Santhanam, A.P.; Yang, Y.; Guo, M.; Raldow, A.C.; Ruan, D.; Lewis, J.H. Generation of abdominal synthetic CTs from 0.35T MR images using generative adversarial networks for MR-only liver radiotherapy. Biomed. Phys. Eng. Express 2020, 6, 015033. [Google Scholar] [CrossRef] [Green Version]
  15. Lee, J.H.; Han, I.H.; Kim, D.H.; Yu, S.; Lee, I.S.; Song, Y.S.; Joo, S.; Jin, C.; Kim, H. Spine Computed Tomography to Magnetic Resonance Image Synthesis Using Generative Adversarial Networks: A Preliminary Study. J. Korean Neurosurg. Soc. 2020, 63, 386–396. [Google Scholar] [CrossRef] [Green Version]
  16. Hu, N.; Zhang, T.; Wu, Y.; Tang, B.; Li, M.; Song, B.; Gong, Q.; Wu, M.; Gu, S.; Lui, S. Detecting brain lesions in suspected acute ischemic stroke with CT-based synthetic MRI using generative adversarial networks. Ann. Transl. Med. 2022, 10, 35. [Google Scholar] [CrossRef]
  17. Nie, D.; Trullo, R.; Lian, J.; Petitjean, C.; Ruan, S.; Wang, Q.; Shen, D. Medical Image Synthesis with Context-Aware Generative Adversarial Networks. Int. Conf. Med. Image Comput. Comput.-Assist. Interv. 2017, 10435, 417–425. [Google Scholar] [CrossRef]
  18. Emami, H.; Dong, M.; Nejad-Davarani, S.P.; Glide-Hurst, C.K. Generating synthetic CTs from magnetic resonance images using generative adversarial networks. Med. Phys. 2018, 45, 3627–3636. [Google Scholar] [CrossRef] [PubMed]
  19. Hu, Z.; Li, Y.; Zou, S.; Xue, H.; Sang, Z.; Liu, X.; Yang, Y.; Zhu, X.; Liang, D.; Zheng, H. Obtaining PET/CT images from non-attenuation corrected PET images in a single PET system using Wasserstein generative adversarial networks. Phys. Med. Biol. 2020, 65, 215010. [Google Scholar] [CrossRef]
  20. Bazangani, F.; Richard, F.J.P.; Ghattas, B.; Guedj, E. Alzheimer’s Disease Neuroimaging I. FDG-PET to T1 Weighted MRI Translation with 3D Elicit Generative Adversarial Network (E-GAN). Sensors 2022, 22, 4640. [Google Scholar] [CrossRef] [PubMed]
  21. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar] [CrossRef]
  22. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. Int. Conf. Med. Image Comput. Comput.-Assist. Interv. 2015, 234–241. [Google Scholar] [CrossRef]
  23. Li, M.; Huang, H.; Ma, L.; Liu, W.; Zhang, T.; Jiang, Y. Unsupervised image-to-image translation with stacked cycle-consistent adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 184–199. [Google Scholar] [CrossRef]
  24. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training gans. Adv. Neural Inf. Process. Syst. 2016, 29, 2234–2242. [Google Scholar]
  25. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [Green Version]
  26. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  27. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar] [CrossRef]
  28. Motamed, S.; Rogalla, P.; Khalvati, F. Data augmentation using Generative Adversarial Networks (GANs) for GAN-based detection of Pneumonia and COVID-19 in chest X-ray images. Inf. Med Unlocked 2021, 27, 100779. [Google Scholar] [CrossRef]
  29. Morís, D.I.; de Moura Ramos, J.J.; Buján, J.N.; Hortas, M.O. Data augmentation approaches using cycle-consistent adversarial networks for improving COVID-19 screening in portable chest X-ray images. Expert Syst. Appl. 2021, 15, 115681. [Google Scholar] [CrossRef] [PubMed]
  30. Khaldi, Y.; Benzaoui, A. A new framework for grayscale ear images recognition using generative adversarial networks under unconstrained conditions. Evol. Syst. 2021, 12, 923–934. [Google Scholar] [CrossRef]
  31. Creswell, A.; Bharath, A.A. Inverting the generator of a generative adversarial network. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1967–1974. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Ad CycleGAN Architecture.
Figure 2. Original Chest X-ray Images.
Figure 3. Image translation by Cycle GAN.
Figure 4. Image translation by Ad CycleGAN.
Figure 5. Optimization process of Ad CycleGAN.
Table 1. Quantitative metrics for the synthetic images (standard deviation in parentheses).

Model (Translation Direction) | MSE (std)         | RMSE (std)     | PSNR (std)    | UIQI (std)   | VIF (std)
Cycle GAN (X → Y)             | 3608.97 (1398.37) | 58.77 (12.42)  | 12.97 (2.10)  | 0.81 (0.08)  | 0.12 (0.05)
Cycle GAN (Y → Y)             | 409.00 (495.043)  | 17.55 (10.035) | 24.43 (4.398) | 0.97 (0.029) | 0.55 (0.050)
Ad CycleGAN (X → Y)           | 3750.71 (1789.51) | 59.54 (14.30)  | 12.88 (2.12)  | 0.80 (0.13)  | 0.10 (0.03)
Ad CycleGAN (Y → Y)           | 435.84 (461.59)   | 18.73 (9.21)   | 23.59 (3.86)  | 0.97 (0.03)  | 0.52 (0.05)
Table 2. FID score and classification accuracy (standard deviation in parentheses).

Model (Translation Direction) | FID (std)                   | Accuracy
Cycle GAN (X → Y)             | 5.26 × 10^6 (1.82 × 10^5)   | 0.9375
Cycle GAN (Y → Y)             | 6.31 × 10^4 (4.40 × 10^7)   | 1.0
Ad CycleGAN (X → Y)           | 1.19 × 10^5 (4.16 × 10^6)   | 0.9843
Ad CycleGAN (Y → Y)           | 3.60 × 10^4 (6.44 × 10^5)   | 1.0
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
