Article

LRFID-Net: A Local-Region-Based Fake-Iris Detection Network for Fake Iris Images Synthesized by a Generative Adversarial Network

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro, 1-gil, Jung-gu, Seoul 04620, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(19), 4160; https://doi.org/10.3390/math11194160
Submission received: 12 September 2023 / Revised: 26 September 2023 / Accepted: 1 October 2023 / Published: 3 October 2023
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)

Abstract

Iris recognition is a biometric method that identifies people by the pattern of the iris, located between the pupil and the sclera. It is widely applied in various fields owing to its high recognition accuracy and high security. A spoof detection method for discriminating spoof attacks is essential in biometric recognition systems, including iris recognition. However, previous studies have mainly investigated the detection of spoofing attacks based on printed or photographed images, video replay, artificial eyes, and patterned contact lenses fabricated using iris images obtained from information leakage. There have been only a few studies on detecting spoof attacks that use iris images generated by a generative adversarial network (GAN), a method that has drawn considerable research interest with the recent development of deep learning, and the improvement in spoof detection accuracy achieved by previously proposed methods is limited. To address this problem, the possibility of an attack on a conventional iris recognition system with spoofed iris images generated using cycle-consistent adversarial networks (CycleGAN), which was the motivation of this study, was investigated. In addition, a local-region-based fake-iris detection network (LRFID-Net) was developed. It provides a novel method for discriminating fake iris images by segmenting the iris image into three regions defined with reference to the iris region. Experimental results using two open databases, the Warsaw and Notre Dame Contact Lens Detection datasets of LiveDet-Iris-2017, showed that the average classification error rate of spoof detection by the proposed method was 0.03% for the Warsaw dataset and 0.11% for the Notre Dame Contact Lens Detection dataset. These results confirm that the proposed method outperforms state-of-the-art methods.

1. Introduction

Recently, biometric systems have been employed in various fields requiring a high level of security, such as bank ATMs, cell phones, and airports. Various types of biometric systems have been developed, including face recognition, fingerprint recognition, finger-vein recognition, voice recognition, and iris recognition. Among these biometric methods, iris recognition has a low false matching rate because the iris pattern is unique to each person and differs between the left and right eyes. In addition, because the eyelashes and eyelids protect the iris, it is less likely to be damaged than fingerprints or the face, which makes the iris a highly secure biometric trait; thus, iris recognition has been widely implemented in many different fields [1,2]. However, biometrics have the disadvantage that the biometric information cannot be changed if it is compromised through information leakage or replication. Because of this drawback, accurately distinguishing real iris images from fake images is critical.
Typical presentation attacks (PAs) of iris images include attacks using reprinted or photographed iris images, patterned contact lenses, or artificial eyes, and there has been considerable research on presentation attack detection (PAD) of these PAs [3,4,5]. Recent advances in deep learning techniques have made it possible to generate sophisticated images with details by generative adversarial networks (GANs). GAN-generated images show little difference from live images, and the images are detailed and elaborate, which makes discrimination of these images difficult. Early research mainly focused on the detection of fake images generated using the relativistic average standard generative adversarial network (RaSGAN) [6], which is one of the unconditional GANs. However, because unconditional GANs could not control which fake images should be generated, the method posed difficulties in generating fake images by replicating an iris of a specific person. To address this limitation, the conditional GAN-based method [7] was introduced to provide control over which fake images are generated, making it possible to generate the desired images. Figure 1 shows an example of original images and images generated using a cycle-consistent generative adversarial network (CycleGAN) [8], a type of conditional GAN. There is little difference between the generated fake iris images and live iris images.
As described above, it is possible to replicate the iris of a specific person with little difference from the live images; therefore, it is extremely important to distinguish the fake iris image generated by a conditional GAN. To address this problem, in this study, the possibility of attacking a conventional iris recognition system using spoofed iris images generated using a CycleGAN was examined and a method for discriminating fake iris images from real ones was developed. This research has the following four contributions.
  • In this study, to generate fake images with higher resolution than the fake iris images generated using RaSGANs in prior studies, a CycleGAN was used; Gaussian blur was applied to remove the GAN fingerprint, making it possible to generate higher-quality fake iris images.
  • Unlike most previous studies that used the global iris region for PAD, a local-region-based fake-iris detection network (LRFID-Net) was developed that performs PAD based on a segmented local region with reference to the detected iris region.
  • In LRFID-Net, three feature maps are obtained using the iris region and the upper and lower eyelash regions as inputs to the dense block. By performing channel-wise concatenation of the feature maps thus obtained and using the results as inputs to the shallow model, classification into live or fake iris images is performed.
  • We release the proposed models, algorithms, and synthetic iris images generated using the GAN via our GitHub site [9] for objective performance evaluation.
The remainder of this article is organized as follows. Previous research and our iris PAD method are detailed in Section 2 and Section 3, respectively, and the experimental results are analyzed in Section 4. Finally, Section 5 presents the concluding remarks.

2. Related Works

Studies on iris PAD in the literature can be divided into two categories: PAD using fabricated artifacts and PAD using generated fake images. The former is a method of detecting a presentation attack (spoof attack) using reprinted or photographed iris images, patterned contact lenses, or artificial eyes, and the latter is spoof attack detection using synthetic iris images generated by a GAN using iris images from leaked information. Section 2.1 and Section 2.2 provide detailed descriptions of the two PAD methods.

2.1. PAD Using Fabricated Artifacts

Previous studies on iris PAD have employed ocular images without preprocessing the images, used the detected iris region after geometric normalization, or applied the unnormalized iris image without preprocessing. Raghavendra et al. proposed ContlensNet, in which the detected iris region is normalized and the normalized iris image is used as an input to a deep convolutional neural network (D-CNN) [10]. In another study, He et al. proposed a multi-patch convolutional neural network (MCNN) in which geometrical normalization is performed with the detected iris image and the normalized iris image is decomposed into 28 patches. Then, the algorithm is trained. After training, classification is performed using logistic regression [11]. The methods of normalization for the iris region used in these previous studies have the advantage of using only the selected iris region of importance. The drawbacks are that the preprocessing is complicated and regions other than the iris texture region are not considered in the analysis. Considering these limitations, PAD methods without the normalization of the detected iris region were also investigated.
In previous studies based on the iris region without normalization, Sharma et al. proposed dense network presentation attack detection (D-NetPAD) [12], using images obtained by cropping the iris region from the ocular image as inputs to DenseNet. Hoffman et al. proposed a method with tessellation of the iris region image obtained from the ocular image into 25 overlapping patches and used the patches to learn features [13]. Pala et al. divided the iris region into 32 × 32 grayscale iris patches, obtaining 64 feature maps, and used the maps as inputs to TripletNet [14]. Choudhary et al. proposed a densely connected contact lens detection network (DCLNet) that used the ocular image as the input data for DenseNet-121 without preprocessing, extracted features from the second dense block, and performed classification using a support vector machine (SVM) [15]. In this method, DCLNet, which is based on DenseNet-121, uses pretrained weights obtained by training on ImageNet and freezes the 27 layers at the front end so that features can be extracted through the pretrained weights. In addition, Choudhary et al. proposed an iris anti-spoofing method using score-level fusion based on features obtained from a densely connected contact lens classification network (DCCNet), which performed classification by adding all the feature maps extracted from dense blocks in DenseNet-121 using ocular images together with additional handcrafted features [16]. Jaswal et al. proposed a dense feature calibration attention-guided network (DFCANet) in which features are extracted through DenseNet-121 and an iris feature calibration network (IFCNet) and channel attention module (CAM) are used [17]. In addition, a few previous studies have investigated PAD methods using normalized and unnormalized iris regions. Nguyen et al. proposed a method in which the inner and outer regions are divided from a detected iris region and normalization is performed for each region to carry out score concatenation or feature concatenation; then, live and fake iris images are classified using an SVM [18]. In the method proposed by Chen et al., after detection of the iris region, three distinct binary codes obtained by Gabor filters are computed from the normalized iris region without the pupil (IrisCodes), the normalized iris region with the pupil (Pupil-IrisCodes), and the unnormalized iris region (OcularCode). The PAD CNNs corresponding to these three binary codes are IC-PAD, Pupil-IC-PAD, and OC-PAD, and these were used to obtain output scores. Score-level fusion was performed on the obtained output scores to derive the final classification results between live and fake iris images [19].
Fang et al. [4] proposed a pixel-wise binary supervision network (PBS) and an attention-based pixel-wise binary supervision network (APBS) that take an ocular image as input, use two dense blocks to obtain a feature map and a binary mask, and then compute the softmax loss of the feature map and the smooth L1 loss between the feature map and the binary mask. Fang et al. also introduced a PAD method that fuses 2D textural features (OSPAD-2D) and 3D photometric stereo features (OSPAD-3D) [5]. However, these methods have limitations in effectively utilizing fine-grained information from different regions, and these studies did not consider fake iris images generated using GANs. In contrast, our proposed LRFID-Net can use the combined features from three different regions: the iris, upper eyelashes, and lower eyelashes. Therefore, it can utilize more fine-grained local information than the above methods that use the entire eye image without any region distinction. In addition, compared with the above methods that use only the iris region, our method has the advantage of using various features extracted from the upper and lower eyelash regions in addition to the iris area. Furthermore, the proposed LRFID-Net also considers fake iris images generated using a GAN and shows high classification performance.

2.2. PAD Using Generated Fake Image

Kohli et al. proposed the iris deep convolutional generative adversarial network (iDCGAN) [20]. In this method, as a result of evaluating PAD performance for detection of fake iris images generated using iDCGAN, the accuracy of detection was lower than that obtained by using synthetic iris images generated by conventional methods, confirming that the fake iris images generated using iDCGAN were more difficult to distinguish from the real iris images compared with the ones generated by conventional methods. For detection of these fake iris images, Yadav et al. proposed multilevel Haralick and VGG fusion (MHVF) [21]. In other research by these authors, fake iris images were generated using RaSGAN and classification performance was evaluated using BSIF, VGG-16, DESIST, and Iris-TLPAD [22]. The authors proposed relativistic discriminator PAD (RD-PAD) using RaSGAN’s relativistic discriminator for discriminating the generated fake iris images as a single-class classifier [23]. However, in this method, if the number of training samples for the PA class is insufficient, it may negatively affect the PAD performance. To overcome this limitation, the authors attempted to improve the performance using augmentation for fake training data with insufficient numbers of samples. They proposed the cyclic image translation generative adversarial network (CIT-GAN), with a styling network to generate high-quality fake-iris data for multiple domains [24]. In other research, an attention module was used to detect fake images generated through RaSGAN and those generated with conventional methods. Chen et al. proposed a PAD method in which a position attention module and a CAM [25] are applied. In addition, Zou et al. generated fake iris images based on 4DCycleGAN using four discriminators with CycleGAN as a backbone, and they improved the classification performance using linear coding for image classification, i.e., locality-constrained linear coding (LLC) [26]. Table 1 shows the comparisons of existing iris PAD methods and our iris PAD method.

3. Proposed Method

3.1. Overall Process of Proposed Method

Figure 2 presents a flowchart of our iris PAD scheme. First, the iris region is detected from the input ocular image (step 2 of Figure 2), and the upper and lower eyelash regions are defined with reference to the center of the iris region. The eyelash regions are allowed to overlap with the iris region; if there is no upper eyelash region, it is replaced with the lower eyelash region and vice versa (step 3 of Figure 2). The three images of the iris, upper eyelash, and lower eyelash regions are then used as inputs to the first dense blocks of DenseNet-169, from which feature maps are extracted (steps 4–6 of Figure 2). Then, channel-wise concatenation is performed on the three extracted feature maps to produce a single input (step 7 of Figure 2). Finally, the concatenated feature map is used as the input to the shallow CNN, which performs PAD (step 8 of Figure 2). For training, the iris, upper eyelash, and lower eyelash region images obtained through the preprocessing blocks from the original live iris images and from the fake iris images generated by CycleGAN are used as training data for the single dense block and the shallow CNN in LRFID-Net (steps 9–11 of Figure 2).
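To make the flow above concrete, the following minimal sketch traces steps 2–8 for a single ocular image; the helper names (detect_iris_region, extract_eyelash_regions) and the trained dense_block and shallow_cnn models are placeholders for the components described in Sections 3.3 and 3.4, not the released implementation.

```python
import tensorflow as tf

def lrfid_net_inference(ocular_image, dense_block, shallow_cnn,
                        detect_iris_region, extract_eyelash_regions):
    # Step 2: detect the iris region in the input ocular image.
    iris = detect_iris_region(ocular_image)
    # Step 3: define the upper and lower eyelash regions with reference to the
    # iris center (overlap with the iris region is allowed).
    upper, lower = extract_eyelash_regions(ocular_image, iris)
    # Steps 4-6: a shared first dense block of DenseNet-169 produces a
    # 56 x 56 x 128 feature map for each of the three local regions.
    feats = [dense_block(tf.expand_dims(region, axis=0))
             for region in (iris, upper, lower)]
    # Step 7: channel-wise concatenation of the three maps (56 x 56 x 384).
    fused = tf.concat(feats, axis=-1)
    # Step 8: the shallow CNN classifies the fused map as live or fake.
    return shallow_cnn(fused)
```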

3.2. Generating Training Dataset of Fake Images using CycleGAN

In this study, as shown in Figure 3, CycleGAN [8] was used to control and generate high-resolution fake images. CycleGAN is a model that performs training using unpaired datasets. Input data and target data are taken as input, and input data are converted to generate images similar to the target data. This model is suitable for generating counterfeit data of a specific person from unpaired iris datasets. Table 2 and Table 3 explain the generator and discriminator in CycleGAN, respectively.
Some scholars have explained that there exists a vanishing gradient problem when cross-entropy loss is used for the adversarial loss of the GAN [27]. Therefore, we adopted the least-squares GAN (LSGAN) loss in order to tackle the problem. In addition, for training with a sufficient number of samples, data augmentation with random left and right flipping of the training image or with random cropping was applied. Furthermore, when training CycleGAN, an iris image of the same class was used as the input image and target image to generate fake images at a level at which the fake images could not be distinguished from the real images by human visual observation. In this way, we defined all the iris images synthesized by the GAN as fake images.
\min_D V_{LSGAN}(D) = \mathbb{E}_{a \sim p_{data}(a)}\left[ (D(a) - \beta)^2 \right] + \mathbb{E}_{b \sim p_b(b)}\left[ (D(G(b)) - \alpha)^2 \right]  (1)

\min_G V_{LSGAN}(G) = \mathbb{E}_{b \sim p_b(b)}\left[ (D(G(b)) - \gamma)^2 \right]  (2)
Equation (1) represents the LSGAN loss of the discriminator, where a denotes a real image sampled from the data distribution, b an input sample to the generator, D the discriminator function, and G the generator function, while \beta, \alpha, and \gamma denote the real image label, the fake image label, and the label that the generator wants the discriminator to assign to generated images, respectively. The term \mathbb{E}_{a \sim p_{data}(a)}[(D(a) - \beta)^2] has its minimum value when D(a) = \beta, i.e., when the discriminator assigns the real image label to the real image a. The term \mathbb{E}_{b \sim p_b(b)}[(D(G(b)) - \alpha)^2] has its minimum value when D(G(b)) = \alpha, i.e., when the discriminator assigns the fake image label to G(b), the fake iris image produced by the generator. Equation (2) represents the LSGAN loss of the generator, where \gamma represents the value at which G(b) is classified into the real image label by the discriminator. Therefore, the generator should produce data such that D(G(b)) = \gamma in the term \mathbb{E}_{b \sim p_b(b)}[(D(G(b)) - \gamma)^2], so that the discriminator is deceived. In other words, the discriminator aims to distinguish between real and fake images, whereas the generator aims to generate images that fool the discriminator; this adversarial objective is adopted for training the GAN.
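For illustration, the two LSGAN objectives can be written compactly in TensorFlow as below; the label values β = γ = 1 and α = 0 are the common LSGAN choice and are an assumption here, since the exact values used for CycleGAN training are not stated above.

```python
import tensorflow as tf

# Assumed label values: beta = gamma = 1 (real), alpha = 0 (fake), as in the
# common LSGAN configuration.
BETA, ALPHA, GAMMA = 1.0, 0.0, 1.0

def lsgan_discriminator_loss(d_real, d_fake):
    # Equation (1): push D(a) toward the real label and D(G(b)) toward the fake label.
    return (tf.reduce_mean(tf.square(d_real - BETA))
            + tf.reduce_mean(tf.square(d_fake - ALPHA)))

def lsgan_generator_loss(d_fake):
    # Equation (2): push D(G(b)) toward the label the generator targets (gamma).
    return tf.reduce_mean(tf.square(d_fake - GAMMA))
```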
Figure 4 presents a flowchart for generating a fake iris image using CycleGAN. For the training of CycleGAN, an input image was chosen and an image in the same inner class as the input image was chosen as the target image. The target image was randomly chosen from the same inner class each time to prevent training overfitting. In general, GAN-generated images leave behind AI artifacts called GAN fingerprints and researchers have reported that GAN fingerprints make it easier to distinguish between GAN-generated images and original images [28]. Therefore, based on results reported elsewhere [28], this study adopted a Gaussian filter to remove the GAN fingerprint from the image generated by CycleGAN, thereby making the discrimination between the fake iris image generated by CycleGAN and the original image even more difficult.
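A minimal sketch of the GAN-fingerprint removal step using OpenCV is shown below; the kernel sizes match those evaluated later in the experiments, while passing 0 for the standard deviation (letting OpenCV derive it from the kernel size) is an assumption about the exact setting used.

```python
import cv2

def remove_gan_fingerprint(fake_iris_image, kernel_size=11):
    """Blur a CycleGAN-generated iris image to suppress the GAN fingerprint.

    kernel_size is one of the sizes evaluated in the experiments (3, 9, or 11);
    the standard deviation of 0 lets OpenCV derive it from the kernel size.
    """
    return cv2.GaussianBlur(fake_iris_image, (kernel_size, kernel_size), 0)
```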

3.3. Iris Detection and Definitions of Iris, Upper Eyelash, and Lower Eyelash Regions

In this subsection, we describe a method of extracting the iris, upper eyelash, and lower eyelash regions from ocular images. In the first step, the iris region of the image is extracted and the upper and lower eyelash regions are extracted with reference to the iris region. In general, the iris PA occurs mainly by replication of the iris region. For this reason, the difference between the fake image used in the attack and the original image is the largest in the iris region. In addition, the eyelashes tend to be dense and complex. Because of these characteristics, generating a fake image for attack through a GAN makes it difficult to replicate the eyelashes completely. Leveraging these points, performing PAD by dividing the image into the iris, upper eyelashes, and lower eyelashes has several advantages.
Some progress has been made in the detection of the iris region in previous studies. The circular edge detector proposed by Daugman et al. [29] is a widely used method. It detects the iris and pupil boundary on the basis of the maximum rate of change between two adjacent circles (the inner and outer circles), at which the variation in the contrast value is maximal. In addition, Camus et al. proposed a method of extracting the iris region using the standard Hough transform [30]. This method uses grayscale images as inputs and transformation to binary edge images is performed. Then, the edge information is extracted and the edge map is used to localize the iris edge.
In this study, based on previous results [31], subblock-based template matching and the two circular edge detection (two CED) method [31] were adopted for iris detection. First, subblock-based template matching was performed to find the eye region in a given image. In general, the pupil has very low pixel values (it appears dark), so there is a large difference from the pixel values of the surrounding regions, which are brighter than the pupil. Subblock-based template matching was performed based on this principle. In this method, the illumination level in the eye region and the surrounding region must be obtained; for the calculation of these pixel values, the integral image method was used [32]. With the integral image, the sum of pixels in a rectangular region can be obtained efficiently, which greatly reduces the computational load. In the region of interest detected by subblock-based template matching, the two CED method is used to detect the circular boundaries of the iris and pupil, as shown in Figure 5. The iris and pupil boundaries are detected at the same time through the two CED method, rather than locating the iris boundary alone, because when only the iris boundary is detected, it may be falsely located on the edge line of the eyelashes. Based on the detected iris region, the upper and lower eyelash regions, for which overlapping is allowed, are defined, as shown in Figure 6.
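As an illustration of why the integral image reduces the computational load, the sum of any rectangular subblock can be obtained with four array accesses, as in the sketch below (the function name and block coordinates are illustrative only).

```python
import numpy as np

def block_sum(integral, top, left, height, width):
    """Sum of pixels in a rectangular subblock using an integral image.

    integral[i, j] holds the sum of all pixels above and to the left of
    (i, j), so any block sum needs only four array accesses.
    """
    bottom, right = top + height, left + width
    return (integral[bottom, right] - integral[top, right]
            - integral[bottom, left] + integral[top, left])

# Build the integral image (shape (H + 1, W + 1)) for a grayscale ocular image.
gray = np.random.randint(0, 256, (480, 640)).astype(np.int64)  # placeholder image
integral = np.pad(gray, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
# Mean brightness of a candidate 40 x 40 subblock (coordinates are illustrative).
candidate_mean = block_sum(integral, 200, 300, 40, 40) / (40 * 40)
```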

3.4. Proposed LRFID-Net Model

In this subsection, the LRFID-Net proposed in this study for detecting fake images is described, as shown in Figure 7. Using DenseNet-169 pretrained with ImageNet [33] as the backbone model, three feature maps are extracted by feeding the images of the three regions into the first dense block of DenseNet-169. DenseNet is trained while minimizing the information loss of features from previous layers through dense connectivity, which makes it possible to extract features in the complex iris region more efficiently. The input image size of the model is 224 × 224, and the region images are resized to this size using bilinear interpolation. The output of the first dense block is extracted, and feature maps with dimensions of 56 × 56 × 128 are obtained from the images of the abovementioned three regions. Table 4 details the first dense block.
Then, the feature maps of the three regions are concatenated channel-wise to generate a 56 × 56 × 384 feature map and, as shown in Figure 7, this map is used as the input to the proposed shallow CNN model. In the convolution layer of the shallow CNN in Figure 7, the kernel size is 3 × 3, the stride is 2, and the number of filters is 24. Then, a rectified linear unit (ReLU) and a max pooling layer (pool size 3 × 3, stride 2) are applied to the feature maps obtained through the convolution layer. Because all the feature maps of each region are used, the model can become computationally burdensome. To address this problem, the shallow CNN model adopts the ShuffleNet architecture [34], a lightweight model, and the design consists of two shuffle stages. After processing through the first and second shuffle stages in Figure 7 in sequence, a global average pooling layer is applied. Then, a fully connected (FC) layer is applied to the obtained feature maps to perform the final classification.
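A sketch of how the truncated backbone and the channel-wise fusion could be set up in tf.keras is given below; the layer name 'pool2_conv' is an assumption about where the 56 × 56 × 128 output of the first dense block is exposed in the Keras DenseNet-169 implementation and should be verified against the model summary.

```python
import tensorflow as tf
from tensorflow.keras.applications import DenseNet169

# ImageNet-pretrained DenseNet-169 truncated at the output of its first dense
# block / transition convolution. The layer name 'pool2_conv' (56 x 56 x 128)
# is an assumption and should be checked against backbone.summary().
backbone = DenseNet169(include_top=False, weights='imagenet',
                       input_shape=(224, 224, 3))
first_block = tf.keras.Model(inputs=backbone.input,
                             outputs=backbone.get_layer('pool2_conv').output)

def extract_fused_features(iris, upper, lower):
    # Each local-region image is resized to 224 x 224 by bilinear interpolation.
    regions = [tf.image.resize(r, (224, 224), method='bilinear')
               for r in (iris, upper, lower)]
    # One shared first dense block yields a 56 x 56 x 128 map per region.
    feats = [first_block(tf.expand_dims(r, axis=0)) for r in regions]
    # Channel-wise concatenation: 1 x 56 x 56 x 384, the shallow CNN input.
    return tf.concat(feats, axis=-1)
```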
Figure 8 shows in more detail the shuffle stage shown in Figure 7. In the shuffle stage, point-wise group convolution is used to reduce the computational load and minimize the training time. At this time, information exchange between channels is not possible, which limits the use of much of the available information; this problem was resolved through channel shuffle. After processing 1 × 1 convolution as a point-wise group convolution in each ShuffleNet unit, the channel shuffle operation is performed to ensure effective feature extraction. In the second 1 × 1 point-wise group convolution, even if the channel shuffle is removed, there is no performance loss, so it is not carried out separately at this stage. In addition, because channel shuffle makes training with more and various data possible, efficient feature extraction can be performed, even with a small number of channels. In the ShuffleNet unit, when the stride of the group convolution is 2, average pooling of the input feature map is performed to proceed with channel-wise concatenation with the final output feature. When the stride of the group convolution is not 2, direct channel-wise concatenation of the input feature map with the output feature map is performed. Table 5 provides details of the proposed shallow CNN model.
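The channel shuffle operation described above is a simple reshape-transpose-reshape, as sketched below; the group count is a hyperparameter of the shuffle stage and is not stated explicitly in the text.

```python
import tensorflow as tf

def channel_shuffle(x, groups):
    """Reshape-transpose-reshape channel shuffle from ShuffleNet [34].

    x: (batch, height, width, channels) feature map; channels must be
    divisible by groups (the group count is a hyperparameter).
    """
    h, w, c = x.shape[1], x.shape[2], x.shape[3]
    x = tf.reshape(x, (-1, h, w, groups, c // groups))
    # Transposing the group and per-group channel axes interleaves channels
    # across groups, allowing information exchange between group convolutions.
    x = tf.transpose(x, (0, 1, 2, 4, 3))
    return tf.reshape(x, (-1, h, w, c))
```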

4. Experimental Results

4.1. Experimental Databases and Setups

To perform experiments using the proposed method, the Warsaw dataset of LiveDet-Iris-2017 (LiveDet-Iris-2017-Warsaw) [35] and the Notre Dame Contact Lens Detection dataset (LiveDet-Iris-2017-ND) [35] were used. The Warsaw dataset consists of 6845 fake images and 5170 live images, comprising 241 classes. LiveDet-Iris-2017-ND consists of cosmetic lens, soft lens, and live images. In this experiment, the cosmetic lens images were classified as fake images and the soft lens images were used as live images, together with the real live images. The dataset consists of 4786 live images and 2502 cosmetic lens images, comprising 322 classes.
For the training of LRFID-Net, each database was divided in half by class to form sub-datasets A and B. Based on two-fold cross-validation, training was performed using sub-dataset A and testing using sub-dataset B (first-fold cross-validation). Then, training and testing were performed once more after exchanging the two sub-datasets (second-fold cross-validation), and the average accuracy was obtained from this process. Furthermore, to prevent overfitting of the model to the training data, approximately 10% of the training data were used as validation data. Table 6 presents the configuration of the data when the given databases were divided into sub-datasets A and B and the configuration when the dataset was constructed using the fake images generated by CycleGAN (third and fourth rows of Table 6).
Figure 9 shows original live images and images with the GAN fingerprint removed by applying Gaussian filtering to fake iris images generated using CycleGAN. In the figure, the following are shown: (a) original live images, (b) images without a Gaussian filter applied, and (c)–(e) images applied with Gaussian filters of sizes 3 × 3, 9 × 9, and 11 × 11, respectively. In this study, as shown in Figure 9b, PAD performance was evaluated after training the proposed model using the images generated by CycleGAN as fake images, and testing was performed one more time using the fake images in Figure 9c–e with Gaussian filters applied to remove the GAN fingerprint to perform a final evaluation of PAD performance.
The proposed method was implemented using TensorFlow 2.4.1 [36] and OpenCV version 4.5.3 [37] in an experimental environment with Windows 10, NVIDIA Compute Unified Device Architecture (CUDA) version 11.1, and NVIDIA CUDA Deep Neural Network Library (CUDNN) version 8.1.0 [38]. The experiments were performed on a desktop computer equipped with an Intel Core i5-4690 CPU, 16 GB of RAM, and an NVIDIA GeForce GTX 1070 graphics processing unit (GPU) [39]. The GPU includes 1920 CUDA cores and 8 GB of graphics memory.

4.2. Training

4.2.1. Training of CycleGAN for Generating Fake Images

A detailed description of the training parameters used for the generation of fake images through CycleGAN is provided in Table 7. In addition, for the optimal settings for the model training, the epoch period of halving the learning rate (learning decay) was set to 100 epochs. In this study, the optimal hyperparameters were determined as outlined in Table 7 to perform experiments to derive the highest spoof detection accuracy for the training data.
Figure 10a shows the training and validation loss graphs of the generator, and Figure 10b shows those of the discriminator. As reported in the original GAN paper [40], the loss of the discriminator is usually smaller than that of the generator. Accordingly, as shown in Figure 10, the generator loss ranges between 1 and 8, whereas the discriminator loss ranges between 0.4 and 0.8. For this reason, it is difficult to present both losses properly on a single scale, so different loss scales were used in Figure 10a,b. As shown in these subfigures, the training loss converged as the number of epochs increased, which indicates that CycleGAN was sufficiently trained on the training data. In addition, the validation loss also converged with increasing epochs, which indicates that CycleGAN was not overfitted to the training data.

4.2.2. Training of LRFID-Net for PAD

The training parameters of LRFID-Net used for PAD are presented in Table 8. In this study, the optimal hyperparameters in Table 8 were chosen to derive the highest PAD accuracy for the training data.
Figure 11 shows the accuracy and loss graphs for the training and validation of LRFID-Net. We can see that the training loss and accuracy graphs converged as the epoch increased, indicating that LRFID-Net was sufficiently trained using the training data. The validation loss and accuracy graphs of LRFID-Net also converged as the epoch increased, indicating that LRFID-Net was not overfitted to the training data.

4.3. Testing of Proposed Method

4.3.1. Evaluation Metric

The Frechet inception distance (FID) was proposed as a measure of the similarity between generated and real images for evaluating the performance of a GAN [41]. The Wasserstein distance (WD) to the original images was adopted to evaluate illumination-normalized images synthesized by a GAN for improving the accuracy of finger-vein recognition [42]. Therefore, in this study, the metrics shown in Equations (3) and (4) were used to measure the difference between the GAN-generated and real images. For both the WD and the FID, lower values indicate a small difference between the two images, whereas higher values indicate a large difference.
W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|A - B\|^p \, d\pi \right)^{1/p}  (3)

FID(x, g) = \|\mu_x - \mu_g\|_2^2 + \mathrm{Tr}\left( \Sigma_x + \Sigma_g - 2(\Sigma_x \Sigma_g)^{1/2} \right)  (4)
In the WD represented in Equation (3), \mu and \nu are the probability distributions of the two images, \Pi(\mu, \nu) represents the set of joint probability distributions on \mathbb{R}^d \times \mathbb{R}^d whose marginals are \mu and \nu, and d\pi represents the joint probability measure. In the FID represented in Equation (4), \mu_x and \mu_g indicate the means and \Sigma_x and \Sigma_g indicate the covariance matrices of the features of the real and generated images, and Tr is the trace (the sum of the diagonal elements); the features are obtained from the pretrained Inception-v3 network.
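As an illustration, the FID of Equation (4) can be computed from the Inception-v3 feature statistics as sketched below; extraction of the Inception-v3 features themselves is not shown, and this is a generic sketch rather than the exact code used in this study.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    """Equation (4) computed on Inception-v3 feature vectors.

    feats_real, feats_fake: arrays of shape (num_images, feature_dim) holding
    activations of the pretrained Inception-v3 (feature extraction not shown).
    """
    mu_x, mu_g = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_x = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma_x @ sigma_g, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(np.sum((mu_x - mu_g) ** 2)
                 + np.trace(sigma_x + sigma_g - 2.0 * covmean))
```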
In addition, for the evaluation of PAD accuracy, according to the LivDet-Iris-2017 competition [35] and ISO/IEC-30107 standard [43], the attack presentation classification error rate (APCER), bona fide presentation classification error rate (BPCER), and average classification error rate (ACER) were adopted. The APCER refers to an error rate that falsely classifies a fake image (spoof attack image) as a real image (bona fide one), as shown in Equation (5). The BPCER refers to the error rate of real images being falsely determined as fake images, as shown in Equation (6); this error rate is also called the NPCER, which stands for normal presentation classification error rate. The ACER refers to the average error rate of the APCER and BPCER, as shown in Equation (7). Here, 1-APCER denotes a true detection rate (TDR) and the BPCER denotes the false detection rate (FDR).
APCER = 1 - \frac{1}{X_{fake}} \sum_{n=1}^{X_{fake}} Res_n  (5)

BPCER = \frac{1}{X_{real}} \sum_{n=1}^{X_{real}} Res_n  (6)

ACER = \frac{APCER + BPCER}{2}  (7)
In the above equations, X f a k e represents the number of fake images and X r e a l represents the number of real images. In Equation (5), with n   as the input fake image, if a fake image is falsely classified as a real image, the value of R e s n is 0, and if a fake image is correctly classified as a fake image, the value of R e s n is 1. In Equation (6), with n as the input real image, if a real image is falsely classified as a fake image, the value of R e s n is 1, and if a real image is correctly classified as a real image, the value of R e s n is 0.
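The error rates of Equations (5)–(7) follow directly from the per-image decisions, as in the short sketch below, which uses the Res_n convention defined above.

```python
import numpy as np

def pad_error_rates(res_fake, res_real):
    """APCER, BPCER, and ACER from per-image results (Equations (5)-(7)).

    res_fake[n] = 1 if fake image n was correctly classified as fake, else 0.
    res_real[n] = 1 if real image n was falsely classified as fake, else 0.
    """
    res_fake = np.asarray(res_fake, dtype=float)
    res_real = np.asarray(res_real, dtype=float)
    apcer = 1.0 - res_fake.mean()  # fake images accepted as real
    bpcer = res_real.mean()        # real images rejected as fake
    acer = (apcer + bpcer) / 2.0
    return apcer, bpcer, acer
```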

4.3.2. Performance Evaluation of Image Quality

In the first performance evaluation experiment, as presented in Table 9, the similarity between the fake iris images generated by CycleGAN in this study and the original real images was evaluated in comparison with state-of-the-art methods [8,20,22,44,45,46,47]. As shown in Table 9, measurement of the WD and FID confirmed that the values obtained from images generated by the state-of-the-art methods were higher than those obtained from images synthesized by CycleGAN. The WD is calculated from the difference in pixel distributions between images. The FID compares image statistics on the basis of features extracted by the Inception-v3 model using ImageNet-pretrained weights. Because the WD and FID values are higher for the state-of-the-art methods, CycleGAN generates fake images that are closer to the real images from the perspective of PAD.

4.3.3. Performance Evaluation of PAD with LiveDet-Iris-2017-Warsaw

Ablation Study

In the first ablation study, the ACERs of PAD were compared according to various backbone models, with and without Gaussian filtering, as shown in Table 10. For backbone models, DenseNet-169 [33], ResNet-152 [48], VGG-19 [49], and XceptionNet [50] were used for comparison. In addition, for the comparison of results with and without Gaussian filtering, the PAD performance for cases where fake images were generated using CycleGAN only and that for the cases in which additional Gaussian filtering was performed to remove the GAN fingerprint after fake images were generated with CycleGAN were examined, respectively, as shown in Figure 9. To perform the ablation study, the images on the left in Figure 6a,b—the ocular images—were used as the input image.
As a result of the experiments, as shown in Table 10, the ACER of PAD showed the lowest values when DenseNet-169 was used as the backbone model; therefore, DenseNet-169 was used as the backbone model in this study. It is believed that DenseNet-169 showed the best PAD accuracy because, as discussed in Section 3.4, all the feature maps obtained from the previous layers through densely connected operation were used in the next layers, which made training with more details possible. In addition, for comparison between the results without Gaussian filtering and those with Gaussian filtering, the highest ACER was obtained when the 11 × 11 filter was used. Therefore, this case was judged as the fake iris image that is the most difficult to discriminate. The subsequent experiments evaluated PAD performance for the case without Gaussian filtering and the case with the application of additional Gaussian filtering based on an 11 × 11 filter.
In the next ablation study, the ACERs of PAD for the cases using each local region, feature-level fusion, score-level fusions, and the proposed LRFID-Net were measured, as listed in Table 11. For the feature-level fusion method, the three local regions (upper eyelashes, lower eyelashes, and iris region) in Figure 7 were the input and the three 7 × 7 × 128 feature maps obtained from DenseNet-169 were concatenated channel-wise. The 7 × 7 × 384 feature map obtained from the concatenation was used as an input into an FC layer, and the final classification into live and fake images was performed.
In addition, with the three local regions as the respective inputs, three scores (probability) were acquired from the last layer (softmax layer) of DenseNet-169; for these three scores, score-level fusion was performed using the weighted sum and weighted product methods [51], as in Equations (8) and (9), and SVM, as shown in Equation (10) [52]. Specifically, three softmax probability values ( x i , x j , x k ) were calculated for the real image class of DenseNet-169 for three local regions. Thereafter, using these three probability values as inputs, a final probability (P1) for the real image class was obtained using a variety of score-level fusion methods represented in Equations (8)–(10). In the same way, three softmax probability values ( x i , x j , x k ) were calculated for the fake iris image class of DenseNet-169 for the three local regions; then, using these three probability values as inputs, a final probability (P2) for the fake iris image class was obtained using the score-level fusion methods of Equations (8)–(10). By comparing the two probabilities (P1 and P2) thus obtained, the final decision was made about the class with the larger probability value. Furthermore, in this experiment, using training data, PAD accuracy was evaluated for various weights, SVM parameters, and SVM kernels, e.g., linear, radial basis, polynomial, and sigmoid kernels, as shown in Equations (8)–(10). In this way, the optimal weights and parameters showing the lowest value for ACER and the radial basis kernel, the optimal kernel, were determined.
Weighted sum = x_i W_1 + x_j W_2 + x_k W_3  (8)

Weighted product = x_i^{W_1} \cdot x_j^{W_2} \cdot x_k^{W_3}  (9)

f(x) = \mathrm{sign}\left( \sum_{i=1}^{k} a_i y_i K(x_i, x_j, x_k) + b \right)  (10)
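A minimal sketch of the score-level fusion rules of Equations (8) and (9), together with an SVM-based fusion in the spirit of Equation (10), is given below; the scikit-learn SVC with a radial basis kernel is used as a stand-in for the SVM formulation, and the weights and SVM parameters are placeholders to be tuned on the training data as described above.

```python
import numpy as np
from sklearn.svm import SVC

def weighted_sum(scores, weights):
    # Equation (8): scores = (x_i, x_j, x_k) from the three local regions.
    return float(np.dot(scores, weights))

def weighted_product(scores, weights):
    # Equation (9): product of the region scores raised to their weights.
    return float(np.prod(np.power(scores, weights)))

# SVM-based fusion in the spirit of Equation (10): the three softmax scores
# form one 3-dimensional input vector; labels are 1 for real and 0 for fake.
# The radial basis kernel and C value are placeholders tuned on training data.
svm_fusion = SVC(kernel='rbf', C=1.0, probability=True)
# svm_fusion.fit(train_score_triplets, train_labels)
# p_real = svm_fusion.predict_proba(test_score_triplets)[:, 1]
```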
As shown in Table 11, when only the iris region was extracted and used, the ACERs were 1.84% and 0.84% with and without the Gaussian filter, respectively, which is better than the values obtained when other local regions were used. This is believed to be because the detailed iris texture contains more features for distinguishing between live and fake images. In addition, the PAD accuracy was higher when the three local regions were used in combination through feature- or score-level fusion than when each region was used alone, and feature-level fusion showed higher accuracy than score-level fusion. However, the results confirmed that the highest PAD accuracy, with ACERs of 0.03% and 0.03%, was achieved when using our proposed LRFID-Net. This is believed to be because, as shown in Figure 7, the final feature maps extracted from the fused region-wise feature maps through the additional convolution layer and shuffle stages are more suitable for PAD.
In the next ablation study, we compared the ACERs of PAD by the number of dense blocks, as shown in Table 12. In Table 12, the shallow CNN was fixed to the same shape as in Figure 7. When using only one dense block and two shuffle stages, we can see that the best performance was achieved with ACERs of 0.03% and 0.03% without and with Gaussian filtering, respectively.
In the next ablation study, the ACERs of PAD by LRFID-Net were compared according to the cases of using various combinations of each local region, as shown in Table 13. In Table 13, I, U, and L represent the iris, upper eyelash, and lower eyelash regions, respectively. Table 13 reveals that, among combinations of two regions, U + L had the highest error rates, with ACERs of 18.16% and 22.93%, which means that the feature information of the iris region is the most important. In addition, when the iris, upper eyelash, and lower eyelash regions were all used, the highest PAD accuracy was achieved.
In the next ablation study, the ACERs of PAD by LRFID-Net with I + U + L from Table 13 were compared when using Gaussian filtering with various filter sizes, as shown in Table 14. The experimental results reveal that, as in Table 13 and Table 14, the proposed LRFID-Net with I + U + L had the same PAD accuracy regardless of the GAN fingerprint removal by Gaussian filtering with various sizes. This indicates that the proposed method shows generality of performance regardless of the size of the Gaussian filter from Figure 9.
In the final ablation study, as shown in Table 15, the ACERs of PAD by LRFID-Net with I + U + L from Table 13 were compared when using various fake image generation methods. The fake image generation methods compared were CycleGAN [8], used in this study, iDCGAN [20], RaSGAN [22], ACL-GAN [46], and FastGAN [47]. As shown in Table 15, the use of LRFID-Net with the I + U + L developed in this study shows that the ACERs were 0.02%, 0%, 0.02%, 0.02%, and 0.03% without Gaussian filtering and 0.07%, 0.07%, 0.02%, 0.02%, and 0.03% with Gaussian filtering, indicating that the proposed method has generality of performance for various fake image generation methods.

Comparisons with the State-of-the-Art Methods

In this subsection, the ACERs of PAD for the proposed LRFID-Net were compared with those of state-of-the-art methods. As state-of-the-art methods, D-NetPAD [12], DCLNet [15], AG-PAD [25], vision transformer (ViT) [53], and MaxViT [54] were selected, and comparative experiments were conducted. The results are shown in Table 16. The ACERs of the proposed LRFID-Net are 0.03% and 0.03% without and with Gaussian filtering, respectively, which shows superior PAD performance compared with the performance of the state-of-the-art PAD methods. The superior performance was achieved because LRFID-Net is more effective in extracting the PAD features for each local region than the state-of-the-art methods.
In Figure 12, the ROC curves of LRFID-Net and the state-of-the-art methods are compared. Figure 12 confirmed that LRFID-Net had higher PAD accuracy than the state-of-the-art PAD methods in the results without Gaussian filtering and with 11 × 11 Gaussian filtering.

4.3.4. Performance Evaluation of PAD with LiveDet-Iris-2017-ND

Ablation Study

As shown in Table 17, the ACERs of PAD were compared according to various backbone models with and without Gaussian filtering. For the backbone models, DenseNet-169, ResNet-152, VGG-19, and XceptionNet were used for comparison, as shown in Table 10. Moreover, for comparison of results with and without Gaussian filtering, the PAD performance when fake images were generated using CycleGAN only and when additional Gaussian filtering was performed to remove the GAN fingerprint after fake images were generated with CycleGAN are shown in Figure 9. To perform the ablation study, the images on the left in Figure 6a,b—the ocular images—were used as the input images. As a result of the experiments, as shown in Table 17, the ACER of PAD had the lowest values when DenseNet-169 was used as the backbone model; therefore, DenseNet-169 was used as the backbone model in this study. In addition, when the results without and with Gaussian filtering were compared, the highest ACER was obtained when a 3 × 3 filter was used. Therefore, this case was judged to be the fake iris image that was the most difficult to discriminate. From the subsequent experiments, the PAD performance was evaluated for the case without Gaussian filtering and the case in which additional Gaussian filtering was applied with a 3 × 3 filter.
In the next ablation study, the ACERs of PAD according to the cases of using each local region, feature-level fusion, score-level fusions, and the proposed LRFID-Net were measured, as shown in Table 18. For the feature-level and score-level fusion methods, the same methods as described in Table 11 were used. As shown in Table 18, in general, the PAD accuracy was higher when the three local regions were used in combination by applying feature- or score-level fusion than when each of the three local regions was used alone, and the ACERs of the feature-level fusion were 3.22% and 2.48%, showing that it had a higher PAD accuracy than the score-level fusion. However, the results confirmed that the highest PAD accuracies were achieved, 0.11% and 0.11%, without and with Gaussian filtering, respectively, when using our proposed LRFID-Net. It is believed that this is because, as shown in Figure 7, for feature maps obtained by the fusion of feature maps obtained in each region through the introduction of an additional convolution layer and shuffle stages, the final feature maps more suitable for PAD were extracted.
In the next ablation study, we compared the ACERs of PAD using the number of dense blocks and shuffle stages, as shown in Table 19. In Table 19, the shallow CNN was fixed to the same shape as in Figure 7. When using only one dense block and two shuffle stages, we can see that the best performance is achieved, with ACERs of 0.11% and 0.11% without and with Gaussian filtering, respectively.
In this ablation study, the ACERs of PAD by LRFID-Net were compared for cases where various combinations of the local regions were used, as shown in Table 20. In the table, I, U, and L represent the iris, upper eyelash, and lower eyelash regions, respectively. In the case of Gaussian filtering, the ACER for U + L is the highest (14.65%), indicating that the feature information of the iris region is the most important. In addition, when the iris, upper eyelash, and lower eyelash regions were all used, the highest PAD accuracies were achieved.
In the next ablation study, the ACERs of PAD by LRFID-Net with I + U + L of Table 20 were compared for Gaussian filtering with various filter sizes, as shown in Table 21. The experimental results in Table 20 and Table 21 show that the proposed LRFID-Net with I + U + L had a high PAD accuracy regardless of the status of GAN fingerprint removal through Gaussian filtering. This indicates that the proposed method has generality of performance regardless of the status of Gaussian filter use and the size of the filter, as shown in Figure 9.
In the final ablation study, as shown in Table 22, the ACERs of PAD by LRFID-Net with the I + U + L from Table 20 were compared when using various fake image generation methods. The fake image generation methods, CycleGAN (used in this study), iDCGAN, RaSGAN, ACL-GAN, and FastGAN were compared. As shown in Table 22, the use of LRFID-Net with the I + U + L developed in this study, for each generation model, has ACERs of 0.03%, 0.04%, 0%, 0.02%, and 0.11% without Gaussian filtering and 0.02%, 0.02%, 0.04%, 0.02%, and 0.11% with Gaussian filtering, indicating that the proposed method has generality of performance for various fake image generation methods.

Comparisons with the State-of-the-Art Methods

In this subsection, the ACERs of PAD from the proposed LRFID-Net were compared with those from the state-of-the-art methods. As shown in Table 23, in the case of a presentation attack with 3 × 3 Gaussian filtering on the generated images, the ACER of LRFID-Net was 0.81% lower than that of the second-best method (D-NetPAD). It is believed that this superior performance was achieved because LRFID-Net is more effective in extracting the PAD features of each local region than the state-of-the-art methods.
In Figure 13, the ROC curves of LRFID-Net and the state-of-the-art methods are compared. As shown in Figure 13, LRFID-Net had higher PAD accuracy than the state-of-the-art PAD methods in the results without Gaussian filtering and with 3 × 3 Gaussian filtering.

4.4. Statistical Analysis

A t-test [55] was performed and Cohen's d-value [56] was measured for the ACERs of the proposed method and those of the second-best methods in Table 16 and Table 23 as a statistical test. As shown in Figure 14a, the p-value between the second-best method and the proposed method in Table 16 was 0.036, which is significant at the 95% confidence level, and Cohen's d-value was 11.40 (large effect size). As shown in Figure 14b, the p-value between the second-best method and the proposed method in Table 23 was 0.026, which is significant at the 95% confidence level, and Cohen's d-value was 4.97 (large effect size). These results indicate a significant difference between the ACERs of the proposed method and the second-best methods in Table 16 and Table 23.
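For reference, the statistical test can be reproduced with SciPy as sketched below, using an independent-samples t-test and a pooled-standard-deviation Cohen's d; the fold-wise ACER values shown are illustrative placeholders, not the measured results.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

# Illustrative placeholder values; replace with the measured fold-wise ACERs (%).
acer_proposed = np.array([0.02, 0.04])
acer_second_best = np.array([1.15, 1.25])
t_stat, p_value = stats.ttest_ind(acer_proposed, acer_second_best)
effect_size = cohens_d(acer_proposed, acer_second_best)
```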

4.5. Processing Time

The average processing time for one image with the proposed PAD method was measured. The measurements were carried out in two environments: the desktop computer environment described in Section 4.1 and the Jetson TX2 embedded board, as shown in Figure 15. The Jetson TX2 system consists of an NVIDIA Pascal family GPU with 256 CUDA cores. The reason for measuring the processing time in the embedded system is that most of the iris recognition systems in the literature were installed in the embedded-system-based door access control environment. Therefore, the aim is to verify the applicability of the proposed method in such an embedded system environment. The measured processing times of the proposed PAD method were 25.6 ms in the desktop environment and 70.01 ms in the Jetson TX2 system environment, indicating 39.06 (1000/25.6) frames per second (fps) and 14.28 (1000/70.01) fps, respectively. Although the processing time was longer in an embedded system environment with limited resources for computation compared with a desktop environment, the results confirm that the proposed method is feasible on embedded boards with low computational power.

4.6. Comparisons of Processing Complexity Using Our Method and the State-of-the-Art Methods

Table 24 compares the GPU floating point operations (GFLOPs), number of parameters, memory usage, and inference time of the proposed method with those of the state-of-the-art methods in the computing environment described in Section 4.1. The GFLOPs of our proposed LRFID-Net is 12.6, which is the fourth lowest among all the methods in Table 24. The number of parameters of our model is 4.757 M, which is lower than that of all the other methods. The memory usage of our model is 2.96 GB, which is the second lowest among all the methods. In addition, the average inference time per image for our model is 25.6 ms, which is the third lowest among all the methods. Nevertheless, our LRFID-Net shows higher accuracy than the state-of-the-art methods, as shown in Table 16 and Table 23 and Figure 12 and Figure 13.

4.7. Discussion

Figure 16 shows cases of correct PAD by the proposed method. As shown in the figure, PAD is properly performed by the proposed method even when the live and fake images are highly similar.
Figure 17 and Figure 18 show cases of incorrect PAD by the proposed method. In Figure 17, an error occurred in which a live image was falsely classified as a fake image (bona fide presentation classification error). Figure 18 shows an error in which a fake image was falsely classified as a live image (attack presentation classification error). This is because these fake images generated by CycleGAN are more similar to the live images and contain less noise. Table 25 lists the peak signal-to-noise ratio (PSNR) values [57] of the images in Figure 16, Figure 17 and Figure 18. In this case, the PSNR was measured between the synthesized fake image and the original real image; the more similar the two images are, the higher the value. As shown in Table 25, the PSNR of the fake image is higher in the cases of incorrect classification than in the case of correct classification. This result shows that the PAD accuracy is reduced when the fake image is generated closer to the original real image. As shown in the zoomed images in Figure 17 and Figure 18, the misclassification errors occur even though the generated fake iris images are blurred compared with the original ones. Thus, although the proposed method performs well for PAD, it has limitations in capturing the fine blurred features that occur locally in the iris region of fake iris images. To improve this, more fine-grained local information in the iris region should be used in future work so that generated fake iris images can be reliably distinguished from real iris images.
Figure 19 shows the images generated by Grad-CAM [58] using the output feature maps in the fusion layer (channel-wise concatenation layer), the first shuffle stage, and the last shuffle stage shown in Figure 7.
In the Grad-CAM images, regions in colors close to red indicate important feature regions for PAD, and regions in colors close to blue indicate insignificant feature regions for PAD. From the last Grad-CAM image in Figure 19a, in the case of the original real image, it can be seen that important features for PAD were extracted mainly from the iris region, along with the upper and lower eyelash regions. However, in the case of the fake images shown in Figure 19b,c, more of the important features were extracted from the iris region. Furthermore, the upper eyelash region has larger feature values than the lower eyelash region. This is because the upper eyelashes are dense and difficult to generate with a GAN, indicating a high possibility of a large difference. The lower eyelash region tends to be less complex than the upper one and, although this region is considered in the process, it does not have feature values large enough to make it an important region for PAD. In addition, the regions in Figure 19b,c are not significantly different, which explains why good PAD performance was achieved for images both with and without Gaussian filtering. Comparing Figure 19a–c reveals that important features for PAD are extracted from different regions in the original live image and the fake image. This indicates that the proposed LRFID-Net effectively extracts features important for PAD.
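For completeness, a minimal tf.keras Grad-CAM sketch in the spirit of [58] is given below; model and layer_name are placeholders for the trained LRFID-Net and the layer whose feature maps are visualized (e.g., the fusion layer or a shuffle stage).

```python
import tensorflow as tf

def grad_cam(model, layer_name, inputs, class_index):
    """Class activation map for one layer of a trained tf.keras model [58]."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        feature_map, predictions = grad_model(inputs)
        class_score = predictions[:, class_index]
    grads = tape.gradient(class_score, feature_map)
    # Channel weights: gradients global-average-pooled over the spatial axes.
    weights = tf.reduce_mean(grads, axis=(1, 2), keepdims=True)
    cam = tf.nn.relu(tf.reduce_sum(weights * feature_map, axis=-1))[0]
    cam = cam / (tf.reduce_max(cam) + 1e-8)  # normalize to [0, 1] for display
    return cam.numpy()
```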
Since AI artifacts and a lack of diversity can occur when using GANs to generate fake iris images, we addressed these two problems as follows. The images produced by a GAN usually leave behind AI artifacts called GAN fingerprints. Therefore, in order to make classification more difficult, we removed the GAN fingerprints by applying a Gaussian filter based on previous research [28], as described in Section 3.2. Figure 9 shows the fake iris images generated by CycleGAN and processed by applying Gaussian filters with kernel sizes of 3 × 3, 9 × 9, and 11 × 11. We also evaluated the classification performance of the existing classification models and the proposed LRFID-Net on the fake images processed by Gaussian filters with different kernel sizes, as shown in Table 10 and Table 14 in Section 4.3.3 and Table 17 and Table 21 in Section 4.3.4. We chose the kernel size that produced the largest classification error among the 3 × 3, 9 × 9, and 11 × 11 filters and conducted comparative experiments using the state-of-the-art methods and the proposed method on the images processed with the selected Gaussian filter. These experiments can be found in Table 11, Table 12 and Table 13, Table 15 and Table 16, Table 18, Table 19 and Table 20, Table 22 and Table 23 and Figure 12 and Figure 13. To enhance the diversity of the generated images, we randomly selected the target image from the same class for image generation by CycleGAN, as described in Section 3.2 and Figure 4.
Our proposed LRFID-Net can be applied for user authentication in computers, financial transaction systems on mobile phones, and access control systems. Both the data generation by CycleGAN and the training of LRFID-Net can be performed on a server computer. Then, only the trained model parameters of LRFID-Net need to be stored in the above systems, without saving any biometric data. In addition, only the algorithm code (for the preprocessing, including detection of the upper and lower eyelash regions along with the iris area) is stored in these systems, again without storing any biometric data. Therefore, even if the model parameters of LRFID-Net and the algorithm code are stolen, it is impossible to recover any biometric data from them. Consequently, our proposed method can ensure the security and privacy of biometric data when it is deployed in practice.

5. Conclusions

A novel method, LRFID-Net, was proposed to detect fake iris images generated by CycleGAN, as well as fake iris images whose GAN fingerprints were removed by Gaussian filtering. In the proposed method, fake-iris detection proceeds as follows. From the input image, images of three local regions are extracted: the iris, the upper eyelash, and the lower eyelash regions. The features of each local-region image are extracted by a deep CNN, concatenated channel-wise, and fed into a new shallow CNN that performs fake-iris detection. For the shallow CNN, a lightweight model was used to reduce the processing time and thereby mitigate the computational burden of using three input images.
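To make the overall structure concrete, a minimal TensorFlow [36] sketch of the three-branch design is given below: per-region feature extraction, channel-wise concatenation, and a shallow classification head. The branch and head layers are simplified stand-ins rather than the exact dense block and shuffle stages of LRFID-Net, and the input size and layer widths are illustrative only.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def region_branch(name):
    # Stand-in for the deep CNN branch applied to one local region.
    inp = layers.Input(shape=(224, 224, 3), name=f"{name}_input")
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    return inp, x

# Three local-region inputs: iris, upper eyelash, and lower eyelash.
branches = [region_branch(n) for n in ("iris", "upper_eyelash", "lower_eyelash")]
inputs = [b[0] for b in branches]
features = [b[1] for b in branches]

# Channel-wise concatenation of the per-region feature maps.
fused = layers.Concatenate(axis=-1)(features)

# Shallow classification head (stand-in for the lightweight shuffle-stage CNN).
x = layers.Conv2D(128, 1, activation="relu")(fused)
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(2, activation="softmax", name="live_vs_fake")(x)

model = Model(inputs, out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```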
PAD performance was evaluated by two-fold cross-validation on the LiveDet-Iris-2017-Warsaw and LiveDet-Iris-2017-ND datasets, which are publicly available iris presentation attack databases. The average classification error rates of the proposed LRFID-Net were 0.03% and 0.11% on the respective datasets, which were 1.17% and 0.81% lower than those of the second-best state-of-the-art method (D-NetPAD), and these improvements were shown to be statistically significant. We also compared model complexity: LRFID-Net has 4.757 M parameters, 31.61% fewer than the 6.956 M parameters of the second-best state-of-the-art method (D-NetPAD). Furthermore, Grad-CAM analysis confirmed that LRFID-Net extracts the features important for PAD from different regions in the original live images and in the fake images. However, PAD errors occurred when a fake iris image generated by CycleGAN was highly similar to its original live image.
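For reference, the error rates reported above follow the ISO/IEC 30107-3 definitions [43]; a minimal sketch of their computation is shown below, with placeholder prediction arrays used purely for illustration.

```python
import numpy as np

def pad_error_rates(attack_preds, bona_fide_preds):
    """APCER, BPCER, and ACER as defined in ISO/IEC 30107-3.

    attack_preds    -- labels predicted for attack (fake) presentations, 1 = classified as live
    bona_fide_preds -- labels predicted for bona fide (live) presentations, 1 = classified as live
    """
    apcer = np.mean(np.asarray(attack_preds) == 1)     # attacks accepted as live
    bpcer = np.mean(np.asarray(bona_fide_preds) == 0)  # live presentations rejected as attacks
    return apcer, bpcer, (apcer + bpcer) / 2.0         # ACER is the average of APCER and BPCER

# Placeholder predictions for illustration only.
apcer, bpcer, acer = pad_error_rates(attack_preds=[0, 0, 1, 0], bona_fide_preds=[1, 1, 1, 0])
print(f"APCER = {apcer:.2%}, BPCER = {bpcer:.2%}, ACER = {acer:.2%}")
```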
In future work, to address the errors that occur when a generated fake image closely resembles its source live image, we plan to improve PAD performance through training with finer details of the iris texture. In addition, we plan to develop a method for generating higher-quality fake iris images and to investigate the applicability of the proposed method to fake images of other biometric modalities, such as fingerprints and facial images.

Author Contributions

Methodology, J.S.K.; conceptualization, Y.W.L., J.S.H., S.G.K. and G.B.; supervision, K.R.P.; writing—original draft, J.S.K.; writing—review and editing, K.R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (MSIT) through the Basic Science Research Program (NRF-2021R1F1A1045587), in part by the NRF funded by the MSIT through the Basic Science Research Program (NRF-2022R1F1A1064291), and in part by the MSIT, Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2020-0-01789) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, K.; Xu, Z.; Fei, J. DualSANet: Dual Spatial Attention Network for Iris Recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 888–896. [Google Scholar]
  2. Luo, Z.; Wang, Y.; Wang, Z.; Sun, Z.; Tan, T. FedIris: Towards More Accurate and Privacy-Preserving Iris Recognition via Federated Template Communication. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, New Orleans, LA, USA, 19–20 June 2022; pp. 3357–3366. [Google Scholar]
  3. Agarwal, A.; Noore, A.; Vatsa, M.; Singh, R. Generalized Contact Lens Iris Presentation Attack Detection. IEEE Trans. Biom. Behav. Identity Sci. 2022, 4, 373–385. [Google Scholar] [CrossRef]
  4. Fang, M.; Damer, N.; Boutros, F.; Kirchbuchner, F.; Kuijper, A. Iris Presentation Attack Detection by Attention-Based and Deep Pixel-Wise Binary Supervision Network. In Proceedings of the IEEE International Joint Conference on Biometrics, Shenzhen, China, 4–7 August 2021; pp. 1–8. [Google Scholar]
  5. Fang, Z.; Czajka, A.; Bowyer, K.W. Robust Iris Presentation Attack Detection Fusing 2D and 3D Information. IEEE Trans. Inform. Forensic Secur. 2021, 16, 510–520. [Google Scholar] [CrossRef]
  6. Jolicoeur-Martineau, A. The Relativistic Discriminator: A Key Element Missing from Standard GAN. arXiv 2018, arXiv:1807.00734. [Google Scholar]
  7. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  8. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar]
  9. Iris Spoof Detection Model with Synthetic Iris Images. Available online: https://github.com/dmdm2002/Iris-Spoof-Detection (accessed on 27 June 2022).
  10. Raghavendra, R.; Raja, K.B.; Busch, C. ContlensNet: Robust Iris Contact Lens Detection Using Deep Convolutional Neural Networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, CA, USA, 24–31 March 2017; pp. 1160–1167. [Google Scholar]
  11. He, L.; Li, H.; Liu, F.; Liu, N.; Sun, Z.; He, Z. Multi-Patch Convolution Neural Network for Iris Liveness Detection. In Proceedings of the IEEE 8th International Conference on Biometrics Theory, Applications and Systems, Niagara Falls, NY, USA, 2–9 September 2016; pp. 1–7. [Google Scholar]
  12. Sharma, R.; Ross, A. D-NetPAD: An Explainable and Interpretable Iris Presentation Attack Detector. In Proceedings of the IEEE International Joint Conference on Biometrics, Houston, TX, USA, 28 September–1 October 2020; pp. 1–10. [Google Scholar]
  13. Hoffman, S.; Sharma, R.; Ross, A. Convolutional Neural Networks for Iris Presentation Attack Detection: Toward Cross-Dataset and Cross-Sensor Generalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1701–17018. [Google Scholar]
  14. Pala, F.; Bhanu, B. Iris Liveness Detection by Relative Distance Comparisons. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 664–671. [Google Scholar]
  15. Choudhary, M.; Tiwari, V.; U., V. An Approach for Iris Contact Lens Detection and Classification Using Ensemble of Customized DenseNet and SVM. Futur. Gener. Comp. Syst. 2019, 101, 1259–1270. [Google Scholar] [CrossRef]
  16. Choudhary, M.; Tiwari, V.; U., V. Iris Anti-Spoofing through Score-Level Fusion of Handcrafted and Data-Driven Features. Appl. Soft. Comput. 2020, 91, 106206. [Google Scholar] [CrossRef]
  17. Jaswal, G.; Verma, A.; Roy, S.D.; Ramachandra, R. DFCANet: Dense Feature Calibration-Attention Guided Network for Cross Domain Iris Presentation Attack Detection. arXiv 2021, arXiv:2111.00919. [Google Scholar]
  18. Nguyen, D.; Pham, T.; Lee, Y.; Park, K. Deep Learning-Based Enhanced Presentation Attack Detection for Iris Recognition by Combining Features from Local and Global Regions Based on NIR Camera Sensor. Sensors 2018, 18, 2601. [Google Scholar] [CrossRef]
  19. Chen, C.; Ross, A. Exploring the Use of IrisCodes for Presentation Attack Detection. In Proceedings of the IEEE 9th International Conference on Biometrics Theory, Applications and Systems, Redondo Beach, CA, USA, 22–25 October 2018; pp. 1–9. [Google Scholar]
  20. Kohli, N.; Yadav, D.; Vatsa, M.; Singh, R.; Noore, A. Synthetic Iris Presentation Attack Using iDCGAN. arXiv 2017, arXiv:1710.10565. [Google Scholar]
  21. Yadav, D.; Kohli, N.; Agarwal, A.; Vatsa, M.; Singh, R.; Noore, A. Fusion of Handcrafted and Deep Learning Features for Large-Scale Multiple Iris Presentation Attack Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 685–6857. [Google Scholar]
  22. Yadav, S.; Chen, C.; Ross, A. Synthesizing Iris Images Using RaSGAN with Application in Presentation Attack Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 2422–2430. [Google Scholar]
  23. Yadav, S.; Chen, C.; Ross, A. Relativistic Discriminator: A One-Class Classifier for Generalized Iris Presentation Attack Detection. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2624–2633. [Google Scholar]
  24. Yadav, S.; Ross, A. CIT-GAN: Cyclic Image Translation Generative Adversarial Network with Application in Iris Presentation Attack Detection. arXiv 2020, arXiv:2012.02374. [Google Scholar]
  25. Chen, C.; Ross, A. Attention-Guided Network for Iris Presentation Attack Detection. arXiv 2020, arXiv:2010.12631. [Google Scholar]
  26. Zou, H.; Zhang, H.; Li, X.; Liu, J.; He, Z. Generation Textured Contact Lenses Iris Images Based on 4DCycle-GAN. In Proceedings of the 24th International Conference on Pattern Recognition, Beijing, China, 20–24 August 2018; pp. 3561–3566. [Google Scholar]
  27. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least Squares Generative Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802. [Google Scholar]
  28. Neves, J.C.; Tolosana, R.; Vera-Rodriguez, R.; Lopes, V.; Proença, H.; Fierrez, J. GANprintR: Improved Fakes and Evaluation of the State of the Art in Face Manipulation Detection. arXiv 2020, arXiv:1911.05351. [Google Scholar] [CrossRef]
  29. Daugman, J. How Iris Recognition Works. IEEE Trans. Circuits Syst. Video Technol. 2004, 14, 21–30. [Google Scholar] [CrossRef]
  30. Camus, T.A.; Wildes, R. Reliable and Fast Eye Finding in Close-up Images. In Proceedings of International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; pp. 389–394. [Google Scholar]
  31. Lee, Y.W.; Kim, K.W.; Hoang, T.M.; Arsalan, M.; Park, K.R. Deep Residual CNN-Based Ocular Recognition Based on Rough Pupil Detection in the Images by NIR Camera Sensor. Sensors 2019, 19, 842. [Google Scholar] [CrossRef] [PubMed]
  32. Viola, P.; Jones, M.J. Robust Real-time Face Detection. Int. J. Comput. Vis. 2004, 57, 137–154. [Google Scholar] [CrossRef]
  33. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993. [Google Scholar]
  34. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv 2017, arXiv:1707.01083. [Google Scholar]
  35. Yambay, D.; Becker, B.; Kohli, N.; Yadav, D.; Czajka, A.; Bowyer, K.W.; Schuckers, S.; Singh, R.; Vatsa, M.; Noore, A.; et al. LivDet Iris 2017—Iris Liveness Detection Competition 2017. In Proceedings of the International Conference on Biometrics, Denver, CO, USA, 1–4 October 2017; pp. 733–741. [Google Scholar]
  36. Tensorflow. Available online: https://www.tensorflow.org/ (accessed on 12 September 2023).
  37. OpenCV. Available online: https://docs.opencv.org/4.5.3/index.html (accessed on 12 September 2023).
  38. NVIDIA CUDA Deep Neural Network Library. Available online: https://developer.nvidia.com/cudnn (accessed on 12 September 2023).
  39. NVIDIA GeForce GTX 1070. Available online: https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-1070/specifications (accessed on 12 September 2023).
  40. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  41. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1–12. [Google Scholar]
  42. Hong, J.S.; Choi, J.; Kim, S.G.; Owais, M.; Park, K.R. INF-GAN: Generative Adversarial Network for Illumination Normalization of Finger-Vein Images. Mathematics 2021, 9, 2613. [Google Scholar] [CrossRef]
  43. ISO/IEC JTC1 SC37; Biometrics-ISO/IEC WD 30107–3: 2014 Information Technology—Presentation Attack Detection-Part 3: Testing and Reporting and Classification of Attacks. International Organization for Standardization: Geneva, Switzerland, 2014.
  44. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In Proceeding of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–26. [Google Scholar]
  45. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar]
  46. Zhao, Y.; Wu, R.; Dong, H. Unpaired Image-to-Image Translation Using Adversarial Consistency Loss. In Proceedings of European Conference on Computer Vision, Online, 23–28 August 2020; pp. 800–815. [Google Scholar]
  47. Liu, B.; Zhu, Y.; Song, K.; Elgammal, A. Towards Faster and Stabilized GAN Training for High-Fidelity Few-Shot Image Synthesis. In Proceeding of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021; pp. 1–22. [Google Scholar]
  48. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  49. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  50. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar]
  51. Mateo, J.R.S.C. Weighted Sum Method and Weighted Product Method. In Multi Criteria Analysis in the Renewable Energy Industry; Springer Science & Business Media: London, UK, 2012; pp. 19–22. [Google Scholar]
  52. Vapnik, V. Statistical Learning Theory; Wiley: Hoboken City, NJ, USA, 1998. [Google Scholar]
  53. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2020; pp. 1–21. [Google Scholar]
  54. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. MaxViT: Multi-Axis Vision Transformer. arXiv 2022, arXiv:2204.01697. [Google Scholar]
  55. Mishra, P.; Singh, U.; Pandey, C.M.; Mishra, P.; Pandey, G. Application of student’s t-test, analysis of variance, and covariance. Ann. Card. Anaesth. 2019, 22, 407–411. [Google Scholar] [CrossRef]
  56. Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155. [Google Scholar] [CrossRef]
  57. Hore, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  58. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
Figure 1. Samples of live iris images and fake iris images generated using CycleGAN: (a) real iris images and (b) generated fake iris images using the live images of (a) based on CycleGAN.
Figure 2. Overview of our iris PAD scheme.
Figure 3. Architecture of CycleGAN: (a) generator and (b) discriminator.
Figure 4. Fake iris image generation process using CycleGAN.
Figure 5. Iris and pupil circular boundaries detected by subblock-based template matching and the two CED method. The iris region can be properly found, even in images with a part of the iris covered by the eyelid or with a part of the iris cut off.
Figure 6. Examples of iris, upper eyelash, and lower eyelash regions defined from the iris image: (a) example 1 and (b) example 2. In (a,b), the image on the left shows the iris image detected by subblock-based template matching and the image on the right shows the iris, upper eyelash, and lower eyelash regions defined from the iris image.
Figure 7. Architecture of the proposed LRFID-Net.
Figure 8. Shuffle stage of shallow CNN. In this stage, 1 × 1 point-wise group convolution and 3 × 3 depth-wise convolution are used. In the case of stride = 2, 3 × 3 average pooling is also used.
Figure 9. Original live images and generated fake images with Gaussian filtering. (a) Original live images. Generated fake images using CycleGAN (b) without Gaussian filtering and with (c) 3 × 3 Gaussian filtering, (d) 9 × 9 Gaussian filtering, and (e) 11 × 11 Gaussian filtering.
Figure 10. Training and validation loss graphs of CycleGAN: (a) training and validation loss graphs of the generator; (b) training and validation loss graphs of the discriminator.
Figure 11. The accuracy and loss graphs for training and validation.
Figure 12. ROC curves of LRFID-Net and the state-of-the-art methods: (a) without Gaussian filtering and (b) 11 × 11 Gaussian filtering (EER: equal error rate).
Figure 13. ROC curves of LRFID-Net and the state-of-the-art methods: (a) without Gaussian filtering and (b) 3 × 3 Gaussian filtering (EER: equal error rate).
Figure 14. t-Test results between ACERs for the proposed and second-best methods: (a) LiveDet-Iris-2017-Warsaw database (11 × 11 Gaussian filtering) and (b) LiveDet-Iris-2017-ND database (3 × 3 Gaussian filtering).
Figure 15. Jetson TX2 embedded system.
Figure 16. Examples of correct PAD by proposed method: (a,c) original live images and (b,d) corresponding fake images of (a,c), respectively.
Figure 17. Examples of incorrect PAD (bona fide presentation classification error) by proposed method: (a,c) original live images and (b,d) corresponding fake images of (a,c), respectively.
Figure 18. Examples of incorrect PAD (attack presentation classification error) by the proposed method: (a,c) original live images and (b,d) corresponding fake images of (a,c), respectively.
Figure 19. Examples of representing main feature regions using Grad-CAM: (a) original live image, (b) generated fake image, and (c) generated fake image + 3 × 3 Gaussian filtering.
Table 1. Comparison of state-of-the-art iris PAD methods and the proposed iris PAD method.
CategoryMethodAdvantageDisadvantage
Using fabricated artifactsUsing normalized iris regionContlensNet [10]Only iris patterns are used for PAD, resulting in high processing speedUsing fabricated artifacts
Using unnormalized iris region
MCNN with logistic regression [11]
Using unnormalized iris regionNot using attention moduleD-NetPAD [12]Since no process of iris region normalization is required, the algorithm has low complexity and high processing speed
CNN with 25 overlapping patches [13]
TripletNet [14]
DCLNet + SVM [15]
Score fusion of DCCNet features and handcrafted features [16]
Using attention moduleDFCANet [17]Enhancing PAD performance by using attention module to give weights to more-important features
Using normalized
and unnormalized
iris regions
Feature fusion or score fusion obtained from iris, inner, and outer region, and classification using SVM [18]PAD training can be performed for various regions
Gabor filter and CNN using binary iris code image and score level fusion [19]
Using generated imageUsing unnormalized iris regionUsing attention moduleCNN using channel attention and position attention module [25]Using generated imageUsing unnormalized iris region
Not using attention moduleMultilevel Haralick and VGG Fusion [21]Various features can be used for PADLimitations in improvement of PAD accuracy due to the use of handcrafted features
RaSGAN relativistic discriminator [23]High PAD accuracy by using a discriminator used in training of RaSGANTraining of RaSGAN is time consuming and the performance result of training is unstable when the number of training samples for the classes is small
4DCycleGAN + LLC [26]PAD for images generated by CycleGANLow PAD accuracy
LRFID-Net (proposed method)High PAD accuracy by considering all features in the local iris regionRequires preprocessing for segmentation into local iris regions
Table 2. Generator of CycleGAN.
Layer | Output Channel | Filter Size | Output Size
Input | - | - | 224 × 224 × 3
Padding | - | 3 × 3 | 230 × 230 × 3
Encoder 1 (Convolution + Instance Normalization) | 64 | 7 × 7 | 224 × 224 × 64
Encoder 2 (Convolution + Instance Normalization) | 128 | 3 × 3 | 112 × 112 × 128
Encoder 3 (Convolution + Instance Normalization) | 256 | 3 × 3 | 56 × 56 × 256
Residual Block [(Padding, Convolution 1, Instance Normalization, Padding, Convolution 2, Instance Normalization) × 9] | 256 | 3 × 3 | 56 × 56 × 256
Decoder 1 (Deconvolution + Instance Normalization) | 128 | 3 × 3 | 112 × 112 × 128
Decoder 2 (Deconvolution + Instance Normalization) | 64 | 3 × 3 | 224 × 224 × 64
Decoder 3 (Padding, Deconvolution) | 3 | 3 × 3 (padding), 7 × 7 (deconvolution) | 230 × 230 × 64, 224 × 224 × 3
Table 3. Discriminator of CycleGAN.
Layer | Output Channel | Filter Size (Stride) | Output Size
Input | - | - | 224 × 224 × 3
Convolution | 64 | 4 × 4 (2) | 128 × 128 × 64
Convolution + Instance Normalization | 128 | 4 × 4 (2) | 56 × 56 × 128
Convolution + Instance Normalization | 256 | 4 × 4 (2) | 28 × 28 × 256
Convolution + Instance Normalization | 512 | 4 × 4 | 28 × 28 × 512
Convolution | 1 | 4 × 4 | 28 × 28 × 1
Table 4. Descriptions of 1st dense blocks.
Layer | Output Channel | Filter Size (Stride) | Output Size
Input | - | - | 224 × 224
ZeroPadding2D | - | 3 × 3 | 230 × 230
Convolution | 64 | 3 × 3 (2) | 112 × 112
ZeroPadding2D | - | 1 × 1 | 114 × 114
Max Pooling | - | 3 × 3 (2) | 56 × 56
1st Dense Block [(Convolution_1, Convolution_2) × 6] | 128, 32 | 1 × 1, 3 × 3 | 56 × 56
Transition Block (1): Convolution | 128 | 1 × 1 | 56 × 56
Table 5. Descriptions of proposed shallow CNN.
Layer | Output Channel (Group = 2) | Filter Size (Stride) | #Iteration | Output Size
Input | - | - | - | 56 × 56
Convolution | 24 | 3 × 3 (2) | - | 28 × 28
MaxPooling | 24 | 3 × 3 (2) | - | 14 × 14
Stage 1, Unit 1 (Group Convolution 1 / Average Pooling / Channel Shuffle / Depth-wise Convolution / Group Convolution 2) | 200 | 1 × 1 / 3 × 3 (2) / - / 3 × 3 (2) / 1 × 1 | 1 | 7 × 7
Stage 1, Unit 2 (Group Convolution 1 / Channel Shuffle / Depth-wise Convolution / Group Convolution 2) | 200 | 1 × 1 / - / 3 × 3 / 1 × 1 | 7 | 7 × 7
Stage 2, Unit 1 (Group Convolution 1 / Average Pooling / Channel Shuffle / Depth-wise Convolution / Group Convolution 2) | 400 | 1 × 1 / 3 × 3 (2) / - / 3 × 3 (2) / 1 × 1 | 1 | 4 × 4
Stage 2, Unit 2 (Group Convolution 1 / Channel Shuffle / Depth-wise Convolution / Group Convolution 2) | 400 | 1 × 1 / - / 3 × 3 / 1 × 1 | 3 | 4 × 4
Global Average Pooling | 400 | - | - | 400
FC-Layer | 200 | - | - | 2
Table 6. Detailed description of experimental databases.
Dataset | Group | Number of Classes in Live Images | Number of Live Images | Number of Fake Images
LiveDet-Iris-2017-Warsaw | Sub-dataset A | 141 | 2577 | 3477
LiveDet-Iris-2017-Warsaw | Sub-dataset B | 140 | 2593 | 3368
LiveDet-Iris-2017-ND | Sub-dataset A | 161 | 2277 | 1251
LiveDet-Iris-2017-ND | Sub-dataset B | 161 | 2509 | 1251
LiveDet-Iris-2017-Warsaw live images + fake images generated by CycleGAN | Sub-dataset A | 141 | 2577 | 2577
LiveDet-Iris-2017-Warsaw live images + fake images generated by CycleGAN | Sub-dataset B | 140 | 2591 | 2591
LiveDet-Iris-2017-ND live images + fake images generated by CycleGAN | Sub-dataset A | 161 | 2277 | 2277
LiveDet-Iris-2017-ND live images + fake images generated by CycleGAN | Sub-dataset B | 161 | 2509 | 2509
Table 7. Hyperparameters used for training of CycleGAN.
Parameters | Value
Epochs | 200
Batch size | 1
Learning rate | 2 × 10^-4
Learning decay | 100
Beta 1 | 0.5
Gradient penalty | None
Adversarial loss | LSGAN
Identity loss weight | 0.0
Cycle loss weight | 10.0
Gradient penalty weight | 10.0
Pool size | 50
Table 8. Hyperparameters used for training of LRFID-Net.
Parameters | Value
Batch size | 2
Epochs | 50
Learning decay | None
Learning rate | 1 × 10^-3
Optimizer | Adam
Beta 1 | 0.9
Beta 2 | 0.999
Epsilon | 1 × 10^-7
Kernel initializer | Glorot_uniform
Bias initializer | Zeros
Loss | Categorical cross entropy
Table 9. Comparisons in image quality evaluation of generated fake images by CycleGAN with the state-of-the-art methods.
Database | Method | FID | WD
LiveDet-Iris-2017-Warsaw | PGGAN [44] | 70.82 | 30.04
LiveDet-Iris-2017-Warsaw | RaSGAN [22] | 189.91 | 33.11
LiveDet-Iris-2017-Warsaw | iDCGAN [20] | 176.12 | 32.69
LiveDet-Iris-2017-Warsaw | Pix2Pix [45] | 206.10 | 30.03
LiveDet-Iris-2017-Warsaw | ACL-GAN [46] | 40.21 | 7.22
LiveDet-Iris-2017-Warsaw | FastGAN [47] | 214.56 | 31.67
LiveDet-Iris-2017-Warsaw | CycleGAN [8] (proposed method) | 14.10 | 7.51
LiveDet-Iris-2017-ND | PGGAN [44] | 236.06 | 33.93
LiveDet-Iris-2017-ND | RaSGAN [22] | 260.44 | 14.53
LiveDet-Iris-2017-ND | iDCGAN [20] | 150.28 | 34.19
LiveDet-Iris-2017-ND | Pix2Pix [45] | 273.92 | 32.22
LiveDet-Iris-2017-ND | ACL-GAN [46] | 56.14 | 13.65
LiveDet-Iris-2017-ND | FastGAN [47] | 181.32 | 37.88
LiveDet-Iris-2017-ND | CycleGAN [8] (proposed method) | 33.08 | 11.69
Table 10. ACERs of PAD according to various backbone models and with or without Gaussian filtering (unit: %).
Model | Fold | Without Gaussian Filtering | With 3 × 3 Filter | With 9 × 9 Filter | With 11 × 11 Filter
DenseNet-169 | 1-fold | 5.55 | 3.13 | 4.79 | 5.14
DenseNet-169 | 2-fold | 3.94 | 3.14 | 2.78 | 3.22
DenseNet-169 | Average | 4.75 | 3.14 | 3.79 | 4.18
ResNet-152 | 1-fold | 7.05 | 4.88 | 5.15 | 5.15
ResNet-152 | 2-fold | 7.15 | 4.98 | 3.60 | 3.78
ResNet-152 | Average | 7.10 | 4.93 | 4.38 | 4.47
VGG-19 | 1-fold | 10.61 | 6.46 | 9.88 | 9.88
VGG-19 | 2-fold | 9.17 | 5.21 | 5.20 | 5.20
VGG-19 | Average | 9.89 | 5.84 | 7.54 | 7.54
XceptionNet | 1-fold | 9.26 | 6.89 | 7.09 | 7.09
XceptionNet | 2-fold | 8.24 | 6.36 | 6.22 | 6.82
XceptionNet | Average | 8.75 | 6.63 | 6.66 | 6.96
Table 11. ACERs of PAD according to the cases of using each local region, feature-level fusion, score-level fusion, and the proposed LRFID-Net (I, U, and L represent the iris, upper eyelash, and lower eyelash regions, respectively) (unit: %).
Method | Without Gaussian Filtering (APCER / BPCER / ACER) | 11 × 11 Gaussian Filtering (APCER / BPCER / ACER)
DenseNet-169, with only iris region | 1.71 / 1.96 / 1.84 | 1.69 / 0 / 0.84
DenseNet-169, with only upper eyelash region | 6.50 / 23.24 / 14.87 | 7.77 / 0 / 3.88
DenseNet-169, with only lower eyelash region | 11.94 / 9.63 / 10.78 | 10.79 / 0 / 5.39
DenseNet-169, feature-level fusion | 1.14 / 1.42 / 1.28 | 1.13 / 0 / 0.56
DenseNet-169, score-level fusion (weighted sum) | 1.62 / 3.55 / 2.59 | 1.62 / 0 / 0.81
DenseNet-169, score-level fusion (weighted product) | 1.38 / 1.49 / 1.43 | 1.65 / 0 / 0.83
DenseNet-169, score-level fusion (SVM) | 1.59 / 1.94 / 1.76 | 1.57 / 0 / 0.78
LRFID-Net, ocular | 0.24 / 0.16 / 0.20 | 0.24 / 14.45 / 7.34
LRFID-Net, I + U + L (proposed) | 0.06 / 0 / 0.03 | 0.06 / 0 / 0.03
Table 12. ACERs of PAD by LRFID-Net according to the number of dense blocks and shuffle stages (unit: %).
Number of Dense Blocks | Number of Shuffle Stages | Without Gaussian Filtering (APCER / BPCER / ACER) | 11 × 11 Gaussian Filtering (APCER / BPCER / ACER)
1 | 2 | 0.06 / 0 / 0.03 | 0.06 / 0 / 0.03
2 | 2 | 0.08 / 0.04 / 0.06 | 0.08 / 0 / 0.04
3 | 2 | 2.96 / 6.04 / 4.50 | 2.96 / 0.46 / 1.71
Table 13. ACERs of PAD by LRFID-Net according to the cases of using the various combinations of each local region (I, U, and L represent the iris, upper eyelash, and lower eyelash regions, respectively) (unit: %).
Combination of Local Regions | Without Gaussian Filtering (APCER / BPCER / ACER) | 11 × 11 Gaussian Filtering (APCER / BPCER / ACER)
I + U | 1.84 / 19.60 / 10.72 | 1.94 / 21.08 / 11.51
I + L | 16.09 / 14.43 / 15.26 | 18.65 / 23.16 / 20.91
U + L | 10.64 / 25.69 / 18.16 | 12.84 / 33.03 / 22.93
I + U + L | 0.06 / 0 / 0.03 | 0.06 / 0 / 0.03
Table 14. ACERs of PAD by LRFID-Net with I + U + L from Table 13 for Gaussian filtering with various filter sizes (unit: %).
LRFID-Net with I + U + L
Gaussian Filter Size | APCER | BPCER | ACER
3 × 3 | 0.06 | 0 | 0.03
9 × 9 | 0.06 | 0 | 0.03
11 × 11 | 0.06 | 0 | 0.03
Table 15. ACERs of PAD by LRFID-Net with I + U + L from Table 14 for various methods of fake image generation (unit: %).
MethodWithout Gaussian Filtering11 × 11 Gaussian Filtering
APCERBPCERACERAPCERBPCERACER
iDCGAN [20]0.0400.020.040.100.07
RaSGAN [22]0000.040.100.07
ACL-GAN [46]0.020.010.020.020.010.02
FastGAN [47]0.0400.020.0200.02
CycleGAN [8]0.0600.030.0600.03
Table 16. ACERs of PAD by LRFID-Net compared with state-of-the-art methods (unit: %).
Method | Without Gaussian Filtering (APCER / BPCER / ACER) | 11 × 11 Gaussian Filtering (APCER / BPCER / ACER)
D-NetPAD [12] | 1.74 / 1.63 / 1.68 | 1.12 / 1.28 / 1.20
DCLNet [15] | 10.63 / 12.37 / 11.50 | 3.67 / 11.58 / 7.62
AG-PAD [25] | 6.97 / 0.46 / 3.72 | 6.97 / 0.64 / 3.80
ViT [53] | 0.12 / 1.57 / 0.84 | 3.02 / 12.43 / 7.72
MaxViT [54] | 1.43 / 1.82 / 1.63 | 6.85 / 6.87 / 6.86
LRFID-Net (proposed) | 0.06 / 0 / 0.03 | 0.06 / 0 / 0.03
Table 17. ACERs of PAD according to various backbone models and with and without Gaussian filtering (unit: %).
Model | Fold | Without Gaussian Filtering | With 3 × 3 Filter | With 9 × 9 Filter | With 11 × 11 Filter
DenseNet-169 | 1-fold | 3.72 | 2.57 | 2.41 | 2.41
DenseNet-169 | 2-fold | 2.89 | 0.74 | 0.70 | 0.70
DenseNet-169 | Average | 3.30 | 1.65 | 1.55 | 1.55
ResNet-152 | 1-fold | 4.45 | 2.29 | 1.88 | 1.88
ResNet-152 | 2-fold | 4.53 | 2.80 | 2.71 | 2.71
ResNet-152 | Average | 4.49 | 2.54 | 2.29 | 2.29
VGG-19 | 1-fold | 8.88 | 5.72 | 5.57 | 5.57
VGG-19 | 2-fold | 6.54 | 4.43 | 4.42 | 4.42
VGG-19 | Average | 7.71 | 5.07 | 4.99 | 4.99
XceptionNet | 1-fold | 5.52 | 3.56 | 2.86 | 2.96
XceptionNet | 2-fold | 4.01 | 2.24 | 1.89 | 1.88
XceptionNet | Average | 4.76 | 2.90 | 2.38 | 2.42
Table 18. ACERs of PAD for the cases of using each local region, feature-level fusion, score-level fusions, and the proposed LRFID-Net (I, U, and L represent the iris, upper eyelash, and lower eyelash regions, respectively) (unit: %).
Method | Without Gaussian Filtering (APCER / BPCER / ACER) | 3 × 3 Gaussian Filtering (APCER / BPCER / ACER)
DenseNet-169, with only iris region | 4.15 / 6.47 / 5.31 | 6.48 / 9.18 / 7.83
DenseNet-169, with only upper eyelash region | 9.46 / 6.51 / 7.98 | 4.35 / 4.92 / 4.64
DenseNet-169, with only lower eyelash region | 19.06 / 13.03 / 16.04 | 18.51 / 0.1 / 9.31
DenseNet-169, feature-level fusion | 3.10 / 3.35 / 3.22 | 4.79 / 0.16 / 2.48
DenseNet-169, score-level fusion (weighted sum) | 3.23 / 5.08 / 4.15 | 4.77 / 6.25 / 5.51
DenseNet-169, score-level fusion (weighted product) | 3.23 / 3.56 / 3.39 | 3.79 / 3.03 / 3.41
DenseNet-169, score-level fusion (SVM) | 3.49 / 5.72 / 4.60 | 4.74 / 7.09 / 5.91
LRFID-Net, ocular | 3.71 / 0.16 / 1.93 | 3.71 / 0.06 / 1.88
LRFID-Net, I + U + L (proposed) | 0.12 / 0.10 / 0.11 | 0.12 / 0.10 / 0.11
Table 19. ACERs of PAD by LRFID-Net according to the number of dense blocks and shuffle stages (unit: %).
Number of Dense Blocks | Number of Shuffle Stages | Without Gaussian Filtering (APCER / BPCER / ACER) | 3 × 3 Gaussian Filtering (APCER / BPCER / ACER)
1 | 2 | 0.12 / 0.10 / 0.11 | 0.12 / 0.10 / 0.11
2 | 2 | 0.43 / 0.51 / 0.47 | 0.43 / 0.04 / 0.24
3 | 2 | 9.92 / 17.36 / 13.64 | 9.92 / 9.31 / 9.61
Table 20. ACERs of PAD by LRFID-Net for cases in which various combinations of local regions were used (I, U, and L represent the iris, upper eyelash, and lower eyelash regions, respectively) (unit: %).
Combination of Local RegionsWithout Gaussian Filtering3 × 3 Gaussian Filtering
APCER (%)BPCER (%)ACER (%)APCER (%)BPCER (%)ACER (%)
LRFID-Net + I + U2.292.312.307.883.375.72
LRFID-Net + I + L10.9213.0211.975.071.563.31
LRFID-Net + U + L4.305.554.9315.5813.7214.65
LRFID-Net + I + U + L0.120.100.110.120.100.11
Table 21. ACERs of PAD by LRFID-Net with I + U + L of Table 20 for Gaussian filtering with various filter sizes (unit: %).
LRFID-Net with I + U + L
Gaussian Filter Size | APCER (%) | BPCER (%) | ACER (%)
3 × 3 | 0.12 | 0.10 | 0.11
9 × 9 | 0.12 | 0.12 | 0.12
11 × 11 | 0.12 | 0.10 | 0.11
Table 22. ACERs of PAD by LRFID-Net with I + U + L from Table 21 for various methods of fake image generation (unit: %).
MethodWithout Gaussian Filtering3 × 3 Gaussian Filtering
APCERBPCERACERAPCERBPCERACER
iDCGAN [20]0.0700.040.040.00.02
RaSGAN [22]0.030.020.030.0300.02
ACL-GAN [46]0000.080.080.04
FastGAN [47]0.0400.020.0400.02
CycleGAN [8]0.120.100.110.120.100.11
Table 23. Comparison of the ACERs of PAD by LRFID-Net with those of the state-of-the-art methods (unit: %).
Method | Without Gaussian Filtering (APCER / BPCER / ACER) | 3 × 3 Gaussian Filtering (APCER / BPCER / ACER)
D-NetPAD [12] | 1.69 / 1.78 / 1.73 | 0.90 / 0.94 / 0.92
DCLNet [15] | 8.76 / 8.74 / 8.75 | 3.65 / 3.62 / 3.63
AG-PAD [25] | 6.96 / 6.37 / 6.67 | 6.96 / 22.75 / 14.86
ViT [53] | 0.29 / 2.47 / 1.38 | 0.29 / 11.12 / 5.70
MaxViT [54] | 0.04 / 1.32 / 0.68 | 0.04 / 19.81 / 9.92
LRFID-Net (proposed) | 0.12 / 0.10 / 0.11 | 0.12 / 0.10 / 0.11
Table 24. Comparison between the GFLOPs, number of parameters, memory usage, and inference time of LRFID-Net and those of the state-of-the-art methods. G, M, and GB mean Giga, Mega, and Giga bytes, respectively.
Method | GFLOPs (G) | #Parameters (M) | Memory Usage (GB) | Inference Time (ms)
D-NetPAD [12] | 2.91 | 6.956 | 3.36 | 21.2
DCLNet [15] | 2.64 | 51.838 | 1.60 | 7.2
AG-PAD [25] | 7.46 | 22.774 | 3.60 | 113.2
ViT [53] | 17.58 | 85.648 | 3.84 | 31.2
MaxViT [54] | 24.11 | 118.807 | 4.96 | 172.6
LRFID-Net (proposed) | 12.60 | 4.757 | 2.96 | 25.6
Table 25. PSNR values of the fake images according to the PAD results of Figure 16, Figure 17 and Figure 18.
Classification Results | PSNR
Correct | 31.22
Incorrect (bona fide presentation classification error) | 33.27
Incorrect (attack presentation classification error) | 38.77