Article

Contactless Palm Vein Recognition Based on Attention-Gated Residual U-Net and ECA-ResNet

Division of Computer Science and Engineering, CAIIT, Jeonbuk National University, Jeonju 54896, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6363; https://doi.org/10.3390/app13116363
Submission received: 25 April 2023 / Revised: 15 May 2023 / Accepted: 16 May 2023 / Published: 23 May 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Palm vein recognition has received considerable attention regarding its use in biometric identification. Palm vein characteristics offer a superior level of security and reliability in personal identification compared to extrinsic methods such as fingerprint, face, and palm print recognition, as vein patterns are difficult to duplicate and do not change throughout one's lifetime. This study proposes both segmentation and recognition methods to enhance authentication performance and achieve correct identification using palm vein features. First, we propose a segmentation model based on the U-Net architecture, enhanced with an attention gate, to effectively segment palm vein patterns. The incorporation of both the attention gate and residual blocks allows the segmentation model to learn the essential features required for the segmentation task. The Hessian-based Jerman filtering method is used for ground-truth labeling. The segmentation model extracts the palm vein patterns and filters out the irrelevant and noisy pixels for the purpose of recognition. The efficient channel attention residual network is trained to learn discriminative features for personal identification using combined margin-based loss functions for palm vein recognition. The channel attention module enhances useful information and suppresses irrelevant features in the feature maps, which overcomes the problems of rotation, position translation, and scale transformation and improves the recognition rate. The combined loss function used in this study increases the similarity between intra-class samples and the diversity between inter-class samples. The proposed recognition model achieved 100% accuracy for palm vein recognition and an equal error rate of 0.018 for palm vein verification.

1. Introduction

Biometric identification is an authentication process leveraging unique physical or behavioral human features. During the COVID-19 pandemic, researchers showed substantially increased interest in contactless biometric identification over contact-based biometrics such as fingerprints [1]; this is because, in contact-based methods, users need to put their fingers directly on a sensor, which is impractical for health reasons. Contactless biometric systems, such as those using the iris [2], face [3], or palm print [4,5], are considered better functional recognition systems for this reason. However, these extrinsic biometric features are susceptible to spoofing and can be significantly affected by various factors, such as a person's age and health and the condition of the skin or injuries. On the other hand, a contactless biometric feature such as the palm vein is an intrinsic trait and offers several advantages. First, it provides better privacy and security, as obtaining the palm vein pattern is only possible with special equipment, which prevents forgery. Second, palm vein patterns remain stable throughout an individual's lifespan and disappear in the absence of blood flow. Third, human palms contain highly complicated and unique vein patterns, as the structure of the pattern differs even between identical twins. Moreover, the collection process for contactless palm images is comfortable, easily accepted by users, and more hygienic, as there is no interaction between the user's hand and a sensor on a public device. This complexity and consistency make palm veins an exceptionally reliable biometric feature for personal identification, surpassing other external features. Palm veins are often collected via the reflection method, in which near-infrared light is emitted from the sensor toward the person's hand. Because the hemoglobin in a vein absorbs more near-infrared radiation than the surrounding tissues, the palm vein patterns appear dark. This reflective approach allows for contactless pattern recognition and user identification. In conclusion, palm vein authentication is considered more secure and practical with regard to privacy concerns, and it therefore offers excellent research potential and broad application prospects.
Recently proposed approaches for palm vein recognition generally face quality issues at the data collection stage. The acquisition of contactless palm images using near-infrared light often results in poor contrast between vein pixels and non-vein areas. The visibility of the palm veins can be compromised by various factors, including ambient temperature and lighting and illumination conditions. This causes noise and optical blurring in the palm images, which degrades recognition accuracy. Enhancing or segmenting accurate vein patterns is a crucial aspect of improving the robustness of palm vein features; however, it is difficult to extract the blood vessels and effectively remove the optical blurring. In particular, this makes it challenging to extract a precise vein pattern using distribution-assumption-based handcrafted methods [6,7]. Moreover, contactless palm vein biometric systems suffer from image rotation, position translation, and scale transformation. Last but not least, most deep-learning-based palm vein recognition studies have so far focused on a classification approach using the SoftMax loss, a popular loss function for recognition tasks. Although SoftMax loss offers excellent inter-class separation, it is ineffective at minimizing intra-class diversity, which weakens the discriminative power of the learned features [8,9].
The attention-gated residual U-Net addresses the first problem. The U-Net-based segmentation model segments near-accurate palm vein vessels from the original grayscale images and filters out the irrelevant and noisy pixels for recognition. To attain more accurate segmentation outcomes with minimal computation, we integrated an attention gate mechanism with the U-Net model. Inspired by the residual U-Net [10], the segmentation model introduces residual learning into each convolutional block. The efficient channel attention residual network (ECA-ResNet) solves the second problem by enhancing the channel-wise information in the feature maps and suppressing useless features, which overcomes the problems of rotation, position translation, and scale transformation. The recognition model uses ResNet as the backbone to address the vanishing or exploding gradient problem, in which repeated multiplication during back-propagation may cause the gradient to become infinitely small. The combined loss function proposed in this paper addresses the issues of inter-class similarity and intra-class compactness for palm vein recognition systems, where only a few samples are available for each class.
The main contributions of this work can be summarized in four points:
  • We propose a deep-learning-based approach for both palm vein segmentation and recognition with promising identification and verification results;
  • With the advantages of the attention gate, we propose an attention-aware residual U-Net segmentation model to learn domain-specific features such as vein vascular structure from low contrast and blurry palm images, allowing for more precise authentication performance;
  • We propose a light-weight attention-aware feature extractor for palm vein recognition that can efficiently extract palm vein features without any extra computational overhead;
  • We propose the most effective loss function for palm vein discriminative learning by combining state-of-the-art loss functions, such as ArcFace, focal loss, and triplet loss.
The rest of this paper is organized as follows. Section 2 discusses the related work. Section 3 presents the proposed methodology for palm vein recognition. Section 4 and Section 5 describe the experiments and results. Finally, Section 6 concludes the paper.

2. Related Work

2.1. Handcrafted Methods

The different handcrafted approaches used for palm vein recognition can be categorized into four groups: geometric-based methods, statistical-based methods, local invariant-based methods, and subspace-based methods.
Geometric-based methods, which use geometric elements such as points, lines, or curves, have been studied as a way to extract the palm vessels [6,11,12]. However, problems such as external light conditions, low contrast, skin scattering, and optical blurring, which commonly occur in contactless palm vein images, make it difficult to extract precise vein segmentation. Moreover, geometry-based methods offer low discrimination, as they suffer directly from rotation, scaling, and translation.
Statistical-based methods, such as local binary patterns (LBP) [13,14,15,16], modified local binary patterns [17,18], local texture patterns [19], local tetra patterns [20], and local directional texture patterns (LDTP) [21,22], are used to extract rich texture-based features of the blood vessels. Because they rely on pixel-to-pixel image processing, these methods suffer from weak texture representation, are very vulnerable to image noise, and are sensitive to the image rotation and shifting caused by the displaced hands of users. Therefore, they often achieve low identification rates. Moreover, encoding the palm vessel structure also degrades the feature representation of the local binary pattern.
Local invariant-based methods [23,24,25,26,27], such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and RootSIFT, can overcome the problems of scale uncertainty, orientation, and translation. Thus, this approach can be considered a competitive handcrafted feature extraction method. However, it incurs lengthy computation times and can yield incorrect verification due to unstable feature points caused by external factors in low-quality palm images.
Subspace-based approaches [28,29,30,31,32,33,34] have also been proposed to reduce the dimensionality of the training data to a lower-dimensional space. These methods include principal component analysis (PCA), linear discriminant analysis (LDA), Fisher linear discriminant (FLD), and independent component analysis (ICA). Because the features are extracted manually and the methods are not training-based, these systems are generally time-consuming and error-prone.

2.2. Deep Learning Methods

Obayya et al. [35] proposed a CNN architecture for palm vein recognition using Bayesian optimization. The recognition accuracy of their CNN model is very high, but its error rate remains high for verification problems. Their approach applies the Jerman filtering method to raw ROIs at different scales to enhance the palm vein images. The maximum filter responses across the scales are taken as the final output, which produces complex palm vein structures that may differ from the actual vein pattern. Moreover, using handcrafted vein enhancement at different scales for each ROI image is time-consuming and not applicable in real cases. Pan et al. [36] proposed a multi-scale deep representation approach for palm vein recognition. Because the available training databases are small, training a deep convolutional neural network becomes challenging, since performance relies substantially on the amount of training data. Their study proposed multi-scale deep representation aggregation to remove noisy features from a pretrained CNN and refine the feature maps using a local mean threshold approach.
Wu et al. [37] proposed the wavelet denoising ResNet. The proposed wavelet denoising (WD) model removes noise and optical blurring from palm vein images by enhancing the low-frequency features. The network combines the ResNet-18 model with the squeeze-and-excitation (SE) module to achieve better performance. However, there is a trade-off between performance and complexity, as the fully connected layers in the excitation path of the SE module increase the model's complexity. Their work utilized the wavelet denoising technique as a sub-band to enhance the low-frequency features, which are fused with the deep learning network through a residual connection. The proposed WD model helps remove the image noise caused by skin scattering and optical blurring in the high-frequency part. However, this approach requires greater practical enhancement for lower-contrast and blurrier palm vein images. Pan et al. [38] extracted semantic palm vein features using multi-layer convolutional feature concatenation. Chen et al. [39] proposed a lightweight CNN and an adaptive augmentation method for palm vein authentication. Categorical cross-entropy loss has been widely used in these approaches, but it lacks discriminative separation between intra-class and inter-class samples.
Moreover, only a few studies have focused on the segmentation problem. Felix et al. [40] and Wang et al. [41] both proposed palm vein segmentation models based on the U-Net architecture. However, the experimental results were not satisfactory due to the poor feature representation in the initial layers used in the skip connections, which may cause redundant low-level feature extraction. Similarly built on an encoder-decoder architecture, PVSNet [42] proposed a Siamese method using triplet loss and an adaptive hard mining technique. The pretrained model is composed of an encoder and a decoder for learning enforced palm vein features. Positive and negative samples are separated using triplet loss, a popular loss function introduced by FaceNet [3]. However, the triplet loss function still has certain drawbacks and limitations that negatively impact the model's accuracy due to the triplet selection during the training stage.

3. Attention-Gated Residual U-Net and ECA-ResNet

Figure 1 shows the overall flowchart of the proposed palm vein recognition system. First, the ROI extraction method is used to locate the area to be identified. The U-Net-based segmentation model removes redundant noisy information and enhances the domain-specific features that are important for recognition. However, the ground-truth label data needed to train the U-Net model are not provided by any of the available palm vein databases; therefore, we used a handcrafted method to label the palm vessels. Once the segmentation model became stable after optimization, its weights were frozen, and the segmentation output was connected to the ECA-ResNet for further training for identification. The margin-based ArcFace loss, focal loss, and triplet loss functions were applied to train the ECA-ResNet. For testing and evaluating the model, the output of ECA-ResNet can be used in two ways: first, the 512-dimensional feature embedding can be used for a distance comparison between the registered template and a query using Euclidean distance or cosine similarity metrics; second, the SoftMax probability prediction can be obtained from the binary head of ECA-ResNet. The following sections detail each step separately.
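As a rough sketch of this two-stage flow, the snippet below wires a segmentation model into the recognition backbone and exposes both evaluation modes; `seg_model`, `eca_resnet`, and `gallery` are hypothetical interfaces assumed for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def identify(roi, seg_model, eca_resnet, gallery):
    """roi: (1, 1, 112, 112) tensor; gallery: {identity: registered 512-D template}."""
    with torch.no_grad():
        vein = seg_model(roi)              # attention-gated residual U-Net output
        emb, logits = eca_resnet(vein)     # 512-D embedding and binary-head logits
    emb = F.normalize(emb, dim=-1)         # L2-normalize for cosine comparison
    # Mode 1: template matching with cosine similarity (verification / open set).
    scores = {pid: F.cosine_similarity(emb, t, dim=-1).item()
              for pid, t in gallery.items()}
    best_match = max(scores, key=scores.get)
    # Mode 2: closed-set identification from the SoftMax probability of the binary head.
    predicted = torch.softmax(logits, dim=-1).argmax(dim=-1).item()
    return best_match, predicted
```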

3.1. Region of Interest Extraction

As in face recognition systems, only some of the information acquired by the near-infrared (NIR) camera is necessary, and region of interest (ROI) extraction or detection is an essential aspect of improving model performance. This step also improves the computational efficiency of the palm vein recognition system by reducing the template size. Despite having advantages such as better hygiene and a user-friendly approach, contactless acquisition produces inconsistent images, with hand displacement, rotation, and zooming at different degrees. Therefore, this study uses a reliable background subtraction and ROI extraction process as the preprocessing method to overcome these problems. The process involves five steps: segmenting the hand from the background using binarization, locating the hand contour, positioning the centroid and detecting key points, normalization, and ROI extraction.
First, this study used Gaussian blur to remove the image noise caused by external light factors. The contactless palm images obtained generally contain the entire hand against a darker background due to the variations in NIR light response. Therefore, the hand contour information can be segmented using the Otsu binarization method. Occasionally, due to the influence of lighting conditions, such as the brightness of some background areas being similar to that of the hand, the output of the Otsu binarization process may contain a few smaller, irregularly shaped areas segmented alongside the hand. These areas can introduce noise and compromise the accuracy of subsequent processing. Thus, morphological operations such as erosion and dilation are utilized to eliminate these irregular areas and obtain better segmentation results. After obtaining the hand contour from binarization, the centroid $C$ is positioned at its center to compute the radial distance function (RDF). The RDF is computed by calculating the distance between each point on the hand contour and the centroid $C$. Once the radial distances for all points on the contour are calculated, the maxima and minima points can be determined. The maxima typically correspond to the fingertips, as these are the points furthest from $C$; the minima, on the other hand, correspond to the finger valleys, as these are the points closest to $C$. This can be achieved by finding the local maxima and minima in the RDF curve, as shown in Figure 2. From the five fingertip maxima obtained, the thumb, which is identified as the fingertip with the smallest radial distance from $C$, is excluded to simplify the process. Finally, four fingertip maxima and three corresponding finger-valley minima are obtained, as shown in Figure 3c. From these points, the leftmost and rightmost valley points are defined as $P_1$ and $P_2$, respectively. Because normalization of the hand position is a fundamental step for identification, we rotationally normalized the images by horizontally aligning them. The normalizing angle $\theta$ was computed as the angle between $d$, which is defined in Equation (1), and the dotted red line shown in Figure 3d. The hand images are then zoomed and rotated according to $\theta$ in Equation (2) using bilinear interpolation, as depicted in Figure 3e.
$d = \sqrt{(Y_{P_2} - Y_{P_1})^2 + (X_{P_2} - X_{P_1})^2}$ (1)
$\theta = \tan^{-1}\left((Y_{P_2} - Y_{P_1}) / (X_{P_2} - X_{P_1})\right)$ (2)
Most existing methods rely solely on $P_1$ and $P_2$ for ROI extraction. However, such an approach is error-prone due to the diversity of hand shapes, where part of the selected ROI may fall outside the actual boundary points. Thus, this study adaptively determines each ROI image's size and location. First, from the four fingertip points obtained, the leftmost and rightmost ones are marked as $T_1$ and $T_2$, respectively. After that, two boundary points, $P_3$ and $P_4$, are obtained from the outer boundary, followed by locating two reliable points, $E_1$ and $E_2$, as presented in Figure 3e. $P_3$ lies on the hand contour at equal boundary distance from $T_1$ and $P_1$; similarly, $P_4$ is placed at equal boundary distance from $T_2$ and $P_2$. $E_1$ and $E_2$ are then located at the midpoint between $P_3$ and $P_1$ and the midpoint between $P_4$ and $P_2$, respectively. The ROI image is directly extracted from $E_1$ and $E_2$.
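A minimal OpenCV/SciPy sketch of the first preprocessing steps (blurring, Otsu binarization, morphological cleaning, and RDF key-point detection) is given below; the kernel sizes and the extrema search window are illustrative assumptions, not the exact settings used in this study.

```python
import cv2
import numpy as np
from scipy.signal import argrelextrema

def rdf_keypoints(gray):
    blur = cv2.GaussianBlur(gray, (5, 5), 0)                 # suppress lighting noise
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)                       # erosion + dilation cleanup
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze(1)  # hand contour, shape (N, 2)
    m = cv2.moments(contour)
    c = np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])  # centroid C
    rdf = np.linalg.norm(contour - c, axis=1)                # radial distance function
    tips = argrelextrema(rdf, np.greater, order=20)[0]       # maxima -> fingertips
    valleys = argrelextrema(rdf, np.less, order=20)[0]       # minima -> finger valleys
    return contour, c, tips, valleys
```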

3.2. Palm Vein Segmentation

Palm vein segmentation is the process of extracting vascular vein patterns from noisy and low-contrast palm vein images. This section describes the ground-truth preparation method, the detailed network architecture of the attention-gated residual U-Net, and the loss functions used in this process.

3.2.1. Ground-Truth Labeling

Currently, there is no existing database that includes annotated labels for training and evaluating palm vein segmentation algorithms. As a result, we generated the ground-truth labels for palm vein segmentation by employing an existing handcrafted vein enhancement algorithm. This approach allowed us to create a reliable and accurate set of labeled data for use in the training and evaluation of our algorithm. In this study, the Jerman filtering method [43], which measures the eigenvalues of the Hessian, was used; its results are nearly accurate and similar to the actual palm veins given a careful selection of parameters such as the kernel size. The Jerman filtering method is widely used to strengthen the intensity of vessels in contrast to non-vein areas. Denoting $I(X)$ as the 2-D input at coordinate $X = [x_1, x_2]$, the Hessian of $I(X)$ at $X$ and scale $s$ is represented as Equation (3):
$H_{ij}(X, s) = s^2 \, I(X) * \dfrac{\partial^2}{\partial x_i \partial x_j} G(X, s), \quad i, j = 1, 2,$ (3)
where $G$ is a bivariate Gaussian filter and $*$ denotes convolution. Then, the Jerman filtering algorithm is computed as Equation (4):
$v = \begin{cases} 0 & \text{if } \lambda_2 \le 0 \text{ or } \lambda_\rho \le 0, \\ 1 & \text{if } \lambda_2 \ge \lambda_\rho / 2 > 0, \\ \lambda_2^2 \,(\lambda_\rho - \lambda_2) \left(\dfrac{3}{\lambda_2 + \lambda_\rho}\right)^3 & \text{otherwise}, \end{cases}$ (4)
where:
$\lambda_\rho = \begin{cases} \lambda_3 & \text{if } \lambda_3 > \tau \max_X \lambda_3(X, s), \\ \tau \max_X \lambda_3(X, s) & \text{if } 0 < \lambda_3 \le \tau \max_X \lambda_3(X, s), \\ 0 & \text{otherwise}. \end{cases}$ (5)
The Gaussian kernel size and the parameter value $\tau$ are crucial factors when using a handcrafted method such as the Jerman filter, as they can produce responses ranging from a simple vessel output to complex and noise-sensitive textures; this is also the practical motivation for deep-learning-based segmentation. Depending on variations in the light, texture, size, and shape of vein images collected with different devices and setups, arbitrarily chosen parameters may become inconsistent and lack versatility. The Jerman filtering method is used to extract the vascular-shaped palm vessels from the ROI, as shown in Figure 4b. We experimented with different Gaussian kernels and selected a kernel size of three as the optimal value, since smaller kernels make the output more responsive to image noise and larger kernels create unnecessary and complicated textures in the image. It is crucial to obtain near-accurate ground-truth labels for palm vein segmentation. A threshold is then applied to the extracted images to remove weak and unclear responses, and the pixel values are set to 0 for the palm vessel and 255 for the background to produce proper labels.
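To make Equations (3)-(5) concrete, the following NumPy/SciPy sketch computes a single-scale 2-D Jerman response; in two dimensions the paper's $\lambda_3$ corresponds to the larger-magnitude Hessian eigenvalue $\lambda_2$, and the scale s = 3 and cutoff tau = 0.5 are illustrative assumptions rather than the authors' exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def jerman_vesselness(image, s=3.0, tau=0.5):
    # Scale-normalized Hessian via Gaussian-derivative convolutions (Eq. 3).
    hxx = s**2 * gaussian_filter(image, s, order=(0, 2))
    hyy = s**2 * gaussian_filter(image, s, order=(2, 0))
    hxy = s**2 * gaussian_filter(image, s, order=(1, 1))
    # Eigenvalues of the 2x2 symmetric Hessian; keep the larger-magnitude one.
    disc = np.sqrt((hxx - hyy) ** 2 + 4 * hxy**2)
    mu1, mu2 = (hxx + hyy + disc) / 2, (hxx + hyy - disc) / 2
    lam2 = np.where(np.abs(mu1) >= np.abs(mu2), mu1, mu2)  # dark veins give lam2 > 0
    # Regularized eigenvalue lambda_rho (Eq. 5).
    cutoff = tau * lam2.max()
    lam_rho = np.where(lam2 > cutoff, lam2, np.where(lam2 > 0, cutoff, 0.0))
    # Vesselness response (Eq. 4).
    v = lam2**2 * (lam_rho - lam2) * (3.0 / (lam2 + lam_rho + 1e-12)) ** 3
    v = np.where((lam2 <= 0) | (lam_rho <= 0), 0.0, v)
    v = np.where((lam2 >= lam_rho / 2) & (lam_rho > 0), 1.0, v)
    return v
```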

3.2.2. Attention Mechanism

We incorporated the attention gate mechanism, as referenced in [44], into our segmentation model. This mechanism is applied after a series of convolutions, serving as an enhancement module that diminishes irrelevant regions and amplifies crucial features. The combination of the attention mechanism and the U-Net model in the skip connections achieves better results while maintaining minimal computation. The detailed architecture of the attention mechanism is shown in Figure 5. $g_i$ is the gating signal that provides contextual information used to define the focus areas for a given feature map $x_i$. The attention coefficient $\alpha$ detects the salient regions while refining the feature responses to retain only the activations relevant for segmentation. The final feature map is obtained by multiplying $x_i$ and $\alpha$ element-wise, which is defined as Equation (6):
$x_{\text{out}} = x_i \odot \alpha$ (6)
To compute the attention coefficient $\alpha$, additive attention is utilized to achieve accurate segmentation results. The additive attention is defined as Equation (7):
$\alpha = \sigma_2\left(\psi^T \sigma_1\left(W_x^T x_i + W_g^T g_i + b_g\right) + b_\psi\right)$ (7)
where $\sigma_1$ and $\sigma_2$ are the ReLU and sigmoid functions, respectively; $W_x$, $W_g$, and $\psi$ are linear transformations; and $b_g$ and $b_\psi$ are biases.
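A compact PyTorch sketch of Equations (6)-(7) follows, realizing the linear transformations as 1 x 1 convolutions as in Attention U-Net [44]; it assumes $x_i$ and the gating signal $g_i$ have already been resampled to the same spatial size, and the channel arguments are illustrative.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate of Eqs. (6)-(7)."""
    def __init__(self, x_ch, g_ch, inter_ch):
        super().__init__()
        self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1, bias=False)  # W_x^T x_i
        self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1, bias=True)   # W_g^T g_i + b_g
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1, bias=True)      # psi^T(.) + b_psi
        self.relu, self.sigmoid = nn.ReLU(inplace=True), nn.Sigmoid()    # sigma_1, sigma_2

    def forward(self, x, g):
        # Assumes x and g were already resampled to the same spatial size.
        alpha = self.sigmoid(self.psi(self.relu(self.w_x(x) + self.w_g(g))))  # Eq. (7)
        return x * alpha                                                      # Eq. (6)
```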

3.2.3. Residual Units

In many multi-layer neural network models, the number of deep layers is increased to improve model performance. However, this impedes training and may cause a degradation problem [45]. Several existing studies utilize residual neural networks to solve this problem, which eases training and alleviates the degradation issue. A residual neural network is composed of layered residual units. Adding residual units to the network allows more efficient training of the model, and the skip connections within each residual unit between the low and high levels facilitate back-propagation without degradation. Moreover, the model can be designed with fewer parameters while maintaining comparable performance on the specific task. Each residual unit is described by Equations (8) and (9):
$y_l = h(x_l) + F(x_l, w_l)$ (8)
$x_{l+1} = \text{activation}(y_l)$ (9)
where $x_l$ and $x_{l+1}$ are the input and output features of the $l$-th residual block, $F$ represents the residual function, and $h(x_l)$ is the identity mapping function, set as $h(x_l) = x_l$.

3.2.4. Proposed Attention-Gated Residual U-Net

U-Net [46] has demonstrated remarkable success in medical image analysis. The architecture comprises a down-sampling path and an up-sampling path connected by skip connections. The network architecture of the proposed attention-aware residual U-Net model is illustrated in Figure 6. In the U-Net model, using a pooling layer during down-sampling can result in the loss of certain image features. Additionally, the U-Net model employs skip connections to concatenate low-level features with high-level features, which can cause spatial details to be lost, as the low-level features often lack spatial information.
To address these challenges, we incorporated the attention mechanism into the U-Net model. This addition helps suppress irrelevant feature responses and provides more precise segmentation results. Moreover, deep neural networks often encounter the vanishing gradient problem, where the gradient shrinks infinitely due to repeated multiplication during back-propagation. To overcome this issue, we replaced the convolution process at each level of the encoder and decoder networks with a residual block. The combination of residual blocks and the attention mechanism in the skip connections enhances the segmentation results of our proposed method, surpassing the baseline methods' performance.
The contracting path of the proposed model is composed of several residual units. The residual unit includes two 3 × 3 convolution blocks and an identity mapping, where each convolution block contains a batch normalization layer, a ReLU activation layer, and a convolutional layer. The identity mapping connects the input and output of the residual unit. In the up-sampling part, the features obtained from the previous residual unit are tuned by computing the attention response using the lower-level features as the attention gate. The tuned output is then concatenated with up-sampled feature maps. This process is continued for each up-sampling stage, and the segmented output image is reconstructed in the final layer. With the benefits offered by the attention gate, the model can correctly predict the vascular shape of vein patterns.
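The residual unit just described can be sketched in PyTorch as follows; the pre-activation ordering (batch normalization, ReLU, then convolution) follows the description above, while the 1 x 1 shortcut convolution for mismatched channel counts is an implementation assumption.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Pre-activation residual unit implementing Eqs. (8)-(9)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def conv_block(cin, cout):
            # Each convolution block: batch norm -> ReLU -> 3x3 convolution.
            return nn.Sequential(nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
                                 nn.Conv2d(cin, cout, kernel_size=3, padding=1))
        self.body = nn.Sequential(conv_block(in_ch, out_ch), conv_block(out_ch, out_ch))
        # Identity mapping h(x); a 1x1 convolution matches channel counts when they differ.
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, kernel_size=1))

    def forward(self, x):
        return self.shortcut(x) + self.body(x)  # y_l = h(x_l) + F(x_l, w_l)
```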

3.2.5. Loss Function

The primary function of segmentation is to classify each pixel in terms of the specified output. Therefore, cross-entropy loss, a popular loss function for classification problems, is often used to classify pixels. However, unlike character/text recognition, in which the text usually occupies a relatively large portion of the image, vein pixels occupy a much smaller portion of the image than the background, which leads to a class imbalance problem. This imbalance can cause a traditional loss function such as cross-entropy to be biased towards the majority class (the background), affecting the model's performance. Hence, dice loss, which is more suitable for handling such imbalances and is not affected by the ratio of foreground to background pixels, is utilized in this research. The dice coefficient resolves the imbalance between foreground and background, although it ignores another imbalance, namely that between easy and difficult instances. Dice loss (DL) is formalized as Equation (10):
$DL(p, g) = 1 - \dfrac{2 \sum p\,g + 1}{\sum p + \sum g + 1}$ (10)
where $p$ and $g$ represent the corresponding pixel values of the prediction and ground truth, respectively, and the sums run over all pixels.
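As a minimal PyTorch sketch of Equation (10), assuming `pred` holds per-pixel foreground probabilities and `target` the binary ground-truth mask:

```python
import torch

def dice_loss(pred, target, smooth=1.0):
    """Dice loss of Eq. (10); `smooth` is the +1 term in numerator and denominator."""
    p, g = pred.flatten(1), target.flatten(1)          # per-sample pixel vectors
    inter = (p * g).sum(dim=1)
    dice = (2 * inter + smooth) / (p.sum(dim=1) + g.sum(dim=1) + smooth)
    return (1 - dice).mean()
```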

3.3. Palm Vein Recognition

Palm vein recognition is the task of identifying and verifying identities using palm vein templates. This section describes the architecture of the proposed ECA-ResNet-50 in detail.

3.3.1. Efficient Channel Attention

Inspired by the SENet architecture [47], the ECA module [48] was proposed to enhance the performance of CNN models by highlighting valuable information in the feature maps and suppressing irrelevant features. Figure 7 illustrates the ECA module. Each feature map $F \in \mathbb{R}^{W \times H \times C}$ is reduced to the descriptor $F_{\text{avg}} \in \mathbb{R}^{1 \times 1 \times C}$ using global average pooling (GAP), which is defined as Equation (11):
$GAP(F) = \dfrac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} F_{i,j}$ (11)
ECA introduces local cross-channel interaction (local CCI) to solve the computational-overhead problem of the full cross-channel interaction in SENet. Local CCI achieves cross-channel interaction at a considerably lower cost by allowing each channel to interact only with the channels in a small local group. First, the global parametric space of size $C \times C$ is decomposed into a smaller localized space of size $k \times C$, where $k$ is the pre-defined size of the local region such that $k < C$. Attention based on local CCI can thus be represented as Equation (12):
$w_i = \sigma\left(\sum_{j=1}^{k} w_i^j y_i^j\right), \quad y_i^j \in \Omega_i^k$ (12)
where $\Omega_i^k$ denotes the set of $k$ channels adjacent to channel $y_i$, and $\sigma$ represents the sigmoid activation function. The parameter overhead of this attention mechanism, $k \times C$, can be further reduced by having all channels share the same learning weights, as shown in Equation (13), which decreases the parameter count from $k \times C$ to $k$, a relatively small number.
$w_i = \sigma\left(\sum_{j=1}^{k} w^j y_i^j\right), \quad y_i^j \in \Omega_i^k$ (13)
Moreover, the shared local cross-channel interaction described above is implemented using a 1-D convolution with kernel size $k$. ECA can thus be expressed as Equation (14):
$w = \sigma\left(\text{Conv1D}_k(y)\right)$ (14)
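Equations (11)-(14) reduce to a few lines in PyTorch; the sketch below applies a shared 1-D convolution over the channel descriptor, with kernel size k = 3 as an illustrative choice:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention module of Eqs. (11)-(14)."""
    def __init__(self, k=3):
        super().__init__()
        # One shared 1-D kernel of size k realizes the weight sharing of Eq. (13)/(14).
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                       # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                  # GAP of Eq. (11) -> (B, C)
        w = self.sigmoid(self.conv(y.unsqueeze(1))).squeeze(1)  # local CCI over channels
        return x * w.view(x.size(0), -1, 1, 1)  # channel-wise reweighting of the features
```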

3.3.2. Proposed ECA-ResNet

Feature extraction is an essential process in biometric recognition, aiming to extract key features from the input image. At this stage, to improve prediction accuracy and generalization ability, a modified CNN based on ECA and ResNet-50 is proposed as the feature extractor to produce a 512-dimensional feature embedding for classification and metric learning. The ECA module is introduced into ResNet-50 to enable the model to learn the channel-wise information in the feature maps in an adaptive and efficient manner without computational overhead. The modified ResNet-50 model alleviates the vanishing gradient and degradation problems, in which repeated multiplication during back-propagation may cause the gradient to become infinitely small. After the residual blocks, an average pooling operation is applied, and the 512-D feature embedding is fed separately into the binary head and the margin head for multi-task training.
Figure 8 shows the overall proposed network architecture. The modified ResNet-50 architecture, which consists of convolutional layers, pooling layers, and efficient channel attention modules, is used as the backbone to extract palm vein features. The proposed network takes the segmented palm vein images from the preceding U-Net model as input, with a size of 1 × 112 × 112. The backbone comprises four stages of convolutional blocks, consisting of 3, 4, 6, and 3 residual blocks, respectively, with feature map sizes of 64 × 56 × 56, 128 × 28 × 28, 256 × 14 × 14, and 512 × 7 × 7. The ECA module is applied after two convolutions in each residual block to enhance the critical features while suppressing irrelevant ones. The ECA module contains a global average pooling operation, a 1 × 1 convolution, and a sigmoid activation function, which together compute the attention weights and refine the feature map to form $F_o$. This operation is repeated until the last convolutional block, followed by an average pooling operation and batch normalization to form the 512-dimensional feature embedding.

3.3.3. Loss Function

As the proposed recognition model is trained with combined loss functions, the layers following the feature embedding output are set up in a multi-task trainable manner for each loss function. First, the feature embedding is used to calculate the triplet loss [49] by computing the Euclidean distances between pairs of genuine and imposter palm samples. To push the triplet loss to learn better generalized features for hard negative samples, a hard-mining strategy is applied before computing the triplet loss. After that, the embedding output is fed into the binary head and the margin head, which are simultaneously trained with focal loss and ArcFace loss, respectively. In the margin head, an angular margin $m$ is added to the targets, and the result is multiplied by the feature scale $s$ to compute the ArcFace loss. By combining triplet loss, ArcFace loss, and focal loss, the network learns to separate hard samples with highly discriminative features for identification and verification. The triplet loss in our experiment is defined as Equation (15):
$L_{\text{Triplet}} = \max\left(0, \; M + \left\| f(x_a) - f(x_p) \right\|_2^2 - \left\| f(x_a) - f(x_{hn}) \right\|_2^2 \right)$ (15)
where $x_a$ denotes the anchor sample, $x_p$ the positive sample, $x_{hn}$ the hard negative sample, and $M$ the margin.
The ArcFace loss [8], which modifies the SoftMax loss function, is used to obtain discriminative embedding features for the palm vein samples. An additive angular margin penalty is applied to tighten the distances between intra-class samples and boost inter-class diversity. The margin penalty also corresponds precisely to the geodesic distance. The ArcFace loss is formulated as Equation (16):
$L_{\text{ArcFace}} = -\dfrac{1}{N} \sum_{i=1}^{N} \log \dfrac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j=1, j \neq y_i}^{n} e^{s \cos \theta_j}}$ (16)
where $\theta_j$ is the angle between the feature vector and the $j$-th weight vector, $s$ is the scaling factor for the feature vectors, $m$ represents the penalty imposed on the angular margin, and $N$ and $n$ stand for the batch size and the total number of classes, respectively.
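A sketch of a margin head producing the logits inside Equation (16) is given below; the scale s = 64 and margin m = 0.5 follow common ArcFace defaults and are assumptions here, as are the class and embedding dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Margin head computing the s*cos(theta + m) logits of Eq. (16)."""
    def __init__(self, emb_dim=512, n_classes=200, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine of the angle between normalized embeddings and class weight vectors.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        # Add the angular margin m only to the target class, then scale by s.
        logits = self.s * torch.where(target, torch.cos(theta + self.m), cos)
        return logits  # feed into a SoftMax-based loss such as focal loss
```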
Focal loss [50], a cross-entropy loss that is dynamically scaled by a modulating factor, was proposed to solve the class imbalance problem in binary classification. To penalize hard-to-classify classes more severely during training, the modulating factor down-weights the easy-to-classify samples. In this study, the focal loss is adapted to multi-class classification, as shown in Equation (17):
$L_{\text{Focal}} = -\sum_{i=1}^{C} (1 - y_i)^{\gamma} \log y_i$ (17)
where $C$ denotes the number of categories, $y_i$ denotes the predicted probability distribution, and $\gamma$ is the focusing parameter that controls the degree of down-weighting. When $\gamma > 0$, the focal loss assigns more weight to hard-to-classify classes and less weight to easy-to-classify samples.
Thus, ArcFace loss broadens the inter-class margins with the margin penalty, while hard triplet loss enhances the intra-class compactness. Meanwhile, focal loss is incorporated with triplet loss and ArcFace loss to focus more on hard-to-classify palm vein samples. The combination of these losses allows our deep recognition model to learn more specific discriminative features for palm vein recognition.
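One possible reading of this combination is sketched below: focal loss applied to the margin-head ArcFace logits plus a batch-hard triplet term on the L2-normalized embeddings. The value γ = 2, the triplet margin of 0.3, and the unit loss weights are illustrative assumptions, and the binary-head term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def combined_loss(embeddings, margin_logits, labels, gamma=2.0, margin=0.3):
    # Focal loss (Eq. 17) over the ArcFace margin-head logits (Eq. 16).
    log_p = F.log_softmax(margin_logits, dim=1)
    log_pt = log_p.gather(1, labels.unsqueeze(1)).squeeze(1)
    focal = (-(1 - log_pt.exp()) ** gamma * log_pt).mean()
    # Batch-hard triplet loss (Eq. 15) on L2-normalized embeddings.
    emb = F.normalize(embeddings, dim=1)
    dist = torch.cdist(emb, emb)                          # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same).max(dim=1).values         # farthest genuine sample
    hardest_neg = (dist + same * 1e6).min(dim=1).values   # closest imposter sample
    triplet = F.relu(margin + hardest_pos - hardest_neg).mean()
    return focal + triplet
```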

4. Experiments

4.1. Dataset

The CASIA Multi-Spectral Palmprint Image Database [51] is a public palm print and palm vein dataset that is available for research. It consists of 7200 samples collected from 100 different people. Two sessions, separated by more than a month, were used to capture palm images of each hand. Each session contained three samples, and each sample consisted of six palm photos captured simultaneously at six distinct electromagnetic spectra (460 nm, 630 nm, 700 nm, 850 nm, 940 nm, and white light). Variations in hand posture between samples were also introduced to increase the diversity of intra-class samples and to simulate real-world applications.
Since each person provides palm samples for the left and right hands separately, and the palm patterns of the two hands differ from each other, each hand is considered one identity; thus, the two hands of 100 different persons yield 200 identities in this study. Moreover, because the palm veins appear vividly under NIR illumination, only the samples captured at 850 nm and 940 nm are selected, as there is no vein information in the images captured in white light, and spectra below 850 nm produce unclear vein images. Thus, from the total of 7200 images in the database, 2400 trainable images are obtained.

4.2. Experimental Setup

Initially, the palm images are normalized and aligned to a vertical orientation, which is required for calculating the RDF, and resized to 112 × 112 in the ROI extraction step. This orientation requirement has only a minor impact: when presented with horizontally oriented palm images, a few adaptations may be necessary, such as adjusting the ROI extraction to account for changes along the x-axis. However, it is worth noting that once the ROI images are identified and extracted, orientation changes do not significantly impact the performance of the U-Net segmentation model, as the model can overcome rotation and scale transformations. When training ECA-ResNet-50, data augmentation methods such as random cropping and random rotation (0.6) are performed. To optimize the network, the Adam optimizer with a weight decay of 2 × 10⁻⁴ is used. Regarding the learning rate, an adaptive learning rate strategy is used to decay the learning rate from 1 × 10⁻⁴ at the initial epoch to 1 × 10⁻⁵ at the final epoch. The network is trained for a total of 60 epochs on the CASIA dataset. Training the segmentation model took approximately 3 h, and training the recognition model required only approximately 2 h. In terms of model complexity and size, the attention-gated residual U-Net segmentation model comprises 2.4 M parameters with a model size of 9.2 MB, while the ECA-ResNet recognition model contains 26.9 M parameters with a model size of 98 MB. In our experiments, the average computational time per single palm image through the entire pipeline was approximately 40 milliseconds. This rapid inference ensures a better user experience, especially in applications where instant identification is required. All experiments were run in parallel on four NVIDIA Titan X GPUs (12 GB of memory each).
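The stated optimizer settings might be reproduced roughly as follows; since the exact adaptive schedule is unspecified, cosine annealing from 1e-4 to 1e-5 over the 60 epochs is an assumption, and `model`, `loader`, and `train_one_epoch` are hypothetical placeholders.

```python
import torch

# Adam with the stated weight decay of 2e-4 and initial learning rate of 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=2e-4)
# Assumed schedule: anneal the learning rate down to 1e-5 by the final epoch.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60, eta_min=1e-5)

for epoch in range(60):
    train_one_epoch(model, loader, optimizer)  # assumed training-loop helper
    scheduler.step()
```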

4.3. Evaluation Metrics

This section describes the primary evaluation metrics used for both the segmentation and authentication tasks. First, to assess palm vein segmentation with the attention-gated residual U-Net model, both the intersection over union (IoU) and the dice coefficient are used. For biometric authentication, model accuracy alone is not sufficient as an evaluation method; thus, to evaluate the proposed ECA-ResNet-50, identification accuracy, precision, recall, F1 score, and equal error rate (EER) are used.

4.3.1. IoU Coefficient

The IoU coefficient, which is commonly known as the Jaccard index, is a popular method for evaluating our segmentation model performance. The IoU coefficient, which is used to calculate the percentage overlap between the ground-truth mask pixels and the prediction output pixels, can be defined as Equation (18):
$\text{IoU}(p, g) = \dfrac{|p \cap g|}{|p \cup g|}$ (18)
where p and g denote the pixel values of the prediction result and label, respectively.

4.3.2. Dice Similarity Coefficient

In computer vision tasks, the dice similarity coefficient (DSC) is widely used to measure the distance between the output of the segmentation and the respective label. DSC can be defined as Equation (19):
$\text{DSC}(p, g) = \dfrac{2\,|p \cap g|}{|p| + |g|}$ (19)
where p and g denote the pixel values of the prediction result and label, respectively.

4.3.3. Identification Accuracy

In the process of biometrics identification, which seeks to identify to whom the palm template belongs, the percentage of correctly categorized samples is computed to obtain identification accuracy, as shown in Equation (20):
$\text{Accuracy} = 1 - \dfrac{\text{Number of misclassified samples}}{\text{Total number of samples}}$ (20)

4.3.4. Precision, Recall, and F1 Score

Aside from authentication accuracy, precision, recall, and F1 score are used to evaluate the performance of ECA-ResNet-50, as shown in Equations (21), (22), and (23), respectively.
Precision = TP / ( TP + FP )
Recall = TP / ( TP + FN )
F 1 = 2 × ( Precision × Recall ) / ( Precision + Recall )

4.3.5. Equal Error Rate (EER)

The EER is defined as the error rate at which the false acceptance rate (FAR) and the false rejection rate (FRR) are equal. In general, the smaller the EER, the better the biometric system's accuracy and verification performance.
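A threshold-sweep approximation of the EER from arrays of genuine and imposter cosine-distance scores might look as follows; this is a generic sketch, not the authors' exact evaluation code:

```python
import numpy as np

def equal_error_rate(genuine, imposter):
    """genuine/imposter: arrays of cosine-distance scores (small = same identity)."""
    thresholds = np.sort(np.concatenate([genuine, imposter]))
    frr = np.array([(genuine >= t).mean() for t in thresholds])  # false rejections
    far = np.array([(imposter < t).mean() for t in thresholds])  # false acceptances
    i = np.argmin(np.abs(far - frr))                             # point where FAR ~ FRR
    return (far[i] + frr[i]) / 2, thresholds[i]
```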

5. Results

Experiments were carried out by splitting the dataset into 1920 samples for use as a training set and 480 samples for use as a test set. Initially, the attention-gated residual U-Net is trained until the network becomes stable, after which the weights are frozen to connect with the ECA-ResNet-50 feature extractor to train discriminative features for recognition and verification. The segmentation and authentication results are reported separately.

5.1. Palm Vein Segmentation

To analyze and evaluate the proposed segmentation network’s performance, the IoU coefficient and the dice similarity coefficient are calculated. Aside from the baseline U-Net model, residual blocks and an attention gate module are also separately used as different training configurations of the U-Net architecture to be evaluated. As shown in Figure 9, the segmentation result of the U-Net model (c) contains several incorrect predictions. In contrast, our proposed model (d) results in better segmentation performance while suppressing irrelevant areas. Table 1 displays the results of the comparison between the state-of-the-art methods and our proposed method for palm vein segmentation.
To demonstrate the significance of our segmentation model for palm vein recognition and verification, the ECA-ResNet model was additionally trained on the original grayscale ROI images without the segmentation model. As can be seen in Table 2, both the recognition and verification results were lower than those of the proposed approach, as the network struggled to learn sufficiently discriminative features from the low-resolution and unclear palm vein ROI images.

5.2. Palm Vein Authentication

This section includes two experiments for palm vein recognition and verification. First, the palm vein recognition involves identifying a palm vein image using a SoftMax probability output within a fixed number of classes. Second, palm vein verification involves a one-to-one comparison between the query palm image and the existing template to verify whether they have the same identity.
For recognition, the binary head of the proposed ECA-ResNet model is used, and prediction probabilities are obtained by applying the SoftMax activation to the logits for 1-to-N classification over the 200 classes (identities). Each class represents one identity (one hand of one person). The input is a single palm ROI image produced by the segmentation model, and the output is a fixed-length vector of size 200 containing the probability of each identity. As described in Table 3, our proposed network correctly predicts all 480 test samples, thereby achieving high accuracy compared to existing methods.
For verification, an experiment using a 1-to-1 matching technique is performed. First, genuine and imposter pairs are created from the 480 test samples covering 200 classes. In this experiment, the feature embedding layer just before the binary head and margin head of our proposed model (Figure 8) is used for comparison. The feature embeddings are normalized using the L2 norm before computing the distance between two vectors to decide whether the query sample belongs to the template. Since the test set includes at least two sample images (three for some classes) for each individual, the number of genuine matching scores is 360 and the number of imposter matching scores is 114,600. For both genuine and imposter pairs, each image is matched with another sample at least once but not more than once; this ensures that no duplicate samples are paired repeatedly, which could cause an unbalanced and biased verification score. To verify the samples, the matching scores are computed using the cosine distance, where scores are closer to 0 for intra-class samples and closer to 1 for inter-class samples. The cosine similarity between two samples with an angle θ is defined as Equation (24). As presented in Table 4, an equal error rate of 0.018 is obtained, which is a relatively low error rate for palm vein verification.
$\cos(\theta) = \dfrac{A \cdot B}{\|A\| \, \|B\|} = \dfrac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$ (24)
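For illustration, the matching score of Equation (24), converted to a cosine distance on L2-normalized embeddings, can be computed as:

```python
import numpy as np

def cosine_distance(a, b):
    """Matching score from Eq. (24); ~0 for intra-class, ~1 for inter-class pairs."""
    a = a / np.linalg.norm(a)   # L2-normalize the query embedding
    b = b / np.linalg.norm(b)   # L2-normalize the template embedding
    return 1.0 - float(a @ b)
```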

Ablation Study

To study the importance of each loss function, an ablation study examining the different loss functions was also conducted. As presented in Figure 10, the margin-based ArcFace loss provides a significantly larger improvement than triplet loss. We can see that the imposter distribution resulting from triplet loss is wider, and the overlap between the imposter and genuine scores is also larger. Moreover, the proposed combined loss performs better, with the benefits of a higher recall rate and F1 score and better separation of the matching score distances between genuine and imposter samples, as shown in Figure 10f. The verification results obtained with each loss function are described in Table 5.

6. Conclusions

In this study, we proposed two consecutive methods for palm vein authentication. First, we proposed an attention-aware segmentation model and explained the vein labeling approach using the Hessian-based blood vessel filtering method. With correct labeling of the palm vein patterns, this research examined the effectiveness of integrating the attention mechanism and residual blocks within a single U-Net architecture, aiming to distinguish salient from noisy features while enhancing critical ones. Our proposed segmentation method demonstrated superior performance compared to other baseline models. Second, we proposed a palm vein authentication model that addresses the problem of intra-class and inter-class feature embedding for palm vein verification. In particular, efficient channel attention is integrated into the ResNet-50 architecture at no additional computational cost, followed by a multi-task learning approach with a binary head and a margin head, which further strengthens the model's authentication capability. As a third significant contribution, we designed an optimized loss function to effectively learn discriminative features, thereby enhancing the overall authentication process. The proposed methodology achieves 100% accuracy for palm vein recognition and an equal error rate of 0.018 for palm vein verification. Moreover, the ablation study demonstrates that our proposed combined loss function enhances inter-class diversity and intra-class compactness while focusing on hard-to-classify palm vein samples, ultimately yielding more discriminative features for improved palm vein recognition.

Author Contributions

A.S.M.H. designed and developed the proposed method, conducted the experiments, and wrote the manuscript. H.J.L. designed the new concept, provided the conceptual idea and insightful suggestions to refine it further, and reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project for Joint Demand Technology R&D of Regional SMEs funded by the Korean Ministry of SMEs and Startups in 2023 (Project No. RS-2023-00207672).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are unavailable due to ongoing further research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kumar, A.; Zhou, Y. Human identification using finger images. IEEE Trans. Image Process. 2011, 21, 2228–2244. [Google Scholar] [CrossRef] [PubMed]
  2. Wildes, R.P. Iris recognition: An emerging biometric technology. Proc. IEEE 1997, 85, 1348–1363. [Google Scholar] [CrossRef]
  3. Florian, S.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  4. Lu, G.; Zhang, D.; Wang, K. Palmprint recognition using eigenpalms features. Pattern Recognit. Lett. 2003, 24, 1463–1467. [Google Scholar] [CrossRef]
  5. Wu, X.; Zhang, D.; Wang, K. Fisherpalms based palmprint recognition. Pattern Recognit. Lett. 2003, 24, 2829–2838. [Google Scholar]
  6. Han, A.W.-Y.; Lee, J.-C. Palm vein recognition using adaptive Gabor filter. Expert Syst. Appl. 2012, 39, 13225–13234. [Google Scholar] [CrossRef]
  7. Van, H.T.; Duong, C.M.; Van Vu, G.; Le, T.H. Palm vein recognition using enhanced symmetry local binary pattern and sift features. In Proceedings of the 2019 19th International Symposium on Communications and Information Technologies (ISCIT), Ho Chi Minh City, Vietnam, 25–27 September 2019; pp. 311–316. [Google Scholar]
  8. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  9. Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5265–5274. [Google Scholar]
  10. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
  11. Wang, R.; Wang, G.; Chen, Z.; Zeng, Z.; Wang, Y. A palm vein identification system based on Gabor wavelet features. Neural Comput. Appl. 2014, 24, 161–168. [Google Scholar] [CrossRef]
  12. Wu, K.-S.; Lee, J.-C.; Lo, T.-M.; Chang, K.-C.; Chang, C.-P. A secure palm vein recognition system. J. Syst. Softw. 2013, 86, 2870–2876. [Google Scholar] [CrossRef]
  13. Lu, W.; Li, M.; Zhang, L. Palm vein recognition using directional features derived from local binary patterns. Int. J. Signal Process. Image Process. Pattern Recognit. 2013, 9, 87–98. [Google Scholar] [CrossRef]
  14. Mirmohamadsadeghi, L.; Drygajlo, A. Palm vein recognition with local binary patterns and local derivative patterns. In Proceedings of the 2011 International Joint Conference on Biometrics (IJCB), Washington, DC, USA, 11–13 October 2011; pp. 1–6. [Google Scholar] [CrossRef]
  15. Aglio-Caballero, A.; Rios-Sanchez, B.; Sanchez-Avila, C.; Giles, M.J.M.D. Analysis of local binary patterns and uniform local binary patterns for palm vein biometric recognition. In Proceedings of the 2017 International Carnahan Conference on Security Technology (ICCST), Madrid, Spain, 23–26 October 2017; pp. 1–6. [Google Scholar] [CrossRef]
  16. Kang, W.; Wu, Q. Contactless palm vein recognition using a mutual foreground-based local binary pattern. IEEE Trans. Inf. Forensics Secur. 2014, 9, 1974–1985. [Google Scholar] [CrossRef]
  17. Aberni, Y.; Boubchir, L.; Daachi, B. Palm vein recognition based on competitive coding scheme using multi-scale local binary pattern with ant colony optimization. Pattern Recognit. Lett. 2020, 136, 101–110. [Google Scholar] [CrossRef]
  18. Fronitasari, D.; Gunawan, D. Palm vein recognition by using modified of local binary pattern (LBP) for extraction feature. In Proceedings of the 2017 15th International Conference on Quality in Research (QiR): International Symposium on Electrical and Computer Engineering, Nusa Dua, Bali, Indonesia, 24–27 July 2017; pp. 18–22. [Google Scholar] [CrossRef]
  19. Mirmohamadsadeghi, L.; Drygajlo, A. Palm vein recognition with local texture patterns. IET Biom. 2014, 3, 198–206. [Google Scholar] [CrossRef]
  20. Saxena, J.; Teckchandani, K.; Pandey, P.; Dutta, M.K.; Travieso, C.M.; Alonso-Hernández, J.B. Palm vein recognition using local tetra patterns. In Proceedings of the 2015 4th International Work Conference on Bioinspired Intelligence (IWOBI), San Sebastian, Spain, 10–12 June 2015; pp. 151–156. [Google Scholar]
  21. Rahul, R.C.; Cherian, M.; Mohan, M.C.M. A novel MF-LDTP approach for contactless palm vein recognition. In Proceedings of the 2015 International Conference on Computing and Network Communications (CoCoNet), Trivandrum, India, 16–19 December 2015; pp. 793–798. [Google Scholar] [CrossRef]
  22. Akbar, A.F.; Wirayudha, T.A.B.; Sulistiyo, M.D. Palm vein biometric identification system using local derivative pattern. In Proceedings of the 2016 4th International Conference on Information and Communication Technology (ICoICT), Bandung, Indonesia, 25–27 May 2016; pp. 1–6. [Google Scholar] [CrossRef]
  23. Kasiselvanathan, M.; Sangeetha, V.; Kalaiselvi, A. Palm pattern recognition using scale invariant feature transform. Int. J. Intell. Sustain. Comput. 2020, 1, 44–52. [Google Scholar] [CrossRef]
  24. Gurunathan, V.; Sathiyapriya, T.; Sudhakar, R. Multimodal biometric recognition system using SURF algorithm. In Proceedings of the 2016 10th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, India, 7–8 January 2016; pp. 1–5. [Google Scholar] [CrossRef]
  25. Kang, W.; Liu, Y.; Wu, Q.; Yue, X. Contact-free palm-vein recognition based on local invariant features. PLoS ONE 2014, 9, e97548. [Google Scholar] [CrossRef]
  26. Kim, H.G.; Lee, E.J.; Yoon, G.J.; Yang, S.D.; Lee, E.C.; Sang, M.Y. Illumination normalization for sift based finger vein authentication. In Proceedings of the International Symposium on Visual Computing, Rethymnon, Greece, 16–18 July 2012; Springer: Berlin, Germany, 2012; pp. 21–30. [Google Scholar]
  27. Xiuyuan, L.; Tiegen, L.; Shichao, D.; Jin, H.; Yun, W. Fast recognition of hand vein with surf descriptors. Chin. J. Sci. Instrum. 2011, 32, 831–836. [Google Scholar]
  28. Perwira, D.Y.; Agung, B.W.T.; Sulistiyo, M.D. Personal palm vein identification using principal component analysis and probabilistic neural network. In Proceedings of the 2014 International Conference on Information Technology Systems and Innovation (ICITSI), Bandung, Indonesia, 24–27 November 2014; pp. 99–104. [Google Scholar] [CrossRef]
  29. Bayoumi, S.; Al-Zahrani, S.; Sheikh, A.; Al-Sebayel, G.; Al-Magooshi, S.; Al-Sayigh, S. PCA-based palm vein authentication system. In Proceedings of the 2013 International Conference on Information Science and Applications (ICISA), Pattaya, Thailand, 24–26 June 2013; pp. 1–3. [Google Scholar] [CrossRef]
  30. Ahmad, F.; Cheng, L.-M.; Khan, A. Lightweight and privacy- preserving template generation for palm-vein-based human recognition. IEEE Trans. Inf. Forensics Secur. 2020, 15, 184–194. [Google Scholar] [CrossRef]
  31. Micheletto, M.; Orru, G.; Rida, I.; Ghiani, L.; Marcialis, G.L. A multiple classifiers-based approach to palmvein identification. In Proceedings of the 2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA), Xi’an, China, 7–10 November 2018; pp. 1–6. [Google Scholar]
  32. Rizki, F.; Wirayuda, T.A.B.; Ramadhani, K.N. Identity recognition based on palm vein feature using two-dimensional linear discriminant analysis. In Proceedings of the 2016 1st International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 23–24 August 2016; pp. 21–25. [Google Scholar]
  33. Elnasir, S.; Shamsuddin, S.M. Palm vein recognition based on 2D-discrete wavelet transform and linear discrimination analysis. Int. J. Adv. Soft Comput. Appl. 2014, 6, 43–59. [Google Scholar]
  34. Elnasir, S.; Shamsuddin, S.M. Proposed scheme for palm vein recognition based on linear discrimination analysis and nearest neighbour classifier. In Proceedings of the 2014 International Symposium on Biometrics and Security Technologies (ISBAST), Kuala Lumpur, Malaysia, 26–27 August 2014; pp. 67–72. [Google Scholar]
  35. Obayya, M.I.; El-Ghandour, M.; Alrowais, F. Contactless palm vein authentication using deep learning with Bayesian optimization. IEEE Access 2021, 9, 1940–1957. [Google Scholar] [CrossRef]
  36. Pan, Z.; Wang, J.; Wang, G.; Zhu, J. Multi-Scale Deep Representation Aggregation for Vein Recognition. IEEE Trans. Inf. Forensics Secur. 2021, 16, 1–15. [Google Scholar] [CrossRef]
  37. Wei, W.; Wang, Q.; Yu, S.; Luo, Q.; Lin, S.; Han, Z.; Tang, Y. Outside Box and Contactless Palm Vein Recognition Based on a Wavelet Denoising ResNet. IEEE Access 2021, 9, 82471–82484. [Google Scholar]
  38. Pan, Z.; Wang, J.; Shen, Z.; Chen, X.; Li, M. Multi-layer convolutional features concatenation with semantic feature selector for vein recognition. IEEE Access 2019, 7, 90608–90619. [Google Scholar] [CrossRef]
  39. Yao, C.Y.; Jhong, S.Y.; Hsia, C.H.; Hua, K.L. Explainable AI: A Multispectral Palm-Vein Identification System with New Augmentation Features. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 1–21. [Google Scholar] [CrossRef]
  40. Felix, M.; Abdulla, W.H. Segmentation of Palm Vein Images Using U-Net. In Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, 7–10 December 2020; pp. 64–70. [Google Scholar]
  41. Wang, P.; Qin, H. Palm-vein verification based on U-net. IOP Conf. Ser. Mater. Sci. Eng. 2020, 806, 012043. [Google Scholar] [CrossRef]
  42. Thapar, D.; Jaswal, G.; Nigam, A.; Kanhangad, V. PVSNet: Palm Vein Authentication Siamese Network Trained Using Triplet Loss and Adaptive Hard Mining by Learning Enforced Domain Specific Features; Institute of Electrical and Electronics Engineers Inc.: Hyderabad, India, 2019. [Google Scholar] [CrossRef]
  43. Jerman, T.; Pernuš, F.; Likar, B.; Špiclin, Ž. Beyond Frangi: An improved multiscale vesselness filter. In Medical Imaging 2015: Image Processing, Proceedings of the SPIE Medical Imaging, Orlando, FL, USA, 21–26 February 2015; Ourselin, S., Styner, M.A., Eds.; SPIE: Bellingham, WA, USA, 2015; Volume 9413, pp. 623–633. [Google Scholar] [CrossRef]
  44. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  46. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015. [Google Scholar]
  47. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
  48. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  49. Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition; Springer: Cham, Switzerland, 2015. [Google Scholar]
  50. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  51. CASIA-MS-PalmprintV1. Available online: http://biometrics.idealtest.org/ (accessed on 1 May 2023).
  52. Meraoumia, A.; Kadri, F.; Bendjenna, H.; Chitroub, S.; Bouridane, A. Improving biometric identification performance using PCANet deep learning and multispectral palmprint. In Signal Processing for Security Technologies; Springer: Cham, Switzerland, 2017; pp. 51–69. [Google Scholar]
  53. Htet, A.S.M.; Lee, H.J. TripletGAN VeinNet: Palm Vein Recognition Based on Generative Adversarial Network and Triplet Loss. In Proceedings of the 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), Shanghai, China, 27–29 August 2021. [Google Scholar]
  54. Ananthi, G.; Sekar, J.R.; Arivazhagan, S. Ensembling Scale Invariant and Multiresolution Gabor Scores for Palm Vein Identification. Inf. Technol. Control 2022, 51, 704–722. [Google Scholar] [CrossRef]
  55. Chen, Y.-Y.; Hsia, C.; Chen, P. Contactless multispectral palm-vein recognition with lightweight convolutional neural network. IEEE Access 2021, 9, 149796–149806. [Google Scholar] [CrossRef]
Figure 1. Overall proposed method for palm vein authentication.
Figure 2. Radial distance function with local maxima and minima.
Figure 3. Process of palm vein ROI extraction: (a) Image from CASIA dataset; (b) Binarization; (c) Detected hand contours, maxima and minima points, and centroid C; (d) P1 and P2 are selected, and θ is calculated for rotation; (e) The palm vein image is rotated, and additional points P3, P4, E1, and E2 are obtained; and (f) ROI image extracted using E1 and E2.
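For readers who wish to reproduce the pipeline in Figure 3, the following is a minimal sketch of the ROI-extraction steps using OpenCV, assuming an 8-bit near-infrared palm image. The valley-point selection and crop offsets are simplified placeholders, not the paper's exact procedure.

```python
import cv2
import numpy as np

def extract_roi(gray, roi_size=128):
    # (b) Binarize the hand region with an Otsu threshold.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # (c) Take the largest contour as the hand boundary; its centroid is C.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)
    m = cv2.moments(cnt)
    c = np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])
    boundary = cnt.squeeze(1).astype(np.float64)

    # Radial distance function (Figure 2): distance from C to each boundary
    # point; finger valleys appear as local minima of this signal.
    rdf = np.linalg.norm(boundary - c, axis=1)
    # A real implementation detects local minima here; the two closest
    # boundary points serve below only as a crude stand-in for P1 and P2.
    p1, p2 = boundary[np.argsort(rdf)[:2]]

    # (d) Rotate so that the line P1-P2 becomes horizontal.
    theta = np.degrees(np.arctan2(p2[1] - p1[1], p2[0] - p1[0]))
    rot = cv2.getRotationMatrix2D((float(c[0]), float(c[1])), theta, 1.0)
    aligned = cv2.warpAffine(gray, rot, gray.shape[::-1])

    # (e)-(f) Crop a square ROI around the palm center (offset is a guess).
    x, y = int(c[0] - roi_size / 2), int(c[1] - roi_size / 2)
    return aligned[y:y + roi_size, x:x + roi_size]
```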
Figure 4. (a) Original ROI; (b) ROI image filtered by the Jerman method; and (c) final labeled ROI image for segmentation.
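The labeling pipeline in Figure 4 can be approximated as follows. The Jerman filter [43] is not shipped with scikit-image, so the related Hessian-based Frangi vesselness filter is used here as a stand-in, and the Otsu binarization of the response is an assumption.

```python
from skimage import exposure, filters

def label_veins(roi):
    # Contrast normalization before vesselness filtering.
    enhanced = exposure.equalize_hist(roi)
    # Hessian-based vesselness response; veins are dark ridges in NIR images.
    vesselness = filters.frangi(enhanced, black_ridges=True)
    # Threshold the response to produce a binary vein label map.
    return vesselness > filters.threshold_otsu(vesselness)
```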
Figure 5. Schematic of attention gate.
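A minimal PyTorch sketch of the additive attention gate in Figure 5, following the general formulation of Attention U-Net [44]; the channel sizes in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)  # gating signal
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # skip features
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)        # attention map

    def forward(self, g, x):
        # Additive attention: alpha = sigmoid(psi(relu(W_g*g + W_x*x))).
        alpha = torch.sigmoid(self.psi(torch.relu(self.w_g(g) + self.w_x(x))))
        return x * alpha  # suppress irrelevant skip-connection activations

# Usage: gate a 64-channel skip map with a 128-channel decoder signal
# (equal spatial sizes are assumed; a real model may resample first).
ag = AttentionGate(gate_ch=128, skip_ch=64, inter_ch=32)
out = ag(torch.randn(1, 128, 32, 32), torch.randn(1, 64, 32, 32))
```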
Figure 6. Detailed network architecture of proposed attention-gated residual U-Net segmentation model.
Figure 7. Efficient channel attention (ECA) module.
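The ECA module in Figure 7 reduces to a few lines of PyTorch, following ECA-Net [48]: a global average pool followed by a 1D convolution across channels, which avoids the channel dimensionality reduction used in SE blocks [47]. The fixed kernel size below is a simplification; ECA-Net derives it adaptively from the channel count.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))                     # squeeze: (N, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local cross-channel interaction
        return x * torch.sigmoid(y)[..., None, None]  # reweight channels

out = ECA()(torch.randn(2, 64, 16, 16))  # shape preserved: (2, 64, 16, 16)
```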
Figure 8. Detailed network architecture of proposed ECA-ResNet-50 recognition model.
Figure 9. Palm vein segmentation results: (a) Histogram-equalized original palm images; (b) Ground-truth label images; (c) Segmentation results from the original U-Net model; and (d) Segmentation results from the proposed attention-gated residual U-Net model.
Figure 10. Genuine/imposter score plots on palm vein verification obtained using different loss functions: (a) Triplet loss; (b) Focal loss; (c) ArcFace loss; (d) ArcFace + triplet loss; (e) ArcFace + focal loss; and (f) proposed ArcFace + triplet + focal loss.
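The verification EERs reported in Table 4 below are derived from genuine/imposter score distributions such as those in Figure 10. The following is a hedged sketch of the standard EER computation; the score arrays are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve

genuine = np.random.normal(0.8, 0.1, 1000)   # placeholder genuine scores
imposter = np.random.normal(0.3, 0.1, 5000)  # placeholder imposter scores

labels = np.concatenate([np.ones_like(genuine), np.zeros_like(imposter)])
scores = np.concatenate([genuine, imposter])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
# EER is the operating point where false accept and false reject rates meet.
eer = fpr[np.nanargmin(np.abs(fpr - fnr))]
print(f"EER = {eer:.3f}")
```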
Table 1. Comparison with other state-of-the-art models for palm vein segmentation.

Method | IoU Coef. (%) | Dice Coef. (%)
U-Net | 95.59 | 97.75
Attention U-Net | 95.83 | 97.87
Residual U-Net | 96.16 | 98.04
Proposed method | 96.24 | 98.09
Table 2. Results of palm vein recognition with and without the segmentation model.

Method | Accuracy (%) | EER
Without vein segmentation | 98.375 | 0.547
Proposed method | 100 | 0.018
Table 3. Comparison of the proposed method with existing studies in terms of identification accuracy.

Method | Year | Accuracy (%)
PCANet with deep learning [52] | 2017 | 96.50
PVSNet [42] | 2018 | 85.16
Hong et al. [7] | 2019 | 96.33
CNN + Bayesian optimization [35] | 2020 | 99.40
TripletGAN VeinNet [53] | 2021 | 97
Explainable palm vein recognition [39] | 2021 | 100
Ensembling scale invariant and multiresolution Gabor scores [54] | 2022 | 99.73
Proposed method | 2023 | 100
Table 4. EERs of different methods for palm vein verification.

Method | Year | EER
PCANet with deep learning [52] | 2017 | 0.949
PVSNet [42] | 2018 | 3.170
Wang et al. [41] | 2019 | 0.470
CNN + Bayesian optimization [35] | 2020 | 0.068
Chen et al. [55] | 2021 | 0.056
Ensembling scale invariant and multiresolution Gabor scores [54] | 2022 | 0.026
Proposed method | 2023 | 0.018
Table 5. Comparison of results obtained by training the proposed ECA-ResNet-50 with different loss functions.

Loss Function | Train Acc. (%) | Test Acc. (%) | Recall (%) | Precision (%) | F1 (%) | EER
Triplet loss | 100 | 98.95 | 91.66 | 95.37 | 93.48 | 0.096
Focal loss | 98.44 | 96.25 | 83.33 | 92.59 | 87.72 | 0.400
ArcFace loss | 100 | 99.58 | 96.66 | 99.71 | 98.16 | 0.065
ArcFace + triplet loss | 100 | 99.79 | 97.50 | 99.71 | 98.59 | 0.040
ArcFace + focal loss | 100 | 99.58 | 96.94 | 99.71 | 98.31 | 0.052
ArcFace + triplet + focal loss (proposed) | 100 | 100 | 98.61 | 100 | 99.30 | 0.018
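As a rough illustration of the combined objective in Table 5, the sketch below pairs an ArcFace margin head with triplet and focal terms in PyTorch. The scale, margins, and the implied equal weighting of the three terms are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, feat_dim, n_classes, s=30.0, m=0.5):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        # Cosine similarity between L2-normalized features and class weights.
        cos = F.linear(F.normalize(feats), F.normalize(self.w))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m to the target-class angle only.
        target = F.one_hot(labels, self.w.shape[0]).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos) * self.s
        return F.cross_entropy(logits, labels)

def focal_loss(logits, labels, gamma=2.0):
    # Focal loss [50]: down-weight well-classified samples.
    ce = F.cross_entropy(logits, labels, reduction="none")
    return ((1 - torch.exp(-ce)) ** gamma * ce).mean()

arcface = ArcFaceHead(feat_dim=512, n_classes=100)
triplet = nn.TripletMarginLoss(margin=0.3)
# total = arcface(feats, y) + triplet(anchor, pos, neg) + focal(logits, y)
```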