Advances in Deep Learning III

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 October 2023) | Viewed by 24877

Special Issue Editors


Guest Editor
Department of Industrial and Management Engineering, Sungkyul University, Seoul, Korea
Interests: deep learning; unstructured data analysis; human factors; explainable AI

Guest Editor
Department of Industrial and Systems Engineering, Dongguk University, Seoul, Korea
Interests: deep learning; meta learning; health analytics; gerontechnology; convergence system

Guest Editor
Department of Industrial and Systems Engineering, Dongguk University, Seoul, Korea
Interests: production scheduling; vehicle routing; explainable AI; nature-inspired optimization; machine learning applications

Special Issue Information

Dear Colleagues,

Deep learning attracts considerable attention from researchers in both academia and industry. Compared with traditional machine learning methods, deep learning algorithms can train models on large-volume data sets, and they have significantly surpassed traditional methodologies in computer vision, natural language processing, robotics, and other fields. In recent years, theories and algorithms across artificial intelligence have advanced significantly, including neural network structure, optimization, data representation, and deep reinforcement learning. Techniques and models such as attention mechanisms, generative adversarial networks (GANs), and GPT-3 (Generative Pre-trained Transformer 3) have also been developed with remarkable success. This Special Issue covers the latest theoretical advances and practical applications of deep learning. In this regard, models derived from previous studies can be improved through research that applies deep learning techniques to data analysis in different domains of research or industry.

The purpose of this Special Issue is to showcase innovative deep learning algorithms and application areas that solve problems in various research domains. Ultimately, we aim to promote the research and development of deep learning by publishing high-quality original research articles, reviews, theoretical and critical perspectives, and viewpoint articles on the following topics:

  • Deep Learning Algorithms/Architectures/Theory
  • Deep Reinforcement Learning
  • Adversarial Examples
  • Deep Generative Models
  • Multitask, Transfer, and Meta Learning
  • Bayesian Methods
  • Explainable and Interpretable AI
  • Representation Learning
  • Constrained Optimization
  • Computer Vision
  • Natural Language Processing (NLP)
  • Sentiment Analysis

Prof. Dr. Wonjoon Kim
Prof. Dr. Sekyoung Youm
Prof. Dr. Sungbum Jun
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Deep Learning Algorithms/Architectures/Theory
  • Deep Reinforcement Learning
  • Adversarial Examples
  • Deep Generative Models
  • Multitask, Transfer, and Meta Learning
  • Bayesian Methods
  • Explainable and Interpretable AI
  • Representation Learning
  • Constrained Optimization
  • Computer Vision
  • Natural Language Processing (NLP)
  • Sentiment Analysis

Published Papers (13 papers)


Research

19 pages, 1617 KiB  
Article
Amortized Bayesian Meta-Learning with Accelerated Gradient Descent Steps
by Zhewei Zhang, Xuejing Li and Shengjin Wang
Appl. Sci. 2023, 13(15), 8653; https://doi.org/10.3390/app13158653 - 27 Jul 2023
Viewed by 702
Abstract
Recent meta-learning models often learn priors from observed tasks using a network optimized via stochastic gradient descent (SGD), which usually takes more training steps to converge. In this paper, we propose an accelerated Bayesian meta-learning structure with a stochastic inference network (ABML-SIN). The proposed model aims to improve the speed and efficiency of the Bayesian meta-learning training procedure. Current approaches to meta-learning hardly converge within a few descent steps, owing to the small number of training samples. Therefore, we introduce an accelerated gradient descent learning network based on a teacher–student architecture to learn the meta-latent variable θt for task t. With this amortized fast inference network, the meta-learner is able to learn the task-specific latent θt within a few training steps; thus, it improves the learning speed of the meta-learner. To refine the latent variables generated from the transductive amortization network of the meta-learner, SIN, followed by a conventional SGD-optimized network, is introduced as the student–teacher network to online-update the parameters. SIN extracts the local latent variables and accelerates the convergence of the meta-learning network. Our experiments on simulation data demonstrate that the proposed method provides generalization and scalability on unseen samples, and produces competitive/superior uncertainty estimations on few-shot learning tasks on two widely adopted 2D datasets with fewer training epochs compared to the state-of-the-art meta-learning approaches. Furthermore, the parameters generated by SIN act as perturbations on latent weights, enhancing the probability of accelerating the training efficiency of the meta-learner. Extensive qualitative experiments show that our method performs well across different meta-learning tasks in both simulated and real-world circumstances.
(This article belongs to the Special Issue Advances in Deep Learning III)

11 pages, 349 KiB  
Article
Discovering and Ranking Relevant Comment for Chinese Automatic Question-Answering System
by Siyuan Cheng, Didi Yin, Zhuoyan Hou, Zihao Shi, Dongyu Wang and Qiang Fu
Appl. Sci. 2023, 13(4), 2716; https://doi.org/10.3390/app13042716 - 20 Feb 2023
Viewed by 991
Abstract
Intelligent customer service systems are timely, efficient, and accurate, and they are increasingly popular in grid electric power companies, where the volume of customer consultations is increasing day by day. It is infeasible for human customer service staff to answer these questions on time, so an automatic question-answering system is of great help to a grid electric power company. Customer queries to a grid electric power company's customer service are very different from open-domain questions: they tend to concern a specific device or system within the enterprise's operations. Most grid electric companies provide customers with a communication platform where customers can get guidance on using equipment and on business processes. The comments from these communication platforms are valuable resources for answering customer questions. In our work, we use three neural network models that mine comments for potential answers to customer queries. One of the key challenges, however, is the difficulty of matching customer questions with comments. To solve this problem, we propose a deep learning-based method that finds the comments related to customer questions in order to generate more accurate and reliable answers. Experiments show that our method performs well in the customer service of a grid electric power company.

23 pages, 7318 KiB  
Article
Bilateral Attention U-Net with Dissimilarity Attention Gate for Change Detection on Remote Sensing Imageries
by Jongseok Lee, Wahyu Wiratama, Wooju Lee, Ismail Marzuki and Donggyu Sim
Appl. Sci. 2023, 13(4), 2485; https://doi.org/10.3390/app13042485 - 15 Feb 2023
Cited by 1 | Viewed by 1044
Abstract
This study proposes a bilateral attention U-Net with a dissimilarity attention gate (DAG) for change detection on remote sensing imageries. The proposed network is designed with a bilateral dissimilarity encoding for the DAG calculation to handle reversible input images, resulting in high detection rates regardless of the order of the two input images for change detection. The DAG exploits all the combinations of joint features to avoid spectral information loss fed into an attention gate on the decoder side. The effectiveness of the proposed method was evaluated on the KOMPSAT-3 satellite images dataset and the aerial change detection dataset (CDD). Its performance was better than that of conventional methods (specifically, U-Net, ATTUNet, and Modified-UNet++) as it achieved average F1-score and kappa coefficient (KC) values of 0.68 and 66.93, respectively, for the KOMPSAT-3 dataset. For CDD, it achieved F1-score and KC values of 0.70 and 68.74, respectively, which are also better values than those achieved by conventional methods. In addition, we found that the proposed bilateral attention U-Net can provide the same change map regardless of whether the image order is reversed.

17 pages, 1722 KiB  
Article
Combining Human Parsing with Analytical Feature Extraction and Ranking Schemes for High-Generalization Person Reidentification
by Nikita Gabdullin
Appl. Sci. 2023, 13(3), 1289; https://doi.org/10.3390/app13031289 - 18 Jan 2023
Viewed by 1318
Abstract
Person reidentification (re-ID) has been receiving increasing attention in recent years due to its importance for both science and society. Machine learning (particularly Deep Learning (DL)) has become the main re-ID tool and has enabled unprecedented accuracy levels on benchmark datasets. However, DL models have a known problem of poor generalization. That is, models that are trained to achieve high accuracy on one dataset perform poorly on other ones and require re-training. In order to address this issue, we present a model without trainable parameters. This, in turn, results in a great potential for high generalization. This approach combines a fully analytical feature extraction and similarity ranking scheme with DL-based human parsing, wherein human parsing is used to obtain the initial subregion classification. We show that such a combination, to a high extent, eliminates the drawbacks of existing analytical methods. In addition, we use interpretable color and texture features that have human-readable similarity measures associated with them. In order to verify the proposed method, we conduct experiments on the Market1501 and CUHK03 datasets, achieving a competitive rank-1 accuracy comparable with that of DL models. Most importantly, we show that our method achieves 63.9% and 93.5% rank-1 cross-domain accuracy when applied to transfer learning tasks, while also being completely re-ID dataset agnostic. We also achieve a cross-domain mean average precision (mAP) that is higher than that of DL models in some experiments. Finally, we discuss the potential ways of adding new features to further improve the model. We also show the advantages of interpretable features for the purposes of constructing human-generated queries from verbal descriptions in order to conduct searches without a query image.

15 pages, 3112 KiB  
Article
NGIoU Loss: Generalized Intersection over Union Loss Based on a New Bounding Box Regression
by Chenghao Tong, Xinhao Yang, Qing Huang and Feiyang Qian
Appl. Sci. 2022, 12(24), 12785; https://doi.org/10.3390/app122412785 - 13 Dec 2022
Cited by 4 | Viewed by 1851
Abstract
Loss functions such as the IoU Loss function and the GIoU (Generalized Intersection over Union) Loss function have been put forward to replace the loss functions commonly used in bounding box regression. GIoU Loss alleviates the vanishing gradient in the non-overlapping case, but it degenerates completely into the IoU Loss function when bounding boxes overlap totally, and so fails to achieve the intended optimization effect. To solve this problem, this paper proposes improvements to the GIoU Loss function that take into account the overlap rate when bounding boxes overlap completely. On the PASCAL VOC data, the experimental results demonstrate that the AP of the NGIoU Loss function in the YOLOv4 model is 47.68%, 1.15% higher than that of the GIoU Loss function, and the highest mAP value is 86.79% in the YOLOv5 model.
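
For readers unfamiliar with the behaviour this abstract describes, the following is a minimal sketch of the standard GIoU computation, not the NGIoU variant the paper proposes. The function names and the (x1, y1, x2, y2) box format are assumptions made for illustration. It shows that GIoU still carries a signal when boxes do not overlap, and that it equals plain IoU when one box fully contains the other.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (illustrative sketch only).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (enclose - union) / enclose
    return iou, giou


def giou_loss(box_a, box_b):
    """GIoU loss as 1 - GIoU, so it ranges over [0, 2)."""
    return 1.0 - iou_and_giou(box_a, box_b)[1]


# Disjoint boxes: IoU is 0, but GIoU is negative and still varies with distance.
iou_far, giou_far = iou_and_giou((0, 0, 2, 2), (3, 0, 5, 2))

# One box inside the other: the enclosing box equals the union, so GIoU == IoU.
iou_in, giou_in = iou_and_giou((0, 0, 4, 4), (1, 1, 3, 3))
```

In the contained case the penalty term vanishes, which is the degeneration into IoU that the abstract refers to.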

16 pages, 19672 KiB  
Article
FGCM: Noisy Label Learning via Fine-Grained Confidence Modeling
by Shaotian Yan, Xiang Tian, Rongxin Jiang and Yaowu Chen
Appl. Sci. 2022, 12(22), 11406; https://doi.org/10.3390/app122211406 - 10 Nov 2022
Viewed by 1366
Abstract
A small portion of mislabeled data can easily limit the performance of deep neural networks (DNNs) due to their high capacity for memorizing random labels. Thus, robust learning from noisy labels has become a key challenge for deep learning due to inadequate datasets with high-quality annotations. Most existing methods involve training models on clean sets by dividing clean samples from noisy ones, resulting in large amounts of mislabeled data being unused. To address this problem, we propose categorizing training samples into five fine-grained clusters based on the difficulty experienced by DNN models when learning them and label correctness. A novel fine-grained confidence modeling (FGCM) framework is proposed to cluster samples into these five categories; for each cluster, FGCM decides whether to accept the cluster data as they are, accept them with label correction, or accept them as unlabeled data. By applying different strategies to the fine-grained clusters, FGCM can better exploit training data than previous methods. Extensive experiments on the widely used benchmarks CIFAR-10, CIFAR-100, Clothing1M, and WebVision with different ratios and types of label noise demonstrate the superiority of our FGCM.

17 pages, 2124 KiB  
Article
CMD-Net: Self-Supervised Category-Level 3D Shape Denoising through Canonicalization
by Caner Sahin
Appl. Sci. 2022, 12(20), 10474; https://doi.org/10.3390/app122010474 - 17 Oct 2022
Cited by 2 | Viewed by 1369
Abstract
Point clouds provide a compact representation of 3D shapes; however, imperfections in acquisition processes corrupt point clouds with noise and reduce their power for representing 3D shapes. Learning-based denoising methods perform displacement prediction and suffer from shrinkage and outliers. In addition, they require pre-aligned datasets. In this paper, we present a self-supervised learning-based method, Canonical Mapping and Denoising Network (CMD-Net), and address category-level 3D shape denoising through canonicalization. We formulate denoising as a 3D semantic shape correspondence estimation task where we explore ordered 3D intrinsic structure points. Utilizing the convex hull of the explored structure points, the corruption on objects’ surfaces is eliminated. Our method is capable of canonicalizing noise-corrupted clouds under arbitrary rotations, therefore circumventing the requirement for pre-aligned data. The complete model learns to canonicalize the input through a novel transformer that serves as a proxy in the downstream denoising task. The analyses of the experiments validate the promising performance of the presented method on both synthetic and real data. We show that our method can not only eliminate corruption, but also remove clutter from the test data. We additionally create a novel dataset for the problem at hand and will make it publicly available on our project web page.

26 pages, 7222 KiB  
Article
Research on Improved Deep Convolutional Generative Adversarial Networks for Insufficient Samples of Gas Turbine Rotor System Fault Diagnosis
by Shucong Liu, Hongjun Wang and Xiang Zhang
Appl. Sci. 2022, 12(7), 3606; https://doi.org/10.3390/app12073606 - 01 Apr 2022
Cited by 2 | Viewed by 1817
Abstract
In gas turbine rotor systems, an intelligent data-driven fault diagnosis method is an important means to monitor the health status of the gas turbine, and it is necessary to obtain sufficient fault data to train the intelligent diagnosis model. In the actual operation of a gas turbine, the collected gas turbine fault data are limited, and the small and imbalanced fault samples seriously affect the accuracy of the fault diagnosis method. Focusing on the imbalance of gas turbine fault data, an Improved Deep Convolutional Generative Adversarial Network (Improved DCGAN) suitable for gas turbine signals is proposed here, and a structural optimization of the generator and a gradient penalty improvement in the loss function are introduced to generate effective fault data and improve the classification accuracy. The experimental results of the gas turbine test bench demonstrate that the proposed method can generate effective fault samples as a supplementary set of fault samples to balance the dataset, effectively improve the fault classification and diagnosis performance of gas turbine rotors in the case of small samples, and provide an effective method for gas turbine fault diagnosis.

11 pages, 2106 KiB  
Article
Late Fusion-Based Video Transformer for Facial Micro-Expression Recognition
by Jiuk Hong, Chaehyeon Lee and Heechul Jung
Appl. Sci. 2022, 12(3), 1169; https://doi.org/10.3390/app12031169 - 23 Jan 2022
Cited by 8 | Viewed by 2438
Abstract
In this article, we propose a novel model for facial micro-expression (FME) recognition. The proposed model is built on a transformer, an architecture that has recently been used for computer vision but has never been used for FME recognition. A transformer requires a huge amount of data compared to a convolutional neural network. Therefore, we use motion features, such as optical flow, and late fusion to compensate for the lack of FME data. The proposed method was verified and evaluated using the SMIC and CASME II datasets. Our approach achieved state-of-the-art (SOTA) performance of 0.7447 and 73.17% in SMIC in terms of unweighted F1 score (UF1) and accuracy (Acc.), respectively, which are 0.31 and 1.8% higher than the previous SOTA. Furthermore, UF1 of 0.7106 and Acc. of 70.68% were shown in the CASME II experiment, which are comparable with SOTA.

16 pages, 5840 KiB  
Article
Integrating Image Quality Enhancement Methods and Deep Learning Techniques for Remote Sensing Scene Classification
by Sheng-Chieh Hung, Hui-Ching Wu and Ming-Hseng Tseng
Appl. Sci. 2021, 11(24), 11659; https://doi.org/10.3390/app112411659 - 08 Dec 2021
Cited by 6 | Viewed by 2753
Abstract
Through the continued development of technology, applying deep learning to remote sensing scene classification tasks is quite mature. The keys to effective deep learning model training are model architecture, training strategies, and image quality. Previous studies by the author using explainable artificial intelligence (XAI) show that incorrectly classified images can be classified correctly after manual image quality correction, provided the model has adequate capacity; however, the manual image quality correction process takes a significant amount of time. Therefore, this research integrates technologies such as noise reduction, sharpening, partial color area equalization, and color channel adjustment to evaluate a set of automated strategies for enhancing image quality. These methods can enhance details, light and shadow, color, and other image features, which helps the deep learning model extract image features and further improves the classification efficiency. In this study, we demonstrate that the proposed image quality enhancement strategy and deep learning techniques can effectively improve the scene classification performance of remote sensing images and outperform previous state-of-the-art approaches.

13 pages, 17559 KiB  
Article
A Deep Learning Architecture for 3D Mapping Urban Landscapes
by Armando Levid Rodríguez-Santiago, José Aníbal Arias-Aguilar, Hiroshi Takemura and Alberto Elías Petrilli-Barceló
Appl. Sci. 2021, 11(23), 11551; https://doi.org/10.3390/app112311551 - 06 Dec 2021
Viewed by 1958
Abstract
In this paper, an approach through a Deep Learning architecture for the three-dimensional reconstruction of outdoor environments in challenging terrain conditions is presented. The architecture proposed is configured as an Autoencoder. However, instead of the typical convolutional layers, some differences are proposed. The Encoder stage is set as a residual net with four residual blocks, which have been provided with the necessary knowledge to extract the feature maps from aerial images of outdoor environments. On the other hand, the Decoder stage is set as a Generative Adversarial Network (GAN) and called a GAN-Decoder. The proposed network architecture uses a sequence of 2D aerial images as input. The Encoder stage extracts the vector of features that describe the input image, while the GAN-Decoder generates a point cloud based on the information obtained in the previous stage. By supplying a sequence of frames that have a percentage of overlap between them, it is possible to determine the spatial location of each generated point. The experiments show that with this proposal it is possible to perform a 3D representation of an area flown over by a drone using the point cloud generated with a deep architecture that has a sequence of aerial 2D images as input. In comparison with other works, our proposed system is capable of performing three-dimensional reconstructions in challenging urban landscapes. Compared with the results obtained using commercial software, our proposal was able to generate reconstructions in less processing time and with a lower overlap percentage between 2D images, and it is invariant to the type of flight path.

9 pages, 444 KiB  
Article
P-Norm Attention Deep CORAL: Extending Correlation Alignment Using Attention and the P-Norm Loss Function
by Zhi-Yong Wang and Dae-Ki Kang
Appl. Sci. 2021, 11(11), 5267; https://doi.org/10.3390/app11115267 - 06 Jun 2021
Cited by 2 | Viewed by 2013
Abstract
CORrelation ALignment (CORAL) is an unsupervised domain adaptation method that uses a linear transformation to align the covariances of source and target domains. Deep CORAL extends CORAL with a nonlinear transformation using a deep neural network and adds CORAL loss as a part of the total loss to align the covariances of source and target domains. However, there are still two problems to be solved in Deep CORAL: features extracted from AlexNet are not always a good representation of the original data, and joint training with both the classification and CORAL losses may not be efficient enough to align the distributions of the source and target domains. In this paper, we propose two strategies: attention to improve the quality of feature maps, and the p-norm loss function to align the distributions of the source and target features, further reducing the offset caused by the classification loss function. Experiments on the Office-31 dataset indicate that our proposed methodologies improve the performance of Deep CORAL.
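
As a rough illustration of the baseline this abstract builds on, the sketch below computes the CORAL loss in its usual Deep CORAL form: the squared Frobenius norm of the difference between source and target feature covariances, divided by 4d², where d is the feature dimension. It does not implement the paper's attention module or p-norm loss. The function names and the list-of-rows input format are assumptions chosen to keep the example dependency-free.

```python
def covariance(rows):
    """Sample covariance matrix (d x d) of a batch given as a list of d-dim rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Unbiased estimate: divide by n - 1.
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cov_s = covariance(source)
    cov_t = covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)


source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
shifted = [[x + 5.0, y - 3.0] for x, y in source]   # same shape, moved
scaled = [[2.0 * x, 2.0 * y] for x, y in source]    # same centre pattern, stretched

loss_shifted = coral_loss(source, shifted)  # covariance ignores translation
loss_scaled = coral_loss(source, scaled)    # stretching changes second-order statistics
```

Because only second-order statistics are compared, a pure shift between domains leaves the loss at zero, while a change of scale does not.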

22 pages, 5188 KiB  
Article
A Methodology for Utilizing Vector Space to Improve the Performance of a Dog Face Identification Model
by Bohan Yoon, Hyeonji So and Jongtae Rhee
Appl. Sci. 2021, 11(5), 2074; https://doi.org/10.3390/app11052074 - 26 Feb 2021
Cited by 4 | Viewed by 2573
Abstract
Recent improvements in the performance of human face recognition models have led to the development of relevant products and services. However, research in the similar field of animal face identification has remained relatively limited due to the greater diversity and complexity in shape and the lack of relevant data for animal faces such as dogs. In the face identification model using triplet loss, the length of the embedding vector is normalized by adding an L2-normalization (L2-norm) layer for using cosine-similarity-based learning. As a result, object identification depends only on the angle, and the distribution of the embedding vector is limited to the surface of a sphere with a radius of 1. This study proposes removing the L2-norm layer and training the model with the triplet loss so that it can utilize a wide vector space beyond the surface of a sphere with a radius of 1; for this purpose, a novel loss function and a two-stage learning method are proposed. The proposed method classifies the embedding vector within a space rather than on the surface, and the model’s performance is also increased. The accuracy, one-shot identification performance, and distribution of the embedding vectors are compared between the existing learning method and the proposed learning method for verification. The verification was conducted using an open-set. The resulting accuracy of 97.33% for the proposed learning method is approximately 4% greater than that of the existing learning method.
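
To make the geometric point in this abstract concrete, the sketch below shows a standard triplet loss with and without an L2-normalization step. It is not the paper's novel loss function or its two-stage training method. The helper names and the margin of 0.2 are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length, placing it on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Anchor and positive point in the same direction but differ in length.
anchor, positive, negative = [1.0, 0.0], [3.0, 0.0], [0.0, 1.0]

# Without normalization, the length difference counts as distance.
raw_loss = triplet_loss(anchor, positive, negative)

# With normalization, only the angle matters, so anchor and positive coincide.
unit_loss = triplet_loss(
    l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
)
```

After normalization the vectors [1, 0] and [3, 0] map to the same point, which is the restriction to the unit sphere that the abstract describes; the raw embeddings keep the length information.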
