Research

19 pages, 1617 KiB

Open Access Article

Amortized Bayesian Meta-Learning with Accelerated Gradient Descent Steps

by Zhewei Zhang, Xuejing Li and Shengjin Wang

Appl. Sci. 2023, 13(15), 8653; https://doi.org/10.3390/app13158653 - 27 Jul 2023

Viewed by 702

Abstract

Recent meta-learning models often learn priors from observed tasks using a network optimized via stochastic gradient descent (SGD), which usually takes many training steps to converge. In this paper, we propose an accelerated Bayesian meta-learning structure with a stochastic inference network (ABML-SIN). The proposed model targets the training procedure of Bayesian meta-learning in order to improve training speed and efficiency. Current meta-learning approaches hardly converge within a few descent steps, owing to the small number of training samples. Therefore, we introduce an accelerated gradient descent learning network based on a teacher–student architecture to learn the meta-latent variable $\theta_{t}$ for task $t$. With this amortized fast inference network, the meta-learner is able to learn the task-specific latent $\theta_{t}$ within a few training steps, which improves its learning speed. To refine the latent variables generated from the transductive amortization network of the meta-learner, SIN, followed by a conventional SGD-optimized network, is introduced as the student–teacher network to update the parameters online. SIN extracts the local latent variables and accelerates the convergence of the meta-learning network. Our experiments on simulation data demonstrate that the proposed method generalizes and scales to unseen samples, and produces competitive or superior uncertainty estimates on few-shot learning tasks on two widely adopted 2D datasets with fewer training epochs than state-of-the-art meta-learning approaches. Furthermore, the parameters generated by SIN act as perturbations on latent weights, increasing the likelihood of accelerating the training of the meta-learner. Extensive qualitative experiments show that our method performs well across different meta-learning tasks in both simulated and real-world circumstances.
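
The abstract does not say which accelerated gradient scheme ABML-SIN uses. As a generic illustration of why acceleration helps when only a few descent steps are available, the sketch below compares plain gradient descent with Nesterov's accelerated gradient on a toy one-dimensional quadratic. The objective, step size, momentum value, and function names are assumptions made for this example and are not taken from the paper.

```python
def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def nesterov(grad, x0, lr, momentum, steps):
    """Nesterov accelerated gradient: the gradient is taken at a look-ahead point."""
    x, v = x0, 0.0
    for _ in range(steps):
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x


# Toy objective f(x) = 0.5 * a * x**2 with small curvature a (assumed), minimum at x = 0.
a = 0.01


def grad(x):
    return a * x


x_gd = gradient_descent(grad, x0=1.0, lr=1.0, steps=50)
x_nag = nesterov(grad, x0=1.0, lr=1.0, momentum=0.9, steps=50)

# Distance to the optimum after the same number of steps.
print(abs(x_gd), abs(x_nag))
```

With the same step budget, the momentum variant ends up much closer to the minimum on this poorly conditioned toy problem.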

(This article belongs to the Special Issue Advances in Deep Learning III)

11 pages, 349 KiB

Open Access Feature Paper Article

Discovering and Ranking Relevant Comment for Chinese Automatic Question-Answering System

by Siyuan Cheng, Didi Yin, Zhuoyan Hou, Zihao Shi, Dongyu Wang and Qiang Fu

Appl. Sci. 2023, 13(4), 2716; https://doi.org/10.3390/app13042716 - 20 Feb 2023

Viewed by 991

Abstract

Intelligent customer service systems are timely, efficient, and accurate, and they are increasingly popular in grid electric power companies, where the volume of customer consultations is increasing day by day. It is infeasible for human customer service staff to answer these questions on time, so an automatic question-answering system is of great help to a grid electric power company. Customer queries to a grid electric power company's customer service differ greatly from open-domain questions: they tend to concern an operational problem with a specific device or system within the enterprise. Most grid electric companies provide customers with a communication platform where they can get guidance on using equipment and on business processes. The comments from these communication platforms are valuable resources for answering customer questions. In our work, we use three neural network models that mine potential answers to customer queries from comments. One of the key challenges, however, is the difficulty of matching customer questions with comments. To solve this problem, we propose a deep-learning-based method that finds the comments related to customer questions in order to generate more accurate and reliable answers. Experiments show that our method performs well in the customer service of a grid electric power company.

(This article belongs to the Special Issue Advances in Deep Learning III)

23 pages, 7318 KiB

Open Access Article

Bilateral Attention U-Net with Dissimilarity Attention Gate for Change Detection on Remote Sensing Imageries

by Jongseok Lee, Wahyu Wiratama, Wooju Lee, Ismail Marzuki and Donggyu Sim

Appl. Sci. 2023, 13(4), 2485; https://doi.org/10.3390/app13042485 - 15 Feb 2023

Cited by 1 | Viewed by 1044

Abstract

This study proposes a bilateral attention U-Net with a dissimilarity attention gate (DAG) for change detection on remote sensing imagery. The proposed network is designed with a bilateral dissimilarity encoding for the DAG calculation to handle reversible input images, resulting in high detection rates regardless of the order of the two input images. The DAG exploits all the combinations of joint features to avoid losing the spectral information fed into an attention gate on the decoder side. The effectiveness of the proposed method was evaluated on the KOMPSAT-3 satellite image dataset and the aerial change detection dataset (CDD). Its performance was better than that of conventional methods (specifically, U-Net, ATTUNet, and Modified-UNet++): it achieved average F₁-score and kappa coefficient (KC) values of 0.68 and 66.93, respectively, on the KOMPSAT-3 dataset. On CDD, it achieved F₁-score and KC values of 0.70 and 68.74, respectively, which are also better than those achieved by conventional methods. In addition, we found that the proposed bilateral attention U-Net produces the same change map regardless of whether the image order is reversed.
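
For readers unfamiliar with the two metrics quoted above, the following minimal sketch computes the F₁-score and Cohen's kappa coefficient from the four cells of a binary (changed / unchanged) confusion matrix. The pixel counts are made up for illustration and are not results from the paper. Kappa is returned on a 0 to 1 scale here, and multiplying by 100 gives a percentage-style value.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (changed) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between prediction and reference, corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement: product of the marginals for each class, summed over classes.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical pixel counts, chosen only to keep the arithmetic easy to follow.
tp, fp, fn, tn = 40, 10, 10, 40
f1 = f1_score(tp, fp, fn)
kappa = cohen_kappa(tp, fp, fn, tn)
print(f1, kappa)
```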

(This article belongs to the Special Issue Advances in Deep Learning III)

17 pages, 1722 KiB

Open Access Article

Combining Human Parsing with Analytical Feature Extraction and Ranking Schemes for High-Generalization Person Reidentification

by Nikita Gabdullin

Appl. Sci. 2023, 13(3), 1289; https://doi.org/10.3390/app13031289 - 18 Jan 2023

Viewed by 1318

Abstract

Person reidentification (re-ID) has been receiving increasing attention in recent years due to its importance for both science and society. Machine learning, particularly deep learning (DL), has become the main re-ID tool and has enabled unprecedented accuracy levels on benchmark datasets. However, DL models are known to generalize poorly: models trained to achieve high accuracy on one dataset perform poorly on others and require re-training. To address this issue, we present a model without trainable parameters, which gives it great potential for high generalization. The approach combines a fully analytical feature extraction and similarity ranking scheme with DL-based human parsing, where human parsing is used to obtain the initial subregion classification. We show that this combination largely eliminates the drawbacks of existing analytical methods. In addition, we use interpretable color and texture features that have human-readable similarity measures associated with them. To verify the proposed method, we conduct experiments on the Market1501 and CUHK03 datasets and achieve a rank-1 accuracy comparable with that of DL models. Most importantly, we show that our method achieves 63.9% and 93.5% rank-1 cross-domain accuracy when applied to transfer learning tasks, while also being completely re-ID dataset agnostic. We also achieve a cross-domain mean average precision (mAP) that is higher than that of DL models in some experiments. Finally, we discuss potential ways of adding new features to further improve the model. We also show the advantages of interpretable features for constructing human-generated queries from verbal descriptions in order to conduct searches without a query image.

(This article belongs to the Special Issue Advances in Deep Learning III)

15 pages, 3112 KiB

Open Access Article

NGIoU Loss: Generalized Intersection over Union Loss Based on a New Bounding Box Regression

by Chenghao Tong, Xinhao Yang, Qing Huang and Feiyang Qian

Appl. Sci. 2022, 12(24), 12785; https://doi.org/10.3390/app122412785 - 13 Dec 2022

Cited by 4 | Viewed by 1851

Abstract

Loss functions such as the IoU Loss function and the GIoU (Generalized Intersection over Union) Loss function have been put forward to replace the loss functions commonly used for bounding box regression. GIoU Loss alleviates the vanishing gradient for non-overlapping bounding boxes, but it degenerates completely into the IoU Loss function when the bounding boxes overlap totally, and thus fails to achieve the optimization effect. To solve this problem, this paper proposes improvements to the GIoU Loss function that take into account the overlap rate when bounding boxes overlap completely. On the PASCAL VOC data, the experimental results demonstrate that the AP of the NGIoU Loss function in the YOLOv4 model is 47.68%, 1.15% higher than that of the GIoU Loss function, and that the highest mAP value is 86.79% in the YOLOv5 model.
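
The degeneration described above can be checked numerically. The sketch below implements the standard IoU and GIoU definitions for axis-aligned boxes given as `(x1, y1, x2, y2)`. It does not implement the NGIoU loss, whose formula is not given in the abstract. The box coordinates are illustrative, and "overlap totally" is read here as one box lying entirely inside the other, which is an assumption.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes with positive area."""
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (enclose - union) / enclose
    return iou, giou


# Disjoint boxes: IoU is 0 (no gradient signal), while GIoU is still informative.
iou_disjoint, giou_disjoint = iou_and_giou((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))

# Nested boxes: the enclosing box equals the union, so the GIoU penalty term vanishes.
iou_nested, giou_nested = iou_and_giou((0.0, 0.0, 4.0, 4.0), (1.0, 1.0, 2.0, 2.0))

print(iou_disjoint, giou_disjoint)
print(iou_nested, giou_nested)
```

In the nested case the two values coincide wherever the inner box sits, so a GIoU-based loss gives no extra signal beyond IoU there.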

(This article belongs to the Special Issue Advances in Deep Learning III)

16 pages, 19672 KiB

Open Access Article

FGCM: Noisy Label Learning via Fine-Grained Confidence Modeling

by Shaotian Yan, Xiang Tian, Rongxin Jiang and Yaowu Chen

Appl. Sci. 2022, 12(22), 11406; https://doi.org/10.3390/app122211406 - 10 Nov 2022

Viewed by 1366

Abstract

A small portion of mislabeled data can easily limit the performance of deep neural networks (DNNs) due to their high capacity for memorizing random labels. Thus, robust learning from noisy labels has become a key challenge for deep learning, given the scarcity of datasets with high-quality annotations. Most existing methods train models on clean sets obtained by separating clean samples from noisy ones, which leaves large amounts of mislabeled data unused. To address this problem, we propose categorizing training samples into five fine-grained clusters based on label correctness and on the difficulty DNN models experience when learning them. A novel fine-grained confidence modeling (FGCM) framework is proposed to cluster samples into these five categories; for each cluster, FGCM decides whether to accept the cluster data as they are, accept them with label correction, or accept them as unlabeled data. By applying different strategies to the fine-grained clusters, FGCM can exploit training data better than previous methods. Extensive experiments on the widely used benchmarks CIFAR-10, CIFAR-100, Clothing1M, and WebVision with different ratios and types of label noise demonstrate the superiority of FGCM.

(This article belongs to the Special Issue Advances in Deep Learning III)

17 pages, 2124 KiB

Open Access Article

CMD-Net: Self-Supervised Category-Level 3D Shape Denoising through Canonicalization

by Caner Sahin

Appl. Sci. 2022, 12(20), 10474; https://doi.org/10.3390/app122010474 - 17 Oct 2022

Cited by 2 | Viewed by 1369

Abstract

Point clouds provide a compact representation of 3D shapes; however, imperfections in acquisition processes corrupt point clouds with noise and reduce their power to represent 3D shapes. Learning-based denoising methods rely on displacement prediction and suffer from shrinkage and outliers. In addition, they require pre-aligned datasets. In this paper, we present a self-supervised learning-based method, Canonical Mapping and Denoising Network (CMD-Net), and address category-level 3D shape denoising through canonicalization. We formulate denoising as a 3D semantic shape correspondence estimation task in which we explore ordered 3D intrinsic structure points. Using the convex hull of the explored structure points, the corruption on objects' surfaces is eliminated. Our method is capable of canonicalizing noise-corrupted clouds under arbitrary rotations, thereby circumventing the requirement for pre-aligned data. The complete model learns to canonicalize the input through a novel transformer that serves as a proxy in the downstream denoising task. Analyses of the experiments validate the promising performance of the presented method on both synthetic and real data. We show that our method can not only eliminate corruption, but also remove clutter from the test data. We additionally create a novel dataset for the problem at hand and will make it publicly available on our project web page.

(This article belongs to the Special Issue Advances in Deep Learning III)

26 pages, 7222 KiB

Open Access Article

Research on Improved Deep Convolutional Generative Adversarial Networks for Insufficient Samples of Gas Turbine Rotor System Fault Diagnosis

by Shucong Liu, Hongjun Wang and Xiang Zhang

Appl. Sci. 2022, 12(7), 3606; https://doi.org/10.3390/app12073606 - 01 Apr 2022

Cited by 2 | Viewed by 1817

Abstract

In gas turbine rotor systems, intelligent data-driven fault diagnosis is an important means of monitoring the health status of the gas turbine, and sufficient fault data are needed to train the intelligent diagnosis model. In the actual operation of a gas turbine, the collected fault data are limited, and the small and imbalanced fault samples seriously affect the accuracy of the fault diagnosis method. Focusing on the imbalance of gas turbine fault data, an Improved Deep Convolutional Generative Adversarial Network (Improved DCGAN) suitable for gas turbine signals is proposed here; a structural optimization of the generator and a gradient penalty improvement in the loss function are introduced to generate effective fault data and improve the classification accuracy. The experimental results on the gas turbine test bench demonstrate that the proposed method can generate effective fault samples as a supplementary set to balance the dataset, effectively improve the fault classification and diagnosis performance for gas turbine rotors in the case of small samples, and provide an effective method for gas turbine fault diagnosis.

(This article belongs to the Special Issue Advances in Deep Learning III)

11 pages, 2106 KiB

Open Access Article

Late Fusion-Based Video Transformer for Facial Micro-Expression Recognition

by Jiuk Hong, Chaehyeon Lee and Heechul Jung

Appl. Sci. 2022, 12(3), 1169; https://doi.org/10.3390/app12031169 - 23 Jan 2022

Cited by 8 | Viewed by 2438

Abstract

In this article, we propose a novel model for facial micro-expression (FME) recognition. The proposed model is built on a transformer, which has recently been used for computer vision but has never been used for FME recognition. A transformer requires a huge amount of data compared to a convolutional neural network. Therefore, we use motion features, such as optical flow, together with late fusion to compensate for the lack of FME data. The proposed method was verified and evaluated using the SMIC and CASME II datasets. Our approach achieved state-of-the-art (SOTA) performance of 0.7447 and 73.17% on SMIC in terms of unweighted F1 score (UF1) and accuracy (Acc.), respectively, which are 0.31 and 1.8% higher than the previous SOTA. Furthermore, a UF1 of 0.7106 and an Acc. of 70.68% were obtained in the CASME II experiment, which are comparable with SOTA.
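
Late fusion combines the outputs of separately processed input streams at the decision level instead of merging their features early. The abstract does not specify the fusion rule, so the sketch below shows one common variant, a (weighted) average of per-stream softmax probabilities. The stream names, logits, and the three-class setup are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits, weights=None):
    """Fuse per-stream predictions by a weighted average of class probabilities."""
    n_streams = len(stream_logits)
    if weights is None:
        weights = [1.0 / n_streams] * n_streams
    probs = [softmax(logits) for logits in stream_logits]
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]


# Hypothetical logits from two streams over three expression classes.
appearance_logits = [2.0, 1.0, 0.1]   # stream fed with video frames (assumed)
flow_logits = [0.5, 2.5, 0.3]         # stream fed with optical flow (assumed)

fused = late_fusion([appearance_logits, flow_logits])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

Here the streams disagree, and the fused prediction follows the optical-flow stream because it is the more confident of the two.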

(This article belongs to the Special Issue Advances in Deep Learning III)

16 pages, 5840 KiB

Open Access Article

Integrating Image Quality Enhancement Methods and Deep Learning Techniques for Remote Sensing Scene Classification

by Sheng-Chieh Hung, Hui-Ching Wu and Ming-Hseng Tseng

Appl. Sci. 2021, 11(24), 11659; https://doi.org/10.3390/app112411659 - 08 Dec 2021

Cited by 6 | Viewed by 2753

Abstract

Through the continued development of technology, the application of deep learning to remote sensing scene classification tasks has become quite mature. The keys to effective deep learning model training are model architecture, training strategies, and image quality. Previous studies by the author using explainable artificial intelligence (XAI) showed that, when the model has adequate capacity, incorrectly classified images can be classified correctly after manual image quality correction; however, the manual correction process takes a significant amount of time. Therefore, this research integrates technologies such as noise reduction, sharpening, partial color area equalization, and color channel adjustment to evaluate a set of automated strategies for enhancing image quality. These methods can enhance details, light and shadow, color, and other image features, which helps the deep learning model extract image features and further improves the classification efficiency. In this study, we demonstrate that the proposed image quality enhancement strategy combined with deep learning techniques can effectively improve the scene classification performance on remote sensing images and outperform previous state-of-the-art approaches.

(This article belongs to the Special Issue Advances in Deep Learning III)

13 pages, 17559 KiB

Open Access Article

A Deep Learning Architecture for 3D Mapping Urban Landscapes

by Armando Levid Rodríguez-Santiago, José Aníbal Arias-Aguilar, Hiroshi Takemura and Alberto Elías Petrilli-Barceló

Appl. Sci. 2021, 11(23), 11551; https://doi.org/10.3390/app112311551 - 06 Dec 2021

Viewed by 1958

Abstract

In this paper, a Deep Learning architecture for the three-dimensional reconstruction of outdoor environments in challenging terrain conditions is presented. The proposed architecture is configured as an Autoencoder, but it departs from the typical stack of convolutional layers. The Encoder stage is set up as a residual net with four residual blocks, which have been provided with the necessary knowledge to extract feature maps from aerial images of outdoor environments. The Decoder stage, in turn, is set up as a Generative Adversarial Network (GAN) and called a GAN-Decoder. The proposed network architecture takes a sequence of 2D aerial images as input. The Encoder stage extracts the feature vector that describes the input image, while the GAN-Decoder generates a point cloud based on the information obtained in the previous stage. By supplying a sequence of frames that have a percentage of overlap between them, it is possible to determine the spatial location of each generated point. The experiments show that with this proposal it is possible to produce a 3D representation of an area flown over by a drone, using the point cloud generated by a deep architecture that has a sequence of aerial 2D images as input. In comparison with other works, our proposed system is capable of performing three-dimensional reconstructions in challenging urban landscapes. Compared with the results obtained using commercial software, our proposal was able to generate reconstructions in less processing time and with a lower overlap percentage between 2D images, and it is invariant to the type of flight path.

(This article belongs to the Special Issue Advances in Deep Learning III)

9 pages, 444 KiB

Open Access Article

P-Norm Attention Deep CORAL: Extending Correlation Alignment Using Attention and the P-Norm Loss Function

by Zhi-Yong Wang and Dae-Ki Kang

Appl. Sci. 2021, 11(11), 5267; https://doi.org/10.3390/app11115267 - 06 Jun 2021

Cited by 2 | Viewed by 2013

Abstract

CORrelation ALignment (CORAL) is an unsupervised domain adaptation method that uses a linear transformation to align the covariances of source and target domains. Deep CORAL extends CORAL with a nonlinear transformation using a deep neural network and adds CORAL loss as a part of the total loss to align the covariances of source and target domains. However, there are still two problems to be solved in Deep CORAL: features extracted from AlexNet are not always a good representation of the original data, and joint training with both the classification loss and the CORAL loss may not be efficient enough to align the distributions of the source and target domains. In this paper, we propose two strategies: attention to improve the quality of feature maps, and the p-norm loss function to align the distributions of the source and target features, further reducing the offset caused by the classification loss function. Experiments on the Office-31 dataset indicate that the proposed methods improve the performance of Deep CORAL.
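
As a concrete reference point for the covariance alignment mentioned above, the sketch below computes the CORAL loss in its usual Deep CORAL form, the squared Frobenius norm of the difference between the source and target feature covariances scaled by 1/(4d²), where d is the feature dimension. It does not include the attention module or the p-norm variant proposed in the paper, and the toy feature batches are invented for illustration.

```python
def covariance(rows):
    """Unbiased sample covariance matrix of a batch of feature vectors (list of lists)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 * d**2)."""
    d = len(source[0])
    cov_s, cov_t = covariance(source), covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)


# Toy 2-D feature batches (hypothetical): the target is the source scaled by 2.
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
target = [[2.0 * x for x in row] for row in source]
shifted = [[x + 5.0 for x in row] for row in source]

print(coral_loss(source, target))   # scaling changes the covariance, so the loss is positive
print(coral_loss(source, shifted))  # a pure shift leaves the covariance unchanged
```

Because only second-order statistics are compared, a constant offset between the domains is invisible to this loss, while a change of scale is penalized.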

(This article belongs to the Special Issue Advances in Deep Learning III)

22 pages, 5188 KiB

Open Access Article

A Methodology for Utilizing Vector Space to Improve the Performance of a Dog Face Identification Model

by Bohan Yoon, Hyeonji So and Jongtae Rhee

Appl. Sci. 2021, 11(5), 2074; https://doi.org/10.3390/app11052074 - 26 Feb 2021

Cited by 4 | Viewed by 2573

Abstract

Recent improvements in the performance of human face recognition models have led to the development of relevant products and services. However, research in the similar field of animal face identification has remained relatively limited due to the greater diversity and complexity in shape and the lack of relevant data for animal faces, such as those of dogs. In a face identification model using triplet loss, the length of the embedding vector is normalized by adding an L2-normalization (L2-norm) layer so that cosine-similarity-based learning can be used. As a result, object identification depends only on the angle, and the distribution of the embedding vectors is limited to the surface of a sphere with a radius of 1. This study proposes training the model with the L2-norm layer removed, using the triplet loss, so that it can utilize a wide vector space beyond the surface of a sphere with a radius of 1; to this end, a novel loss function and a two-stage learning method are proposed. The proposed method classifies the embedding vectors within a space rather than on a surface, and the model's performance also increases. For verification, the accuracy, one-shot identification performance, and distribution of the embedding vectors are compared between the existing learning method and the proposed learning method. The verification was conducted using an open set. The resulting accuracy of 97.33% for the proposed learning method is approximately 4% greater than that of the existing learning method.
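
The effect of the L2-norm layer described above can be seen in a few lines. The sketch below uses the standard triplet loss with squared Euclidean distances and a margin, not the novel loss function or the two-stage training proposed in the paper. The embedding values and the margin of 0.2 are assumptions for illustration.

```python
import math


def l2_normalize(vec):
    """Project an embedding onto the unit sphere (what an L2-norm layer does)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(
        0.0,
        squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin,
    )


# Two hypothetical embeddings with the same direction but different lengths.
short = [3.0, 4.0]
long_ = [6.0, 8.0]

raw_gap = squared_distance(short, long_)
normalized_gap = squared_distance(l2_normalize(short), l2_normalize(long_))
print(raw_gap, normalized_gap)

# A triplet where the negative is closer to the anchor than the positive.
hard = triplet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])
easy = triplet_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])
print(hard, easy)
```

After normalization the two embeddings become indistinguishable, which illustrates the restriction to angular information that the study aims to lift by removing the L2-norm layer.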

(This article belongs to the Special Issue Advances in Deep Learning III)
