Deep Learning Architectures for Computer Vision

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2023) | Viewed by 13979

Special Issue Editor


Guest Editor
Department of Informatics and Telecommunications, University of Thessaly, 15341 Lamia, Greece
Interests: semantic multimedia analysis; indexing and retrieval; low-level feature extraction and modeling; visual context modeling; multimedia content representation; neural networks; intelligent systems; biomedical image analysis; social generated data analysis and the Internet of Things

Special Issue Information

Dear Colleagues,

During the last several years, deep learning neural network architectures have been shown to outperform traditional machine learning approaches in a plethora of tasks. One of the most important research areas to benefit from these architectures is computer vision. With the continuous growth in the size and number of visual datasets, deep models can be trained and applied to a wide range of tasks, in some cases even surpassing human-level performance.

The goal of this Special Issue is to highlight the latest developments in the broader area of deep learning, focusing on computer vision applications that are based on deep learning neural network architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs).

Topics of interest to this Special Issue include but are not limited to:

  • Scene understanding;
  • 3D visual perception;
  • Human analysis and modeling;
  • Feature learning and representation;
  • Image/video understanding;
  • Video summarization;
  • Remote sensing image analysis;
  • Object detection and tracking;
  • Image processing;
  • Image segmentation;
  • Medical image/video analysis;
  • Affective computing;
  • Industrial applications.

Dr. Evaggelos Spyrou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; registered users can proceed directly to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • computer vision
  • scene understanding
  • visual perception
  • human analysis and modeling
  • image/video processing and analysis
  • object detection/tracking

Published Papers (8 papers)


Research

20 pages, 16585 KiB  
Article
An Improved YOLO Model for UAV Fuzzy Small Target Image Detection
by Yanlong Chang, Dong Li, Yunlong Gao, Yun Su and Xiaoqiang Jia
Appl. Sci. 2023, 13(9), 5409; https://doi.org/10.3390/app13095409 - 26 Apr 2023
Cited by 7 | Viewed by 1637
Abstract
High-altitude UAV photography presents several challenges, including blurry images, low image resolution, and small targets, which can cause low detection performance of existing object detection algorithms. Therefore, this study proposes an improved small-object detection algorithm based on the YOLOv5s computer vision model. First, the original convolution in the network framework was replaced with the SPD-Convolution module to eliminate the impact of pooling operations on feature information and to enhance the model’s capability to extract features from low-resolution and small targets. Second, a coordinate attention mechanism was added after the convolution operation to improve model detection accuracy with small targets under image blurring. Third, the nearest-neighbor interpolation in the original network upsampling was replaced with transposed convolution to increase the receptive field range of the neck and reduce detail loss. Finally, the CIoU loss function was replaced with the Alpha-IoU loss function to solve the problem of the slow convergence of gradients during training on small target images. Using the images of Artemisia salina, taken in Hunshandake sandy land in China, as a dataset, the experimental results demonstrated that the proposed algorithm provides significantly improved results (average precision = 80.17%, accuracy = 73.45%, and recall rate = 76.97%, i.e., improvements of 14.96%, 6.24%, and 7.21%, respectively, compared with the original model) and also outperforms other detection algorithms. The detection of small objects and blurry images has been significantly improved.
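As a minimal sketch of the Alpha-IoU idea described above (replacing the loss 1 − IoU with 1 − IoU^α), here is a plain-Python illustration; the (x1, y1, x2, y2) box format and the α value are assumptions for illustration, not the authors' exact implementation:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def alpha_iou_loss(box_a, box_b, alpha=3.0):
    """Power generalization of the IoU loss: 1 - IoU**alpha.
    alpha=1 recovers the plain IoU loss; alpha > 1 up-weights the
    gradient contribution of boxes that already overlap well."""
    return 1.0 - iou(box_a, box_b) ** alpha
```

Because IoU lies in [0, 1], raising it to a power α > 1 makes the loss steeper for high-overlap boxes, which is the convergence effect the abstract refers to.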
(This article belongs to the Special Issue Deep Learning Architectures for Computer Vision)

22 pages, 4130 KiB  
Article
Multiplicative Vector Fusion Model for Detecting Deepfake News in Social Media
by Yalamanchili Salini and Jonnadula Harikiran
Appl. Sci. 2023, 13(7), 4207; https://doi.org/10.3390/app13074207 - 26 Mar 2023
Cited by 3 | Viewed by 1225
Abstract
In the digital age, social media platforms are becoming vital tools for generating and detecting deepfake news due to the rapid dissemination of information. Unfortunately, fake news is now being produced at an accelerating rate, which raises substantial challenges: detecting fake news early, coping with the lack of labelled data available for training, and identifying fake news instances that have not yet been discovered. Identifying false news requires an in-depth understanding of authors, entities, and the connections between words in a long text. Many deep learning (DL) techniques have proven ineffective at addressing these issues with lengthy texts. This paper proposes a TL-MVF model based on transfer learning for detecting and generating deepfake news in social media. To generate the sentences, the T5, or Text-to-Text Transfer Transformer model, was employed for data cleaning and feature extraction. In the next step, we designed an optimal hyperparameter RoBERTa model for effectively detecting fake and real news. Finally, we propose a multiplicative vector fusion model for classifying fake news from real news efficiently. A real-time and benchmarked dataset was used to test and validate the proposed TL-MVF model. The TL-MVF model was evaluated using F-score, accuracy, precision, recall, and AUC as performance measures. The proposed TL-MVF performed better than existing benchmarks.
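As a toy illustration of the multiplicative-fusion step, assume two equal-length feature vectors (in the paper these would come from the transformer branches) fused by element-wise product and passed to a logistic scorer; the scorer and its weights below are hypothetical:

```python
import math

def multiplicative_fusion(vec_a, vec_b):
    """Fuse two equal-length feature vectors by element-wise multiplication."""
    assert len(vec_a) == len(vec_b), "feature vectors must align"
    return [a * b for a, b in zip(vec_a, vec_b)]

def fake_news_score(fused, weights, bias=0.0):
    """Logistic score over the fused vector: probability the item is fake."""
    z = sum(f * w for f, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Multiplicative fusion keeps only features that are jointly active in both representations, which is one motivation for preferring a product over concatenation.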

14 pages, 2533 KiB  
Article
Lightweight Micro-Expression Recognition on Composite Database
by Nur Aishah Ab Razak and Shahnorbanun Sahran
Appl. Sci. 2023, 13(3), 1846; https://doi.org/10.3390/app13031846 - 31 Jan 2023
Viewed by 1297
Abstract
The potential of leveraging micro-expression in various areas such as security, health care and education has intensified interest in this area. Unlike facial expression, micro-expression is subtle and occurs rapidly, making it difficult to perceive. Micro-expression recognition (MER) on a composite dataset following the Micro-Expression Grand Challenge 2019 protocol is an ongoing research area, with challenges stemming from the demographic variety of the samples as well as the small and imbalanced dataset. However, most MER approaches today are complex and require computationally expensive pre-processing, yet deliver only average performance. This work demonstrates how transfer learning from a larger and more varied macro-expression database (FER 2013) in a lightweight deep learning network, before fine-tuning on the composite dataset, can achieve high MER performance using only static images as input. The imbalanced dataset problem is redefined as an algorithm tuning problem instead of a data engineering and generation problem, to lighten the pre-processing steps. The proposed MER model is developed from a truncated EfficientNet-B0 model consisting of 15 layers with only 867k parameters. A simple algorithm tuning that manipulates the loss function to place more importance on minority classes is suggested to deal with the imbalanced dataset. Experimental results using Leave-One-Subject-Out cross-validation on the composite dataset show a substantial performance increase compared to the state-of-the-art models.
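The "algorithm tuning" idea above, up-weighting minority classes in the loss rather than resampling the data, can be sketched with inverse-frequency class weights; the paper's exact weighting scheme is not given here, so this is one common choice:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class loss weights inversely proportional to class frequency,
    normalized so that the weights sum to the number of classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    raw = {cls: n / cnt for cls, cnt in counts.items()}  # rarer class -> larger weight
    total = sum(raw.values())
    return {cls: k * w / total for cls, w in raw.items()}
```

These weights would then scale each sample's cross-entropy term, so errors on rare classes cost more during training.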

18 pages, 4569 KiB  
Article
Unsupervised Semantic Segmentation Inpainting Network Using a Generative Adversarial Network with Preprocessing
by Woo-Jin Ahn, Dong-Won Kim, Tae-Koo Kang, Dong-Sung Pae and Myo-Taeg Lim
Appl. Sci. 2023, 13(2), 781; https://doi.org/10.3390/app13020781 - 05 Jan 2023
Viewed by 1766
Abstract
The generative adversarial neural network has produced remarkable results in the image generation area. However, applying it to a semantic segmentation inpainting task is unstable due to the difference in data distribution. To solve this problem, we propose an unsupervised semantic segmentation inpainting method using an adversarial deep neural network with a newly introduced preprocessing method and loss function. To stabilize the adversarial training for semantic segmentation inpainting, we match the probability distribution of the segmentation maps with the developed preprocessing method. In addition, a new cross-entropy total variation loss for the probability map is introduced to improve segmentation inpainting by smoothing the segmentation map. The experimental results demonstrate the proposed algorithm’s effectiveness on both synthetic and real datasets.
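The smoothing effect of a total-variation term on a probability map can be illustrated with a minimal anisotropic TV computation; the paper combines this with a cross-entropy term, which is omitted in this sketch:

```python
def total_variation(prob_map):
    """Anisotropic total variation of a 2-D probability map: the sum of
    absolute differences between horizontal and vertical neighbours.
    Smooth maps score low; noisy, fragmented maps score high."""
    h, w = len(prob_map), len(prob_map[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                tv += abs(prob_map[i][j] - prob_map[i][j + 1])
            if i + 1 < h:
                tv += abs(prob_map[i][j] - prob_map[i + 1][j])
    return tv
```

Minimizing such a term penalizes isolated mislabelled pixels inside otherwise uniform segments, which is why it acts as a smoother.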

26 pages, 3258 KiB  
Article
Depth-Adaptive Deep Neural Network Based on Learning Layer Relevance Weights
by Arwa Alturki, Ouiem Bchir and Mohamed Maher Ben Ismail
Appl. Sci. 2023, 13(1), 398; https://doi.org/10.3390/app13010398 - 28 Dec 2022
Viewed by 1935
Abstract
In this paper, we propose two novel Adaptive Neural Network Approaches (ANNAs), which are intended to automatically learn the optimal network depth. In particular, the proposed class-independent and class-dependent ANNAs address two main challenges faced by typical deep learning paradigms. Namely, they overcome the problems of setting the optimal network depth and improving the model interpretability. Specifically, ANNA approaches simultaneously train the network model, learn the network depth in an unsupervised manner, and assign fuzzy relevance weights to each network layer to better decipher the model behavior. In addition, two novel cost functions were designed in order to optimize the layer fuzzy relevance weights along with the model hyper-parameters. The proposed ANNA approaches were assessed using standard benchmarking datasets and performance measures. The experiments proved their effectiveness compared to typical deep learning approaches, which rely on empirical tuning and scaling of the network depth. Moreover, the experimental findings demonstrated the ability of the proposed class-independent and class-dependent ANNAs to decrease the network complexity and build lightweight models for less overfitting risk and better generalization.
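One way to picture layer relevance weights is to normalize per-layer relevance scores with a softmax and count the layers whose weight exceeds a pruning threshold; this is a simplification assumed for illustration, not the fuzzy weighting actually learned by ANNA:

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to one (numerically stable)."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def effective_depth(relevance_scores, threshold=0.05):
    """Number of layers whose normalized relevance weight exceeds a threshold;
    low-relevance layers are candidates for pruning, shrinking the network."""
    weights = softmax(relevance_scores)
    return sum(1 for w in weights if w > threshold)
```

Under this reading, "learning the depth" amounts to driving some layers' relevance toward zero so the effective depth is smaller than the declared depth.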

36 pages, 11108 KiB  
Article
On the Relative Impact of Optimizers on Convolutional Neural Networks with Varying Depth and Width for Image Classification
by Eustace M. Dogo, Oluwatobi J. Afolabi and Bhekisipho Twala
Appl. Sci. 2022, 12(23), 11976; https://doi.org/10.3390/app122311976 - 23 Nov 2022
Cited by 6 | Viewed by 2834
Abstract
The continued increase in computing resources is one key factor that is allowing deep learning researchers to scale, design and train new and complex convolutional neural network (CNN) architectures in terms of varying width, depth, or both width and depth to improve performance for a variety of problems. The contributions of this study include an uncovering of how different optimization algorithms impact CNN architectural setups with variations in width, depth, and both width/depth. Specifically in this study, three different CNN architectural setups in combination with nine different optimization algorithms—namely SGD vanilla, with momentum, and with Nesterov momentum, RMSProp, ADAM, ADAGrad, ADADelta, ADAMax, and NADAM—are trained and evaluated using three publicly available benchmark image classification datasets. Through extensive experimentation, we analyze the output predictions of the different optimizers with the CNN architectures using accuracy, convergence speed, and loss function as performance metrics. Findings based on the overall results obtained across the three image classification datasets show that ADAM and NADAM achieved superior performances with wider and deeper/wider setups, respectively, while ADADelta was the worst performer, especially with the deeper CNN architectural setup.
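The difference between vanilla SGD, momentum, and Nesterov momentum mentioned above comes down to how the update step is formed; a single-parameter sketch, using a PyTorch-style formulation assumed here for illustration:

```python
def sgd_step(w, v, g, lr=0.1, mu=0.9, nesterov=False):
    """One SGD update for scalar weight w with velocity v and gradient g.
    mu=0 gives vanilla SGD; nesterov=True adds the look-ahead correction."""
    v = mu * v + g                        # velocity accumulates past gradients
    step = g + mu * v if nesterov else v  # Nesterov steps along gradient + look-ahead
    return w - lr * step, v
```

Adaptive methods such as ADAM and RMSProp additionally rescale the step per parameter by a running estimate of gradient magnitude, which is what the study contrasts against these momentum variants.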

14 pages, 721 KiB  
Article
Extending Partial Domain Adaptation Algorithms to the Open-Set Setting
by George Pikramenos, Evaggelos Spyrou and Stavros J. Perantonis
Appl. Sci. 2022, 12(19), 10052; https://doi.org/10.3390/app121910052 - 06 Oct 2022
Viewed by 920
Abstract
Partial domain adaptation (PDA) is a framework for mitigating the covariate shift problem when target labels are contained in source labels. For this task, adversarial neural network (ANN) methods proposed in the literature have been proven to be flexible and effective. In this work, we adapt such methods to tackle the more general problem of open-set domain adaptation (OSDA), which further allows the existence of target instances with labels outside the source labels. The aim in OSDA is to mitigate the covariate shift problem and to identify target instances with labels outside the source label space. We show that the effectiveness of ANN methods utilized in the PDA setting is hindered by outlier target instances, and we propose an adaptation for effective OSDA.
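A common baseline for flagging target instances outside the source label space is to threshold the classifier's top confidence; the paper's actual mechanism is adversarial, so the rule below is only an illustrative sketch under that assumption:

```python
def open_set_predict(probs, threshold=0.5):
    """Label a target instance with the most likely source class, or flag it
    as 'unknown' when the top class probability falls below the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else "unknown"
```

The OSDA difficulty the abstract points to is precisely that such outlier ("unknown") instances, if not rejected, corrupt the adversarial alignment between source and target.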

Review

18 pages, 7191 KiB  
Review
A Deep Learning Review of ResNet Architecture for Lung Disease Identification in CXR Image
by Syifa Auliyah Hasanah, Anindya Apriliyanti Pravitasari, Atje Setiawan Abdullah, Intan Nurma Yulita and Mohammad Hamid Asnawi
Appl. Sci. 2023, 13(24), 13111; https://doi.org/10.3390/app132413111 - 08 Dec 2023
Cited by 3 | Viewed by 1521
Abstract
The lungs are two of the most crucial organs in the human body because they are connected to the respiratory and circulatory systems. Lung cancer, COVID-19, and pneumonia are just a few of the many severe diseases that threaten them. Patients undergo X-ray examinations to evaluate the health of their lungs, and a radiologist must interpret the results. The rapid advancement of technology today can help people in many different ways; one use of deep learning in the health industry is the detection of diseases, which can decrease the amount of money, time, and energy needed while increasing effectiveness and efficiency. Although other methods exist, this research uses only the convolutional neural network (CNN) method, with three architectures, namely ResNet-50, ResNet-101, and ResNet-152, to aid radiologists in identifying lung diseases in patients. The 21,885 images that make up the dataset for this study are split into four groups: COVID-19, pneumonia, lung opacity, and normal. According to the experimental results, all three architectures achieve fairly high evaluation scores: the ResNet-50, ResNet-101, and ResNet-152 architectures achieve F1 scores of 91%, 93%, and 94%, respectively. Therefore, the ResNet-152 architecture, which has better performance values than the other two designs in this study, is recommended for categorizing the lung diseases experienced by patients.
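For reference, the F1 scores reported above combine precision and recall as a harmonic mean; from confusion counts the computation is:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts
    (true positives, false positives, false negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 is dragged down by whichever of precision or recall is worse, which makes it a stricter summary than accuracy on imbalanced medical datasets like this one.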
