Deep Learning for Computer Vision Application

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 1 August 2024 | Viewed by 6074

Special Issue Editor

Dr. Hamed Mozaffari
Guest Editor
Research Officer (AI/ML Expert), Construction Research Centre, National Research Council Canada, Ottawa, ON K1A 0R6, Canada
Interests: computer vision; image processing; artificial intelligence; deep learning; medical imaging; thermal imaging; spectroscopy; virtual reality; data analytics and risk assessment; electronics/embedded systems

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) methods, and more specifically deep neural networks (also called deep learning models), have become the core technique for computer vision tasks across a wide range of applications. These powerful deep learning models enable state-of-the-art automation in pattern recognition from image data. Their impact is visible in daily life, from automatic photo sorting and retrieval in Google Photos to autonomous cars. However, these techniques have not yet been applied to all computer vision tasks. Future studies should seek further applications of AI in our lives, e.g., through data acquisition and cleaning, as well as further model optimization, innovation, and research. In this Special Issue, we are particularly interested in new applications of deep learning in the computer vision field.

Topics of interest include but are not limited to:

  • Image classification using deep learning;
  • Object detection using deep learning;
  • Semantic and instance segmentation using deep learning;
  • Deep learning techniques for generating new images (generative adversarial networks);
  • Employing reinforcement learning for computer vision tasks;
  • Application of deep learning in the Internet of Things (IoT);
  • Application of deep learning in embedded systems, sensor development, and electronics;
  • Computer vision tasks using deep learning (medical image processing, remote sensing, hyperspectral imaging, thermal imaging, space and extraterrestrial observations);
  • Image sequence analysis using deep learning;
  • Deep learning and computer vision for smart and green building, smart industry, and smart devices.

Dr. Hamed Mozaffari
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • convolutional neural network
  • deep learning
  • computer vision
  • artificial intelligence
  • image processing
  • medical image processing
  • internet of things
  • thermal imaging
  • image technologies
  • application of deep learning
  • autonomous vehicles
  • image classification
  • object detection
  • object segmentation

Published Papers (4 papers)

Research

18 pages, 89225 KiB  
Article
Graph Attention Networks and Track Management for Multiple Object Tracking
by Yajuan Zhang, Yongquan Liang, Ahmed Elazab, Zhihui Wang and Changmiao Wang
Electronics 2023, 12(19), 4079; https://doi.org/10.3390/electronics12194079 - 28 Sep 2023
Cited by 1 | Viewed by 1110
Abstract
Multiple object tracking (MOT) is a critical research area within computer vision. Building robust and efficient systems that approximate the mechanisms of human vision is essential to improving multiple object-tracking techniques. However, obstacles such as repetitive target appearances and frequent occlusions cause considerable inaccuracies or omissions in detection. Once these inaccurate observations are incorporated into a tracklet, the effectiveness of tracking models that rely on appearance features declines significantly. This paper introduces a novel multiple object tracking method employing graph attention networks and track management (GATM). Using a graph attention network, an attention mechanism captures the relationships between nodes within a graph as well as node-to-node correlations across graphs. This mechanism allows selective focus on the features of advantageous nodes and enhances the discriminability of node features, improving the performance and robustness of multiple object tracking. In parallel, we categorize distinct tracklet states and introduce an efficient track management method that applies different processing techniques to tracklets in different states. This method can manage occluded tracks in crowded scenes and improves tracking accuracy. Experiments on three challenging public datasets (MOT16, MOT17, and MOT20) demonstrate that our method delivers competitive performance.
(This article belongs to the Special Issue Deep Learning for Computer Vision Application)
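
To make the graph-attention idea concrete, here is a minimal sketch of a single-head graph-attention layer in PyTorch. This is not the authors' GATM code: the fully connected graph, the single head, the dimensions, and the usage below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over a fully connected node graph."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features, e.g. appearance embeddings of
        # detections and tracklets in the current association graph
        h = self.proj(x)                                   # (N, out_dim)
        n = h.size(0)
        # Pairwise concatenation for attention logits: (N, N, 2*out_dim)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))     # (N, N) raw scores
        alpha = torch.softmax(e, dim=-1)                   # weights per node
        return alpha @ h                                   # attended features

# Usage: refine five 128-dimensional detection embeddings
feats = torch.randn(5, 128)
refined = GraphAttentionLayer(128, 64)(feats)              # shape (5, 64)
```

In a GATM-style tracker, such a layer would refine the appearance embeddings of detections and tracklets before data association; the sketch shows only the attention computation itself.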

14 pages, 3711 KiB  
Article
VHR-BirdPose: Vision Transformer-Based HRNet for Bird Pose Estimation with Attention Mechanism
by Runang He, Xiaomin Wang, Huazhen Chen and Chang Liu
Electronics 2023, 12(17), 3643; https://doi.org/10.3390/electronics12173643 - 29 Aug 2023
Viewed by 1224
Abstract
Pose estimation plays a crucial role in recognizing and analyzing the postures, actions, and movements of humans and animals using computer vision and machine learning techniques. However, bird pose estimation faces specific challenges, including bird diversity, posture variation, and the fine granularity of posture. To overcome these challenges, we propose VHR-BirdPose, a method that combines a Vision Transformer (ViT) and a Deep High-Resolution Network (HRNet) with an attention mechanism. VHR-BirdPose extracts features using the Vision Transformer’s self-attention mechanism, which captures global dependencies in the images and better preserves pose details and changes. An attention mechanism is employed to sharpen the focus on bird keypoints, improving the accuracy of pose estimation. By combining HRNet with a Vision Transformer, our model can extract multi-scale features while maintaining high-resolution details and incorporating richer semantic information through the attention mechanism. This integration leverages the advantages of both models, resulting in accurate and robust bird pose estimation. We conducted extensive experiments on the Animal Kingdom dataset to evaluate the performance of VHR-BirdPose. The results demonstrate that our proposed method achieves state-of-the-art performance in bird pose estimation. VHR-BirdPose is of great significance for advancing the study of bird behavior, deepening ecological understanding, and protecting bird populations.
(This article belongs to the Special Issue Deep Learning for Computer Vision Application)
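
As an illustration of fusing a high-resolution CNN branch with coarse transformer features through an attention gate, here is a minimal PyTorch sketch. It is not the authors' VHR-BirdPose implementation: the module name `AttnFusionHead`, the channel sizes, and the keypoint count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnFusionHead(nn.Module):
    """Fuse a high-resolution CNN feature map with coarse transformer
    features via a learned channel-attention gate, then predict keypoint
    heatmaps. Names and sizes here are illustrative, not the paper's."""

    def __init__(self, cnn_ch: int, vit_ch: int, num_keypoints: int):
        super().__init__()
        self.align = nn.Conv2d(vit_ch, cnn_ch, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * cnn_ch, cnn_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        self.head = nn.Conv2d(cnn_ch, num_keypoints, kernel_size=1)

    def forward(self, cnn_feat, vit_feat):
        # cnn_feat: (B, cnn_ch, H, W) high-resolution branch (HRNet-style)
        # vit_feat: (B, vit_ch, h, w) coarse transformer feature map
        v = F.interpolate(self.align(vit_feat), size=cnn_feat.shape[-2:],
                          mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([cnn_feat, v], dim=1))   # (B, cnn_ch, 1, 1)
        fused = cnn_feat * g + v * (1 - g)               # attention-weighted mix
        return self.head(fused)                          # (B, K, H, W) heatmaps

# Usage with dummy shapes; the keypoint count is arbitrary here
heatmaps = AttnFusionHead(48, 256, num_keypoints=16)(
    torch.randn(2, 48, 64, 64), torch.randn(2, 256, 16, 16))
print(heatmaps.shape)  # torch.Size([2, 16, 64, 64])
```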

20 pages, 10877 KiB  
Article
SGooTY: A Scheme Combining the GoogLeNet-Tiny and YOLOv5-CBAM Models for Nüshu Recognition
by Yan Zhang and Liumei Zhang
Electronics 2023, 12(13), 2819; https://doi.org/10.3390/electronics12132819 - 26 Jun 2023
Cited by 1 | Viewed by 1092
Abstract
With the development of society, the intangible cultural heritage of Chinese Nüshu is in danger of extinction. To promote the research and popularization of traditional Chinese culture, we use deep learning to automatically detect and recognize handwritten Nüshu characters. To address difficulties such as the creation of a Nüshu character dataset, uneven samples, and character recognition itself, we first build a large-scale handwritten Nüshu character dataset, HWNS2023, using various data augmentation methods. This dataset contains 5500 Nüshu images and 1364 labeled character samples. Second, we propose a two-stage scheme model combining GoogLeNet-tiny and YOLOv5-CBAM (SGooTY) for Nüshu recognition. In the first stage, five basic deep learning models, including AlexNet, VGGNet16, GoogLeNet, MobileNetV3, and ResNet, are trained and tested on the dataset, and the model structure is improved to enhance the accuracy of recognizing handwritten Nüshu characters. In the second stage, we combine an object detection model to re-recognize misidentified handwritten Nüshu characters and ensure the accuracy of the overall system. Experimental results show that in the first stage, the improved model achieves the highest accuracy of 99.3% in recognizing Nüshu characters, which significantly improves the recognition rate of handwritten Nüshu characters. After integrating the object detection model, the overall recognition accuracy of the model reaches 99.9%.
(This article belongs to the Special Issue Deep Learning for Computer Vision Application)
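
The two-stage routing that the abstract describes (classify first, fall back to a detector for uncertain samples) can be sketched in a few lines of PyTorch. This is not the authors' SGooTY code: the `detector` interface, the confidence threshold, the stand-in classifier, and the class count are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def two_stage_recognize(image, classifier, detector, conf_thresh=0.9):
    """Stage 1: CNN classifier; Stage 2: route low-confidence samples to an
    object detection model for re-recognition (YOLOv5-CBAM in the paper)."""
    with torch.no_grad():
        probs = F.softmax(classifier(image), dim=-1)   # (1, num_classes)
        conf, label = probs.max(dim=-1)
        if conf.item() >= conf_thresh:
            return label.item(), conf.item(), "classifier"
    # Fall back: let the detector localize and re-classify the character.
    # `detector` is a hypothetical stand-in returning (label, confidence).
    det_label, det_conf = detector(image)
    return det_label, det_conf, "detector"

# Usage with placeholder models (class count chosen for illustration)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 400))
detector = lambda img: (0, 0.99)  # stand-in for a detection-based recognizer
print(two_stage_recognize(torch.randn(1, 3, 64, 64), classifier, detector))
```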

20 pages, 9967 KiB  
Article
CNN-Based Fluid Motion Estimation Using Correlation Coefficient and Multiscale Cost Volume
by Jun Chen, Hui Duan, Yuanxin Song, Ming Tang and Zemin Cai
Electronics 2022, 11(24), 4159; https://doi.org/10.3390/electronics11244159 - 13 Dec 2022
Cited by 1 | Viewed by 1904
Abstract
Motion estimation for complex fluid flows from their image sequences is a challenging problem in computer vision. It plays a significant role in scientific research and engineering applications related to meteorology, oceanography, and fluid mechanics. In this paper, we introduce a novel convolutional neural network (CNN)-based motion estimator for complex fluid flows using a multiscale cost volume. It uses correlation coefficients as the matching costs, which improves the accuracy of motion estimation by enhancing the discrimination of feature matching and overcoming the feature distortions caused by changes in fluid shapes and illumination. Specifically, it first generates sparse seeds with a feature extraction network. A correlation pyramid is then constructed for all pairs of sparse seeds, and the predicted matches are iteratively updated through a recurrent neural network, which looks up a multi-scale cost volume from the correlation pyramid via a multi-scale search scheme. It then uses the retrieved multi-scale cost volume, the current matches, and the context features as inputs to refine the predicted matches. Since the multi-scale cost volume contains motion information for both large and small displacements, it can recover small-scale motion structures. However, the predicted matches are sparse, so the final flow field is computed by performing a CNN-based interpolation over these sparse matches. The experimental results show that our method significantly outperforms current motion estimators in capturing different motion patterns in complex fluid flows, especially in recovering small-scale vortices. It also achieves state-of-the-art evaluation results on public fluid datasets and successfully captures the storms in Jupiter’s White Ovals from remote sensing images.
(This article belongs to the Special Issue Deep Learning for Computer Vision Application)
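
A correlation cost volume and its multi-scale pyramid can be sketched as follows in PyTorch. This is not the authors' implementation: it uses cosine similarity of channel-normalized features as a stand-in for the paper's correlation coefficients, and the RAFT-style average pooling and pyramid depth are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def correlation_volume(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """All-pairs matching costs between per-pixel features of two frames.
    f1, f2: (B, C, H, W) -> (B, H*W, H, W): one similarity map per source
    pixel. Channel-normalized features make each entry a cosine similarity,
    used here as a stand-in for the paper's correlation coefficients."""
    b, c, h, w = f1.shape
    f1 = F.normalize(f1.flatten(2), dim=1)         # (B, C, H*W), unit norm
    f2 = F.normalize(f2.flatten(2), dim=1)
    corr = torch.einsum("bci,bcj->bij", f1, f2)    # values in [-1, 1]
    return corr.view(b, h * w, h, w)

def correlation_pyramid(f1, f2, levels: int = 3):
    """Multi-scale cost volume: average-pool the target dimensions of the
    full volume, enabling lookups at several displacement scales."""
    vol = correlation_volume(f1, f2)
    pyramid = [vol]
    for _ in range(levels - 1):
        vol = F.avg_pool2d(vol, kernel_size=2, stride=2)
        pyramid.append(vol)
    return pyramid

# Usage: 64-channel features of two 32x32 frames
pyr = correlation_pyramid(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print([tuple(v.shape) for v in pyr])  # [(1,1024,32,32), (1,1024,16,16), (1,1024,8,8)]
```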
