Deep Learning in Image Analysis and Pattern Recognition

A special issue of Machine Learning and Knowledge Extraction (ISSN 2504-4990). This special issue belongs to the section "Visualization".

Deadline for manuscript submissions: closed (30 August 2023) | Viewed by 23455

Special Issue Editors


Dr. Guoqing Chao
Guest Editor
School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China
Interests: machine learning; data mining; medical informatics; bioinformatics; service computing

Dr. Xianzhi Wang
Guest Editor
School of Computer Science, University of Technology Sydney, Sydney 2007, Australia
Interests: graph mining; multimodal learning; time series analysis; signal processing; recommender systems

Special Issue Information

Dear Colleagues,

In the past decade, deep learning has demonstrated state-of-the-art performance in many image processing tasks, such as image classification, object detection, object tracking, and image segmentation. Despite these remarkable achievements in computer vision, many challenging tasks and scenarios remain that call for novel methods and theories. For example, lightweight object detection is required in many applications, such as autonomous driving; however, the gap between machine inference speed and human visual perception remains large. Fine-grained and small object detection constitute further areas for improvement. Weakly supervised object classification and detection are important problems, since the annotation process is time-consuming, expensive and inefficient. In addition, although deep learning achievements in computer vision have been successfully applied to many areas, greater efforts should be made to ensure that this technology can better serve humans in the future.

This Special Issue calls for papers detailing new advances in deep learning methods or applications in image analysis and pattern recognition. Topics of interest include, but are not limited to:

  • Image classification;
  • Object detection;
  • Object tracking;
  • Image segmentation;
  • Convolutional neural networks;
  • Diffusion models;
  • Image captioning;
  • Image clustering;
  • Representation learning for images;
  • Medical image processing;
  • Remote sensing image processing;
  • 3D image building.

Dr. Guoqing Chao
Dr. Xianzhi Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Machine Learning and Knowledge Extraction is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • image analysis
  • pattern recognition

Published Papers (6 papers)


Research


19 pages, 9171 KiB  
Article
Similarity-Based Framework for Unsupervised Domain Adaptation: Peer Reviewing Policy for Pseudo-Labeling
by Joel Arweiler, Cihan Ates, Jesus Cerquides, Rainer Koch and Hans-Jörg Bauer
Mach. Learn. Knowl. Extr. 2023, 5(4), 1474-1492; https://doi.org/10.3390/make5040074 - 12 Oct 2023
Viewed by 1395
Abstract
The inherent dependency of deep learning models on labeled data is a well-known problem and one of the barriers that slows down the integration of such methods into different fields of applied sciences and engineering, in which experimental and numerical methods can easily generate a colossal amount of unlabeled data. This paper proposes an unsupervised domain adaptation methodology that mimics the peer review process to label new observations in a different domain from the training set. The approach evaluates the validity of a hypothesis using domain knowledge acquired from the training set through a similarity analysis, exploring the projected feature space to examine the class centroid shifts. The methodology is tested on a binary classification problem, where synthetic images of cubes and cylinders in different orientations are generated. The methodology improves the accuracy of the object classifier from 60% to around 90% in the case of a domain shift in physical feature space without human labeling.
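
As a rough illustration of the centroid-similarity idea described above, the following sketch pseudo-labels target-domain samples by comparing their features against source-domain class centroids, abstaining when the decision is ambiguous. All names and the margin-based acceptance rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pseudo_label(source_feats, source_labels, target_feats, margin=0.1):
    """Pseudo-label each target feature by cosine similarity to the source
    class centroids; return -1 for ambiguous samples (hypothetical rule)."""
    classes = np.unique(source_labels)
    centroids = np.stack([source_feats[source_labels == c].mean(axis=0)
                          for c in classes])
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = t @ c.T                           # (n_target, n_classes)
    ranked = np.sort(sims, axis=1)
    labels = classes[np.argmax(sims, axis=1)]
    # Abstain when the best and second-best centroids are nearly as close.
    labels[ranked[:, -1] - ranked[:, -2] < margin] = -1
    return labels

# Toy usage: two well-separated Gaussian classes, mildly shifted target.
rng = np.random.default_rng(0)
src = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
lab = np.array([0] * 50 + [1] * 50)
tgt = rng.normal(5, 1, (10, 8)) + 0.5
print(pseudo_label(src, lab, tgt))
```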

18 pages, 17816 KiB  
Article
MSSGAN: Enforcing Multiple Generators to Learn Multiple Subspaces to Avoid the Mode Collapse
by Miguel S. Soriano-Garcia, Ricardo Sevilla-Escoboza and Angel Garcia-Pedrero
Mach. Learn. Knowl. Extr. 2023, 5(4), 1456-1473; https://doi.org/10.3390/make5040073 - 10 Oct 2023
Viewed by 1367
Abstract
Generative Adversarial Networks are powerful generative models that are used in different areas and with multiple applications. However, this type of model has a training problem called mode collapse. This problem causes the generator to not learn the complete distribution of the data with which it is trained. To force the network to learn the entire data distribution, MSSGAN is introduced. This model has multiple generators and distributes the training data in multiple subspaces, where each generator is enforced to learn only one of the groups with the help of a classifier. We demonstrate that our model performs better on the FID and Sample Distribution metrics compared to previous models to avoid mode collapse. Experimental results show how each of the generators learns different information and, in turn, generates satisfactory quality samples.
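
To make the multi-generator idea concrete, here is a minimal sketch of the data-partitioning step: the training set is split into K groups (a naive k-means stands in for the learned classifier the abstract mentions), and each generator would then be trained only on its own group. Function names and the clustering choice are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def partition_for_generators(images, k=3, iters=10, seed=0):
    """Split a dataset into k groups, one per generator. A naive k-means
    stands in for the classifier that routes samples in the paper's setup."""
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1).astype(float)
    centers = flat[rng.choice(len(flat), size=k, replace=False)]
    for _ in range(iters):
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for i in range(k):
            if (assign == i).any():          # guard against empty clusters
                centers[i] = flat[assign == i].mean(axis=0)
    return [images[assign == i] for i in range(k)]

# Toy usage: 90 fake 8x8 "images" drawn from three modes.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(m, 0.3, (30, 8, 8)) for m in (0.0, 1.0, 2.0)])
groups = partition_for_generators(data, k=3)
print([len(g) for g in groups])              # roughly 30 samples per group
```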

20 pages, 4233 KiB  
Article
Optimal Topology of Vision Transformer for Real-Time Video Action Recognition in an End-To-End Cloud Solution
by Saman Sarraf and Milton Kabia
Mach. Learn. Knowl. Extr. 2023, 5(4), 1320-1339; https://doi.org/10.3390/make5040067 - 29 Sep 2023
Viewed by 1423
Abstract
This study introduces an optimal topology of vision transformers for real-time video action recognition in a cloud-based solution. Although model performance is a key criterion for real-time video analysis use cases, inference latency plays a more crucial role in adopting such technology in real-world scenarios. Our objective is to reduce the inference latency of the solution while admissibly maintaining the vision transformer’s performance. Thus, we employed the optimal cloud components as the foundation of our machine learning pipeline and optimized the topology of vision transformers. We utilized UCF101, including more than one million action recognition video clips. The modeling pipeline consists of a preprocessing module to extract frames from video clips, training two-dimensional (2D) vision transformer models, and deep learning baselines. The pipeline also includes a postprocessing step to aggregate the frame-level predictions to generate the video-level predictions at inference. The results demonstrate that our optimal vision transformer model with an input dimension of 56 × 56 × 3 with eight attention heads produces an F1 score of 91.497% for the testing set. The optimized vision transformer reduces the inference latency by 40.70%, measured through a batch-processing approach, with a 55.63% faster training time than the baseline. Lastly, we developed an enhanced skip-frame approach to improve the inference latency by finding an optimal ratio of frames for prediction at inference, where we could further reduce the inference latency by 57.15%. This study reveals that the vision transformer model is highly optimizable for inference latency while maintaining the model performance.
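
The aggregation and skip-frame steps described above are easy to sketch: a per-frame model is applied to every n-th frame and the resulting probabilities are averaged into one video-level prediction. The function and parameter names below are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def predict_video(frames, frame_model, skip=4):
    """Run frame_model on every `skip`-th frame and average the per-frame
    class probabilities into a single video-level prediction."""
    probs = np.stack([frame_model(f) for f in frames[::skip]])
    return int(np.argmax(probs.mean(axis=0)))

# Toy usage with a dummy 3-class "model" that favors class 2.
dummy = lambda frame: np.array([0.1, 0.2, 0.7])
video = [np.zeros((56, 56, 3)) for _ in range(32)]   # 32 fake frames
print(predict_video(video, dummy, skip=4))            # -> 2
```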

21 pages, 9088 KiB  
Article
Research on Forest Fire Detection Algorithm Based on Improved YOLOv5
by Jianfeng Li and Xiaoqin Lian
Mach. Learn. Knowl. Extr. 2023, 5(3), 725-745; https://doi.org/10.3390/make5030039 - 28 Jun 2023
Viewed by 1344
Abstract
Forest fires are one of the world’s deadliest natural disasters. Early detection of forest fires can help minimize the damage to ecosystems and forest life. In this paper, we propose YOLOv5-IFFDM, an improved fire detection method based on YOLOv5. Firstly, the fire and smoke detection accuracy and the network's perception of small targets are improved by adding an attention mechanism to the backbone network. Secondly, the loss function is improved and the SoftPool pyramid pooling structure is used to improve the regression accuracy, detection performance and robustness of the model. In addition, a random mosaic augmentation technique is used to enhance the data and increase the generalization ability of the model, and the flame and smoke a priori (anchor) boxes are re-clustered to improve accuracy and speed. Finally, the parameters of the convolutional and normalization layers of the trained model are merged to further reduce the model processing load and improve the detection speed. Experimental results on self-built forest-fire and smoke datasets show that this algorithm has high detection accuracy and fast detection speed, with an average accuracy of up to 90.5% for fire and 84.3% for smoke, and a detection speed of up to 75 FPS (frames per second), which can meet the requirements of real-time and efficient fire detection.
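
The final merging step, folding each batch-normalization layer into the preceding convolution, is a standard inference-time optimization and can be sketched as follows. The per-channel formula is the usual one; variable names are illustrative.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into the
    preceding conv: w has shape (out_ch, ...); b and BN stats are (out_ch,)."""
    scale = gamma / np.sqrt(var + eps)                 # per output channel
    w_fused = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

# Toy check: with identity BN statistics the weights pass through unchanged.
w, b = np.ones((2, 3, 3, 3)), np.zeros(2)
gamma, beta = np.ones(2), np.zeros(2)
mean, var = np.zeros(2), np.ones(2)
wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)
print(wf.shape, bf)   # (2, 3, 3, 3) [0. 0.]
```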

16 pages, 1978 KiB  
Article
Human Action Recognition-Based IoT Services for Emergency Response Management
by Talal H. Noor
Mach. Learn. Knowl. Extr. 2023, 5(1), 330-345; https://doi.org/10.3390/make5010020 - 13 Mar 2023
Cited by 1 | Viewed by 2061
Abstract
Emergency incidents can appear anytime and any place, which makes it very challenging for emergency medical services practitioners to predict the location and the time of such emergencies. The dynamic nature of the appearance of emergency incidents can cause delays in emergency medical services, which can sometimes lead to vital injury complications or even death, in some cases. The delay of emergency medical services may occur as a result of a call that was made too late or because no one was present to make the call. With the emergence of smart cities and promising technologies, such as the Internet of Things (IoT) and computer vision techniques, such issues can be tackled. This article proposes a human action recognition-based IoT services architecture for emergency response management. In particular, the architecture exploits IoT devices (e.g., surveillance cameras) that are distributed in public areas to detect emergency incidents, make a request for the nearest emergency medical services, and send emergency location information. Moreover, this article proposes an emergency incidents detection model, based on human action recognition and object tracking, using image processing and classifying the collected images, based on action modeling. The primary notion of the proposed model is to classify human activity, whether it is an emergency incident or other daily activities, using a Convolutional Neural Network (CNN) and Support Vector Machine (SVM). To demonstrate the feasibility of the proposed emergency detection model, several experiments were conducted using the UR fall detection dataset, which consists of emergency and other daily activities footage. The results of the conducted experiments were promising, with the proposed model scoring 0.99, 0.97, 0.97, and 0.98 in terms of sensitivity, specificity, precision, and accuracy, respectively.
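
The classification stage, CNN features feeding an SVM, can be sketched in a few lines. Here a generic feature matrix stands in for CNN embeddings; the linear kernel and all names are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

def train_emergency_classifier(cnn_features, labels):
    """cnn_features: (n_samples, d) embeddings from a pretrained CNN;
    labels: 1 = emergency (e.g., a fall), 0 = ordinary daily activity."""
    svm = SVC(kernel="linear")
    svm.fit(cnn_features, labels)
    return svm

# Toy usage with random stand-in embeddings.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (40, 16)), rng.normal(3, 1, (40, 16))])
labels = np.array([0] * 40 + [1] * 40)
clf = train_emergency_classifier(feats, labels)
print(clf.predict(rng.normal(3, 1, (5, 16))))   # mostly 1s expected
```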

Review


37 pages, 53762 KiB  
Review
A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
by Juan Terven, Diana-Margarita Córdova-Esparza and Julio-Alejandro Romero-González
Mach. Learn. Knowl. Extr. 2023, 5(4), 1680-1716; https://doi.org/10.3390/make5040083 - 20 Nov 2023
Cited by 77 | Viewed by 14085
Abstract
YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO’s evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO’s development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.
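
As a pointer to the "standard metrics and postprocessing" the review opens with, the sketch below implements plain IoU-based non-maximum suppression, the postprocessing step shared by every YOLO variant; the threshold value and names are illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop overlapping ones, repeat."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

# Toy usage: the second box overlaps the first and is suppressed.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, thresh=0.5))   # -> [0, 2]
```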
