Special Issue "Deep Learning in Image Analysis and Pattern Recognition"

A special issue of Machine Learning and Knowledge Extraction (ISSN 2504-4990). This special issue belongs to the section "Visualization".

Deadline for manuscript submissions: closed (30 August 2023)

Special Issue Editors

School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China
Interests: machine learning; data mining; medical informatics; bioinformatics; service computing
School of Computer Science, University of Technology Sydney, Sydney 2007, Australia
Interests: graph mining; multimodal learning; time series analysis; signal processing; recommender systems

Special Issue Information

Dear Colleagues,

In the past decade, deep learning has demonstrated state-of-the-art performance in many image processing tasks, such as image classification, object detection, object tracking and image segmentation. Despite these remarkable achievements in computer vision, many challenging tasks and scenarios calling for novel methods and theories remain. For example, lightweight object detection is required in many applications, such as autonomous driving; however, the gap between machine inference speed and human visual perception remains large. Fine-grained and small object detection constitute another area for improvement. Weakly supervised object classification and detection are important problems, since the annotation process is time-consuming, expensive and inefficient. In addition, although deep learning achievements in computer vision have been successfully applied to many areas, greater efforts should be made to ensure this technology can better serve humans in the future.

This Special Issue calls for papers detailing new advances in deep learning methods or applications in image analysis and pattern recognition. Topics of interest include, but are not limited to:

  • Image classification;
  • Object detection;
  • Object tracking;
  • Image segmentation;
  • Convolutional neural networks;
  • Diffusion models;
  • Image captioning;
  • Image clustering;
  • Representation learning for images;
  • Medical image processing;
  • Remote sensing image processing;
  • 3D image building.

Dr. Guoqing Chao
Dr. Xianzhi Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, submit your manuscript via the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Machine Learning and Knowledge Extraction is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • image analysis
  • pattern recognition

Published Papers (3 papers)


Research

Article
Optimal Topology of Vision Transformer for Real-Time Video Action Recognition in an End-To-End Cloud Solution
Mach. Learn. Knowl. Extr. 2023, 5(4), 1320-1339; https://doi.org/10.3390/make5040067 - 29 Sep 2023
Abstract
This study introduces an optimal topology of vision transformers for real-time video action recognition in a cloud-based solution. Although model performance is a key criterion for real-time video analysis use cases, inference latency plays a more crucial role in adopting such technology in real-world scenarios. Our objective is to reduce the inference latency of the solution while maintaining the vision transformer’s performance at an acceptable level. Thus, we employed the optimal cloud components as the foundation of our machine learning pipeline and optimized the topology of vision transformers. We utilized the UCF101 dataset, including more than one million action recognition video clips. The modeling pipeline consists of a preprocessing module to extract frames from video clips, training two-dimensional (2D) vision transformer models, and deep learning baselines. The pipeline also includes a postprocessing step that aggregates the frame-level predictions into video-level predictions at inference. The results demonstrate that our optimal vision transformer model, with an input dimension of 56 × 56 × 3 and eight attention heads, produces an F1 score of 91.497% on the testing set. The optimized vision transformer reduces the inference latency by 40.70%, measured through a batch-processing approach, with a 55.63% faster training time than the baseline. Lastly, we developed an enhanced skip-frame approach that improves the inference latency by finding an optimal ratio of frames for prediction at inference, further reducing the inference latency by 57.15%. This study reveals that the vision transformer model is highly optimizable for inference latency while maintaining the model performance.
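The skip-frame approach described in this abstract is straightforward to illustrate. Below is a minimal Python sketch of aggregating frame-level predictions into a video-level prediction with skip-frame subsampling; the function name, the skip_ratio parameter, and the shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: frame-level predictions aggregated to a video-level
# prediction, with skip-frame subsampling to cut inference latency.
import numpy as np

def predict_video(frames, frame_model, skip_ratio=4):
    # frames:      (num_frames, 56, 56, 3) array, matching the paper's
    #              optimal input resolution
    # frame_model: any callable mapping a batch of frames to per-class
    #              probabilities of shape (batch, num_classes)
    # skip_ratio:  keep every skip_ratio-th frame; a larger ratio lowers
    #              latency at some cost in accuracy (hypothetical knob)
    sampled = frames[::skip_ratio]            # skip-frame subsampling
    frame_probs = frame_model(sampled)        # (n_sampled, num_classes)
    video_probs = frame_probs.mean(axis=0)    # aggregate to video level
    return int(np.argmax(video_probs))        # video-level class index
```

Because only every skip_ratio-th frame passes through the model, the per-video compute cost drops roughly in proportion to the ratio, which is consistent with the latency reductions the abstract reports.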
Article
Research on Forest Fire Detection Algorithm Based on Improved YOLOv5
Mach. Learn. Knowl. Extr. 2023, 5(3), 725-745; https://doi.org/10.3390/make5030039 - 28 Jun 2023
Abstract
Forest fires are among the world’s deadliest natural disasters. Early detection of forest fires can help minimize the damage to ecosystems and forest life. In this paper, we propose YOLOv5-IFFDM, an improved fire detection method based on YOLOv5. Firstly, the fire and smoke detection accuracy and the network’s perception of small targets are improved by adding an attention mechanism to the backbone network. Secondly, the loss function is improved and a SoftPool pyramid pooling structure is used to increase the regression accuracy, detection performance and robustness of the model. In addition, a random mosaic augmentation technique is used to enhance the data and increase the generalization ability of the model, and the a priori (anchor) boxes for flame and smoke detection are re-clustered to improve accuracy and speed. Finally, the parameters of the convolutional and normalization layers of the trained model are merged to further reduce the model processing load and improve the detection speed. Experimental results on self-built forest-fire and smoke datasets show that this algorithm has high detection accuracy and fast detection speed, with an average accuracy of up to 90.5% for fire and 84.3% for smoke, and a detection speed of up to 75 FPS (frames per second), which meets the requirements of real-time, efficient fire detection.
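The final optimization in this abstract, merging the parameters of the convolutional and normalization layers, is a standard inference-time technique that can be sketched generically. Below is a minimal PyTorch illustration of folding a trained BatchNorm layer into its preceding convolution; it shows the general algebra, not the authors' exact code.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Valid at inference time, where BN uses its frozen running statistics:
    # BN(x) = gamma * (x - mean) / sqrt(var + eps) + beta.
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    # Fold BN into the conv: rescale each output channel's weights
    # and shift its bias accordingly.
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(scale)
    fused.bias.data = bn.bias.data + (conv_bias - bn.running_mean) * scale
    return fused
```

After fusion, one convolution does the work of two layers at inference, which is where a reduction in processing load, and hence higher detection speed, comes from.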

Article
Human Action Recognition-Based IoT Services for Emergency Response Management
Mach. Learn. Knowl. Extr. 2023, 5(1), 330-345; https://doi.org/10.3390/make5010020 - 13 Mar 2023
Abstract
Emergency incidents can occur at any time and in any place, which makes it very challenging for emergency medical services practitioners to predict the location and time of such emergencies. The dynamic nature of emergency incidents can cause delays in emergency medical services, which can sometimes lead to vital injury complications or even death. A delay in emergency medical services may occur because a call was made too late or because no one was present to make the call. With the emergence of smart cities and promising technologies, such as the Internet of Things (IoT) and computer vision techniques, such issues can be tackled. This article proposes a human action recognition-based IoT services architecture for emergency response management. In particular, the architecture exploits IoT devices (e.g., surveillance cameras) distributed in public areas to detect emergency incidents, request the nearest emergency medical services, and send emergency location information. Moreover, this article proposes an emergency incident detection model based on human action recognition and object tracking, which processes and classifies the collected images based on action modeling. The primary notion of the proposed model is to classify human activity as either an emergency incident or another daily activity, using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM). To demonstrate the feasibility of the proposed emergency detection model, several experiments were conducted using the UR fall detection dataset, which consists of emergency and other daily activity footage. The results of the conducted experiments were promising, with the proposed model scoring 0.99, 0.97, 0.97, and 0.98 in terms of sensitivity, specificity, precision, and accuracy, respectively.
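The two-stage classifier described in this abstract, a CNN paired with an SVM, can be sketched briefly: the CNN acts as a feature extractor and the SVM makes the final emergency/daily-activity decision. Below is a minimal illustration assuming a pretrained ResNet-18 as a stand-in backbone and random tensors in place of preprocessed UR fall detection frames; it is not the authors' actual pipeline.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Pretrained CNN backbone with its classification head removed, so it
# outputs a 512-dimensional feature vector per image.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()
backbone.eval()

def extract_features(images: torch.Tensor) -> np.ndarray:
    # images: (batch, 3, 224, 224), normalized as the backbone expects
    with torch.no_grad():
        return backbone(images).numpy()

# Illustrative stand-ins for preprocessed frames and their labels
# (1 = emergency incident, 0 = other daily activity).
train_frames = torch.randn(32, 3, 224, 224)
train_labels = np.random.randint(0, 2, size=32)

# Train the SVM on CNN features, then classify unseen frames.
svm = SVC(kernel="rbf")
svm.fit(extract_features(train_frames), train_labels)

test_frames = torch.randn(8, 3, 224, 224)
predictions = svm.predict(extract_features(test_frames))
```

Keeping the SVM as the final decision stage lets the classifier be retrained on new incident types without touching the CNN, which suits the distributed IoT setting the article describes.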
