Deep Learning-Based Target/Object Detection

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 May 2024

Special Issue Editor


Dr. Fan Zhang
Guest Editor
School of Automation, Central South University, Changsha 410083, China
Interests: infrared image detection and recognition; autonomous driving scene perception; machine learning and deep learning algorithms

Special Issue Information

Dear Colleagues,

Target/object detection has attracted increasing attention in recent years owing to its wide range of applications and recent technological breakthroughs. It is a computer technology, related to computer vision and image processing, that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. The task is under extensive investigation both in academia and in real-world applications such as security monitoring, autonomous driving, transportation surveillance, drone scene analysis, and robotic vision. Deep learning models are now widely adopted across computer vision, including general object detection and domain-specific object detection, and these domains have been comprehensively researched. Most state-of-the-art object detectors employ deep networks as their backbone and detection networks to extract features from input images or videos for classification and localization, respectively.

This Special Issue invites leading scientists to contribute their latest advances and prospects in multi-category detection, edge detection, salient object detection, pose detection, scene text detection, face detection, pedestrian detection, and related tasks, as well as their applications in real-world scenes. Novel solutions that help to improve the efficiency and effectiveness of deep learning frameworks are equally welcome.

Dr. Fan Zhang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international, peer-reviewed, open access, semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • machine learning
  • computer vision
  • object detection
  • video analysis
  • image classification
  • image processing
  • scene perception
  • neural network

Published Papers (4 papers)


Research

11 pages, 1461 KiB  
Article
Infrared Image Enhancement Using Convolutional Neural Networks for Auto-Driving
by Shunshun Zhong, Luowei Fu and Fan Zhang
Appl. Sci. 2023, 13(23), 12581; https://doi.org/10.3390/app132312581 - 22 Nov 2023
Abstract
Auto-driving systems usually acquire low-light infrared images, whose low contrast and unclear texture details pose a great challenge to autopilot functions at night. As a preprocessing step for automatic driving, infrared image contrast enhancement is of great significance for accelerating target recognition algorithms and improving the accuracy of object localization. In this study, a convolutional neural network model comprising a feature extraction module and an image enhancement module is proposed to enhance infrared images. Specifically, the feature extraction module consists of three branches connected in parallel, a concatenation layer, and a fusion layer that together extract feature images. The image enhancement module contains eight convolutional layers, one connectivity layer, and one difference layer for enhancing the contrast of infrared images. To overcome the lack of large-scale training data and to improve model accuracy, the brightness and sharpness of the infrared images are randomly transformed to expand the training set and form more sample pairs. Unlike traditional enhancement methods, the proposed model directly learns an end-to-end mapping between low- and high-contrast images. Extensive qualitative and quantitative experiments demonstrate that the method achieves better clarity in a shorter time.
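The architecture described above decomposes naturally into two stages. The sketch below is a minimal PyTorch reading of that description: three parallel convolutional branches feeding a concatenation and fusion layer, then an eight-convolution enhancement stage whose "difference layer" is modeled as a residual added back to the input. The kernel sizes, channel widths, and the residual interpretation are illustrative assumptions, not the authors' published configuration.

import torch
import torch.nn as nn

class FeatureExtraction(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # Three parallel branches with different receptive fields (assumed sizes).
        self.branch3 = nn.Conv2d(1, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(1, channels, 5, padding=2)
        self.branch7 = nn.Conv2d(1, channels, 7, padding=3)
        # Fusion layer compresses the concatenated branch outputs.
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = torch.cat(
            [self.act(self.branch3(x)),
             self.act(self.branch5(x)),
             self.act(self.branch7(x))], dim=1)  # concatenation layer
        return self.act(self.fuse(feats))

class Enhancement(nn.Module):
    def __init__(self, channels=32, depth=8):
        super().__init__()
        layers = []
        for _ in range(depth - 1):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(channels, 1, 3, padding=1))
        self.body = nn.Sequential(*layers)  # eight convolutional layers in total

    def forward(self, feats, x):
        # "Difference layer": the network predicts a residual that is added
        # back to the low-contrast input (an assumption in this sketch).
        return x + self.body(feats)

class IREnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.extract = FeatureExtraction()
        self.enhance = Enhancement()

    def forward(self, x):
        return self.enhance(self.extract(x), x)

model = IREnhancer()
out = model(torch.randn(1, 1, 128, 160))  # single-channel infrared frame
print(out.shape)  # torch.Size([1, 1, 128, 160])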

19 pages, 3677 KiB  
Article
Optimizing Multimodal Scene Recognition through Mutual Information-Based Feature Selection in Deep Learning Models
by Mohamed Hammad, Samia Allaoua Chelloug, Walaa Alayed and Ahmed A. Abd El-Latif
Appl. Sci. 2023, 13(21), 11829; https://doi.org/10.3390/app132111829 - 29 Oct 2023
Abstract
The field of scene recognition, which lies at the crossroads of computer vision and artificial intelligence, has experienced notable progress thanks to scholarly pursuits. This article introduces a novel methodology for scene recognition that combines convolutional neural networks (CNNs) with feature selection based on mutual information (MI). The main goal of our study is to address the limitations inherent in conventional unimodal methods and to improve the precision and dependability of scene classification. Our research focuses on a comprehensive approach to scene detection that applies multimodal deep learning methodologies to a single input image. The work distinguishes itself through the integration of CNN- and MI-based feature selection, which provides distinct advantages over prevailing methodologies. To assess the effectiveness of the methodology, we performed tests on two openly accessible datasets, namely, the scene categorization dataset and the AID dataset, achieving accuracies of 100% and 98.83%, respectively. These findings surpass the performance of other established techniques. Our end-to-end approach aims to reduce complexity and resource requirements, creating a robust framework for scene categorization. This work advances the practical application of computer vision in real-world scenarios, improving the accuracy of scene recognition and interpretation.
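To make the CNN-plus-MI pipeline concrete, here is a minimal scikit-learn sketch of mutual information-based feature selection applied to pooled CNN features. The random stand-in features, the 512-dimensional feature size, and the choice of keeping the top 64 features are assumptions for illustration; the paper's actual backbone, datasets, and selection threshold are not reproduced here.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))    # stand-in for pooled CNN features
y = rng.integers(0, 10, size=200)  # stand-in scene labels

# Rank features by mutual information with the class label, keep the top k.
selector = SelectKBest(mutual_info_classif, k=64).fit(X, y)
X_sel = selector.transform(X)

# A lightweight classifier on the reduced feature set.
clf = LogisticRegression(max_iter=1000).fit(X_sel, y)
print(X_sel.shape, clf.score(X_sel, y))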

20 pages, 33820 KiB  
Article
GolfMate: Enhanced Golf Swing Analysis Tool through Pose Refinement Network and Explainable Golf Swing Embedding for Self-Training
by Chan-Yang Ju, Jong-Hyeon Kim and Dong-Ho Lee
Appl. Sci. 2023, 13(20), 11227; https://doi.org/10.3390/app132011227 - 12 Oct 2023
Abstract
Digital fitness has become a widely used tool for remote exercise guidance, leveraging artificial intelligence to analyze exercise videos and support self-training. This paper introduces a method for self-training in golf, a sport where automated posture analysis can significantly reduce the costs associated with professional coaching. Our system utilizes a pose refinement methodology and an explainable golf swing embedding for analyzing the swing motions of learners and professional golfers. By leveraging sequential coordinate information, we detect biased pose joints and refine the 2D and 3D human pose estimation results. Furthermore, we propose a swing embedding method that considers geometric information extracted from the swing pose. This approach enables not only the comparison of the similarity between two golf swing poses but also the visualization of different points, providing learners with specific and intuitive feedback on areas that require correction. Our experimental results demonstrate the effectiveness of our swing guide system in identifying specific body points that need adjustment to align more closely with a professional golfer’s swing. This research contributes to the digital fitness domain by enhancing the accuracy of posture analysis and providing a specialized and interpretable golf swing analysis system. Our proposed system offers a low-cost and time-efficient approach for users who wish to improve their golf swing, paving the way for broader applications of digital fitness technologies in self-training contexts.
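The key idea, an explainable swing embedding built from geometric pose information, can be sketched as follows: joint angles computed from 2D keypoints form an interpretable vector, so two swings can be compared angle by angle and the largest deviations surfaced as feedback. The 17-keypoint layout, the chosen joint triplets, and the feedback rule below are hypothetical illustrations, not the authors' embedding.

import numpy as np

# (name, (a, b, c)): the angle at joint b formed by segments b->a and b->c,
# indexing into a hypothetical 17-keypoint (x, y) pose array.
ANGLE_TRIPLETS = {
    "right_elbow": (6, 8, 10),
    "left_elbow": (5, 7, 9),
    "right_knee": (12, 14, 16),
    "left_knee": (11, 13, 15),
}

def joint_angle(pose, a, b, c):
    u, v = pose[a] - pose[b], pose[c] - pose[b]
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def swing_embedding(pose):
    # Interpretable embedding: one joint angle per entry.
    return np.array([joint_angle(pose, *t) for t in ANGLE_TRIPLETS.values()])

def compare(learner_pose, pro_pose):
    e1, e2 = swing_embedding(learner_pose), swing_embedding(pro_pose)
    diffs = dict(zip(ANGLE_TRIPLETS, np.abs(e1 - e2)))
    # Report the joints with the largest angular deviation as feedback.
    return sorted(diffs.items(), key=lambda kv: -kv[1])

rng = np.random.default_rng(1)
learner, pro = rng.uniform(0, 100, (17, 2)), rng.uniform(0, 100, (17, 2))
for joint, deg in compare(learner, pro)[:2]:
    print(f"{joint}: differs by {deg:.1f} degrees")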

14 pages, 2506 KiB  
Article
A Stave-Aware Optical Music Recognition on Monophonic Scores for Camera-Based Scenarios
by Yipeng Liu, Ruimin Wu, Yifan Wu, Lijie Luo and Wei Xu
Appl. Sci. 2023, 13(16), 9360; https://doi.org/10.3390/app13169360 - 17 Aug 2023
Abstract
The recognition of printed music sheets in camera-based realistic scenarios is a novel research branch of optical music recognition (OMR). However, special factors in realistic scenarios, such as uneven lighting and the curvature of staff lines, can adversely affect OMR models designed for digital music scores. This paper proposes a stave-aware method based on object detection to recognize monophonic printed sheet music in camera-based scenarios. By detecting the positions of staff lines, we effectively improve the accuracy of note pitch estimation. In addition, we present the Camera Printed Music Staves (CPMS) dataset, which consists of labels and images captured by mobile phones under different angles and lighting conditions in realistic scenarios. On the CPMS test set, we compare our method, trained on different datasets, with a sequence recognition method called CRNN-CTC. The results show that our method achieves better accuracy and robustness with lower data dependency.
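The stave-aware principle, deriving note pitch from a note head's vertical position relative to the detected staff lines, can be illustrated with a short sketch. Averaging the detected line spacing tolerates the slightly uneven, curved staves that camera capture produces. The treble-clef assumption and the step-to-pitch table are illustrative; the paper's detector and pitch logic may differ.

import numpy as np

# Diatonic steps upward from the bottom staff line (E4 in treble clef).
STEPS = ["E4", "F4", "G4", "A4", "B4", "C5", "D5", "E5", "F5"]

def pitch_from_y(note_y, staff_line_ys):
    """staff_line_ys: detected y-coordinates of the five staff lines."""
    ys = np.sort(np.asarray(staff_line_ys))  # top line has the smallest y
    spacing = np.mean(np.diff(ys))           # average staff-space height
    bottom = ys[-1]                          # bottom line = step 0 (E4)
    # Half a staff space per diatonic step; y decreases as pitch rises.
    step = round((bottom - note_y) / (spacing / 2))
    if 0 <= step < len(STEPS):
        return STEPS[step]
    return f"ledger({step})"                 # note lies outside the staff

lines = [40, 51, 60, 70, 80]   # slightly uneven detected staff lines
print(pitch_from_y(80, lines)) # E4 (on the bottom line)
print(pitch_from_y(55, lines)) # C5 (third space)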
