Pattern Recognition and Computer Vision Based on Deep Learning

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 August 2023) | Viewed by 4032

Special Issue Editors

Dr. Chao Ren
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Interests: image/video restoration; image/video coding; machine learning; image segmentation
Prof. Dr. Shuai Li
School of Control Science and Engineering, Shandong University, Jinan 250061, China
Interests: 3D vision; image/video coding and processing; IndRNN

Special Issue Information

Dear Colleagues,

Pattern recognition and computer vision are classic, exciting, and fast-developing fields. They underpin developments in many areas, such as biometrics, bioinformatics, multimedia data analysis, smart cities and, most recently, data science, and they have important theoretical and practical value. They have therefore been central to text, speech, image, and video processing and the subject of a large body of research. In recent years, deep learning has developed rapidly and achieved promising results in many fields. In pattern recognition and computer vision especially, deep learning-based models have significantly improved algorithm performance. It is therefore particularly important to study pattern recognition and computer vision based on deep learning.

The purpose of this Special Issue is to present the latest advances in deep learning-based pattern recognition and computer vision and their applications in practice. We invite you to submit high-quality research papers or literature reviews related to deep learning-based pattern recognition and computer vision, including, but not limited to, image/video restoration, image/video enhancement, target detection and recognition, classification and segmentation, target tracking, image/video understanding and analysis, and image/video quality assessment.

Dr. Chao Ren
Prof. Dr. Shuai Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, use the submission form to submit your manuscript. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern recognition
  • computer vision
  • deep learning
  • image/video restoration and enhancement
  • detection and recognition
  • classification and segmentation
  • target tracking
  • image/video understanding and analysis
  • image/video quality assessment

Published Papers (4)


Research

19 pages, 4288 KiB  
Article
SliceSamp: A Promising Downsampling Alternative for Retaining Information in a Neural Network
by Lianlian He and Ming Wang
Appl. Sci. 2023, 13(21), 11657; https://doi.org/10.3390/app132111657 - 25 Oct 2023
Cited by 1 | Viewed by 1046
Abstract
Downsampling, which aims to improve computational efficiency by reducing the spatial resolution of feature maps, is a critical operation in neural networks. Many downsampling methods have been proposed to address the challenge of retaining feature map information, but some detailed information is still lost, even though these methods can extract features with stronger semantics. In this paper, we propose SliceSamp, a novel downsampling method that combines feature slicing and depthwise separable convolution for information-retaining downsampling. It slices the input feature map into multiple non-overlapping sub-feature maps using indices with a stride of two in the spatial dimension and applies depthwise separable convolution to each slice to extract feature information. To demonstrate the effectiveness of SliceSamp, we compare it with classical downsampling methods on image classification, object detection, and semantic segmentation tasks using several benchmark datasets, including ImageNet-1K, COCO, VOC, and ADE20K. Extensive experiments demonstrate that SliceSamp outperforms classical downsampling methods with consistent improvements across these computer vision tasks, achieving stronger model performance with lower computational costs and memory requirements. By replacing the downsampling layers in different network architectures (including ResNet (Residual Network), YOLOv5, and Swin Transformer), SliceSamp brings performance gains of +0.54~3.64% over these baseline models. Additionally, SliceUpsamp enables high-resolution feature reconstruction and alignment during upsampling. SliceSamp and SliceUpsamp can be integrated into existing neural network architectures in a plug-and-play manner. As a promising alternative to traditional downsampling, SliceSamp can also serve as a reference for designing lightweight, high-performance model architectures in the future.
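The slicing step described in the abstract is essentially a stride-2 rearrangement of pixels into channels followed by a depthwise separable convolution. The sketch below illustrates that idea; the class name, channel sizes, normalization, and activation are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class SliceDownsample(nn.Module):
    """Information-retaining downsampling: slice, then depthwise separable conv."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Slicing quadruples the channel count (four stride-2 sub-maps).
        self.depthwise = nn.Conv2d(4 * in_channels, 4 * in_channels, 3,
                                   padding=1, groups=4 * in_channels, bias=False)
        self.pointwise = nn.Conv2d(4 * in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Four non-overlapping sub-feature maps taken with a spatial stride of 2,
        # so no input pixel is discarded.
        x = torch.cat([x[..., 0::2, 0::2], x[..., 0::2, 1::2],
                       x[..., 1::2, 0::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Drop-in usage in place of a stride-2 convolution or pooling layer:
# y = SliceDownsample(64, 128)(torch.randn(1, 64, 32, 32))  # -> (1, 128, 16, 16)
```

Because the slicing is a pure rearrangement (the inverse of pixel shuffle), all spatial information survives the resolution reduction; the depthwise separable convolution then mixes it at low computational cost, matching the paper's stated goal.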

16 pages, 9196 KiB  
Article
TIG-DETR: Enhancing Texture Preservation and Information Interaction for Target Detection
by Zhiyong Liu, Kehan Wang, Changming Li, Yixuan Wang and Guoqian Luo
Appl. Sci. 2023, 13(14), 8037; https://doi.org/10.3390/app13148037 - 10 Jul 2023
Viewed by 787
Abstract
FPN (Feature Pyramid Network) and transformer-based target detectors are commonly employed in target detection tasks. However, these approaches suffer from design flaws that restrict their performance. To overcome these limitations, we propose TIG-DETR (Texturized Instance Guidance DETR), a novel target detection model. TIG-DETR comprises a backbone network, TE-FPN (Texture-Enhanced FPN), and an enhanced DETR detector. TE-FPN addresses the loss of texture information in FPN by utilizing a bottom-up architecture, Lightweight Feature-wise Attention, and Feature-wise Attention. These components effectively compensate for texture information loss, mitigate the confounding effect of cross-scale fusion, and enhance the final output features. Additionally, we introduce the Instance-Based Advanced Guidance Module in the DETR-based detector to tackle the weak detection of larger objects caused by the limited window interactions in Shifted Window-based Self-Attention. By incorporating TE-FPN instead of FPN in Faster RCNN with ResNet-50 as the backbone network, we observed an improvement of 1.9 points in average precision (AP). Introducing the Instance-Based Advanced Guidance Module improves the average precision of the DETR-based target detector by 0.4 AP. TIG-DETR achieves an average precision of 44.1% with ResNet-50 as the backbone network.
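The "feature-wise attention" components in TE-FPN reweight the channels of a fused feature map so that texture-bearing features are not washed out by cross-scale fusion. The abstract does not spell out their internals, so the block below is a hedged sketch using the standard squeeze-and-excitation pattern; the class name and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class FeatureWiseAttention(nn.Module):
    """Channel-wise reweighting of a feature map (squeeze-and-excitation style)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.fc = nn.Sequential(             # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # emphasize informative (e.g., texture-bearing) channels
```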

19 pages, 1168 KiB  
Article
Real Image Deblurring Based on Implicit Degradation Representations and Reblur Estimation
by Zihe Zhao, Man Qin, Haosong Gou, Zhengyong Wang and Chao Ren
Appl. Sci. 2023, 13(13), 7738; https://doi.org/10.3390/app13137738 - 30 Jun 2023
Viewed by 1050
Abstract
Most existing image deblurring methods are based on the estimation of blur kernels and end-to-end learning of the mapping between blurred and sharp images. However, since different real-world blurred images typically have completely different blurring patterns, the performance of these methods on real image deblurring tasks is limited unless blurring is explicitly modeled as degradation representations. In this paper, we propose IDR2ENet, an Implicit Degradation Representations and Reblur Estimation Network for real image deblurring. IDR2ENet consists of a degradation estimation process, a reblurring process, and a deblurring process. The degradation estimation process takes the real blurred image as input and outputs the implicit degradation representations estimated from it, which serve as inputs to both the reblurring and deblurring processes to better estimate the features of the blurred image. Experimental results show that, compared with both traditional and deep-learning-based deblurring algorithms, IDR2ENet achieves stable and efficient deblurring results on real blurred images.
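The abstract describes a three-process pipeline in which a single estimated degradation representation conditions both a reblurring and a deblurring branch. The skeleton below shows only that data flow; every sub-network is a stand-in placeholder, and the names and dimensions are assumptions, since the paper's actual architectures are not given here.

```python
import torch
import torch.nn as nn

class IDR2ESkeleton(nn.Module):
    """Data-flow skeleton: degradation estimation -> deblur + reblur branches."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.estimator = nn.Sequential(          # blurred image -> implicit rep
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.deblur = nn.Conv2d(3 + dim, 3, 3, padding=1)  # placeholder restorer
        self.reblur = nn.Conv2d(3 + dim, 3, 3, padding=1)  # placeholder reblurrer

    def forward(self, blurred: torch.Tensor):
        rep = self.estimator(blurred)                      # (B, dim, 1, 1)
        rep = rep.expand(-1, -1, *blurred.shape[-2:])      # broadcast spatially
        sharp = self.deblur(torch.cat([blurred, rep], 1))      # deblurring process
        reblurred = self.reblur(torch.cat([sharp, rep], 1))    # reblurring process
        return sharp, reblurred
```

The reblurring branch gives a natural self-supervisory signal: if the representation truly captures the degradation, reblurring the restored image should reproduce the original blurred input.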

15 pages, 3614 KiB  
Article
Improved Detector Based on Yolov5 for Typical Targets on the Sea Surfaces
by Anzhu Sun, Jun Ding, Jiarui Liu, Heng Zhou, Jiale Zhang, Peng Zhang, Junwei Dong and Ze Sun
Appl. Sci. 2023, 13(13), 7695; https://doi.org/10.3390/app13137695 - 29 Jun 2023
Cited by 1 | Viewed by 660
Abstract
Detection of targets on sea surfaces is an important application area that can bring great benefits to management and control systems in marine environments. However, few open-source datasets are accessible for object detection on seas and rivers. In this paper, a study is conducted on improved detection algorithms based on the YOLOv5 model. The test dataset contains ten categories of typical objects commonly seen at sea, including ships, devices, and structures. Multiple augmentation methods are employed in the pre-processing of the input data and are verified to be effective in enhancing the generalization ability of the algorithm. Moreover, a new form of the loss function is proposed that emphasizes high-quality boxes during training. The results demonstrate that the adapted loss function contributes to a boost in model performance. According to the ablation studies, the combined methods raise inference accuracy by compensating for several shortcomings of the baseline model in detecting single or multiple targets against varying backgrounds.
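The adapted loss emphasizes high-quality boxes during training. The paper's exact formulation is not reproduced in the abstract, so the snippet below is only a generic illustration of that idea: each box's IoU loss is reweighted by its own IoU, so well-aligned predictions contribute more gradient. The function name and the exponent gamma are assumptions.

```python
import torch

def quality_weighted_iou_loss(iou: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """iou: IoU between each prediction and its matched ground truth, in [0, 1]."""
    # Detach the weight so it scales, but does not redirect, the gradient.
    weight = iou.detach().clamp(min=1e-6) ** gamma  # high-quality boxes weigh more
    return (weight * (1.0 - iou)).mean()

# Example: three boxes with IoUs 0.9, 0.5, 0.1 -- the 0.9 box dominates the weights.
# loss = quality_weighted_iou_loss(torch.tensor([0.9, 0.5, 0.1]))
```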
