Computer Vision and Pattern Recognition with Applications, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 30 September 2024

Special Issue Editor


Guest Editor
School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
Interests: computer vision; pattern recognition; multimedia computing

Special Issue Information

Dear Colleagues, 

Computer vision and pattern recognition are fundamental problems in artificial intelligence and central application areas for mathematical theory and tools. Computer vision enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take action or make recommendations based on that information. Pattern recognition is the process of recognizing patterns in data using machine learning algorithms. In recent years, both fields have expanded rapidly, and applications of computer vision and pattern recognition can now be seen everywhere, e.g., object detection, recognition, segmentation, classification, content generation, and multimedia analysis. In this Special Issue, we aim to assemble recent advances in computer vision, pattern recognition, and related applications.

Prof. Dr. Teng Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern classification and clustering
  • machine learning, neural network, and deep learning
  • theory in computer vision and pattern recognition
  • low-level vision, image processing, and machine vision
  • 3D computer vision and reconstruction
  • object detection, tracking, recognition, and action recognition
  • data mining and signal processing
  • multimedia/multimodal analysis and applications
  • biomedical image processing and analysis
  • medical image analysis and applications
  • graph theory and its applications
  • vision analysis and understanding
  • vision applications and systems
  • vision for robots and autonomous driving
  • vision and language

Related Special Issue

Published Papers (2 papers)


Research

17 pages, 5606 KiB  
Article
LASFormer: Light Transformer for Action Segmentation with Receptive Field-Guided Distillation and Action Relation Encoding
by Zhichao Ma and Kan Li
Mathematics 2024, 12(1), 57; https://doi.org/10.3390/math12010057 - 24 Dec 2023
Abstract
Transformer-based models for action segmentation have achieved high frame-wise accuracy on challenging benchmarks. However, they rely on multiple decoders and self-attention blocks for informative representations, whose heavy computing and memory costs remain an obstacle to handling long video sequences and to practical deployment. To address these issues, we design a light transformer model for the action segmentation task, named LASFormer, with a novel encoder–decoder structure based on three key designs. First, we propose a receptive field-guided distillation to realize model reduction, which can more generally overcome the gap in semantic feature structure between intermediate features produced by aggregated temporal dilation convolution (ATDC). Second, we propose a simplified implicit attention to replace self-attention and avoid its quadratic complexity. Third, we design an efficient action relation encoding module embedded after the decoder, in which temporal graph reasoning introduces the inductive bias that adjacent frames are likely to belong to the same class in order to model global temporal relations, and a cross-model fusion structure integrates frame-level and segment-level temporal cues; this avoids over-segmentation without requiring multiple decoders, further reducing computational complexity. Extensive experiments have verified the effectiveness and efficiency of the framework. On the challenging 50Salads, GTEA, and Breakfast benchmarks, LASFormer significantly outperforms current state-of-the-art methods in accuracy, edit score, and F1 score.
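The abstract above describes replacing self-attention with a simplified attention of sub-quadratic cost. The paper's exact formulation is not given on this page, so the following is only an illustrative sketch of the general idea behind linear-complexity attention, where `linear_attention` and the ELU+1 feature map are assumptions, not LASFormer's actual design:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Linear-complexity attention via a kernel feature map.

    Replaces softmax(Q K^T) V, which is O(T^2) in sequence length T,
    with phi(Q) (phi(K)^T V), computed in O(T). ELU(x)+1 is a common
    choice of positive feature map phi.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # ELU(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                                  # (d, d_v): independent of T
    z = q @ k.sum(axis=0, keepdims=True).T        # per-row normalizer, (T, 1)
    return (q @ kv) / (z + eps)

# Toy sequence of T=8 frames with d=4 feature channels.
rng = np.random.default_rng(0)
T, d = 8, 4
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))
out = linear_attention(q, k, v)
print(out.shape)  # (8, 4)
```

The key point is that `k.T @ v` has a shape depending only on the feature dimensions, so memory and compute scale linearly with the number of frames.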

29 pages, 12414 KiB  
Article
OMOFuse: An Optimized Dual-Attention Mechanism Model for Infrared and Visible Image Fusion
by Jianye Yuan and Song Li
Mathematics 2023, 11(24), 4902; https://doi.org/10.3390/math11244902 - 07 Dec 2023
Abstract
Infrared and visible image fusion aims to combine the thermal information of infrared images and the texture information of visible images into images that better match the characteristics of human visual perception. However, in existing work, the fused images suffer from incomplete contextual information and poor fusion quality. This paper presents a new image fusion algorithm, OMOFuse. First, the channel and spatial attention mechanisms are optimized via a DCA (dual-channel attention) mechanism and an ESA (enhanced spatial attention) mechanism. Then, an ODAM (optimized dual-attention mechanism) module is constructed to further improve the fusion effect. Moreover, an MO module is used to improve the network's feature extraction capability for contextual information. Finally, the loss function ℒ is composed of three parts: SSL (structural similarity loss), PL (perceptual loss), and GL (gap loss). Extensive experiments on three major datasets demonstrate that OMOFuse outperforms existing image fusion methods in quantitative and qualitative evaluation and exhibits superior generalization. Further evidence of the algorithm's effectiveness is provided in this study.
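The loss ℒ described above combines SSL, PL, and GL terms, but their exact definitions and weights are not given on this page. The following toy sketch shows only the general three-term structure; `structural_loss`, `perceptual_loss`, `gap_loss`, and the weights `w` are all illustrative stand-ins (a fixed random projection substitutes for the pretrained network a real perceptual loss would use):

```python
import numpy as np

def structural_loss(fused, src):
    """Toy structural term: mean squared difference of vertical gradients."""
    gf, gs = np.diff(fused, axis=0), np.diff(src, axis=0)
    return float(np.mean((gf - gs) ** 2))

def perceptual_loss(fused, src, proj):
    """Toy perceptual term: MSE in a projected feature space."""
    return float(np.mean((fused.ravel() @ proj - src.ravel() @ proj) ** 2))

def gap_loss(fused, ir, vis):
    """Toy gap term: penalize deviation from the per-pixel max of the sources."""
    return float(np.mean((fused - np.maximum(ir, vis)) ** 2))

def total_loss(fused, ir, vis, proj, w=(1.0, 0.1, 1.0)):
    """L = w0 * SSL + w1 * PL + w2 * GL, with illustrative weights."""
    l_ssl = structural_loss(fused, ir) + structural_loss(fused, vis)
    l_pl = perceptual_loss(fused, ir, proj) + perceptual_loss(fused, vis, proj)
    l_gl = gap_loss(fused, ir, vis)
    return w[0] * l_ssl + w[1] * l_pl + w[2] * l_gl

# Toy 16x16 infrared/visible pair and a naive average "fusion".
rng = np.random.default_rng(0)
ir, vis = rng.random((16, 16)), rng.random((16, 16))
fused = 0.5 * (ir + vis)
proj = rng.standard_normal((256, 8))  # stand-in for pretrained features
loss = total_loss(fused, ir, vis, proj)
print(loss >= 0.0)  # True
```

In practice each term would be computed on network feature maps during training; the sketch only conveys how the three losses are weighted and summed.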
