Transformers in Computer Vision

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (30 September 2023) | Viewed by 3835

Special Issue Editors


Prof. Dr. Ren-Hung Hwang
Guest Editor
College of Artificial Intelligence, National Yang Ming Chiao Tung University, Tainan 71150, Taiwan
Interests: 5G/B5G/6G; space-air-ground integrated network (SAGIN); AI-enabled 6G networks; deep-learning-based HIDS & NIDS; smart transportation

Dr. Chen-Kuo Chiang
Guest Editor
Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan
Interests: computer vision; machine learning; pattern recognition

Special Issue Information

Dear Colleagues,

State-of-the-art research in computer vision is currently dominated by deep learning, with transformer architectures driving significant breakthroughs across a wide range of tasks. Object detection and recognition is a key area of research, with substantial progress in models that accurately identify and locate objects in images and videos. Image and video analysis is likewise an active area, with many recent developments in action recognition, video segmentation, and activity detection. Another important direction is 3D reconstruction and scene understanding, which involves creating 3D models of objects and scenes from 2D images. Deep learning techniques are also improving the performance of medical image analysis, such as tumor detection and segmentation in medical scans. Finally, computer vision for autonomous systems, such as self-driving cars and drones, remains an active area of research, with ongoing work on object detection, scene understanding, and motion planning.
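To make the term concrete for prospective authors, the following is a minimal sketch of the patch-based self-attention step at the core of vision transformers. It assumes PyTorch, and the layer names, dimensions, and the `PatchSelfAttention` module itself are illustrative choices, not taken from any particular paper in this Special Issue.

```python
# Minimal sketch of patch-based self-attention as used in vision transformers.
# Illustrative only; sizes and names are assumptions, not from any specific paper.
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    def __init__(self, dim=192, num_heads=3, patch_size=16, in_chans=3):
        super().__init__()
        # Patch embedding: non-overlapping patches projected to `dim` channels.
        self.embed = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, images):                      # images: (B, 3, H, W)
        x = self.embed(images)                      # (B, dim, H/16, W/16)
        tokens = x.flatten(2).transpose(1, 2)       # (B, N, dim), N = number of patches
        tokens = self.norm(tokens)
        out, _ = self.attn(tokens, tokens, tokens)  # every patch attends to every other patch
        return out

# Usage: a 224x224 image yields 14x14 = 196 patch tokens.
block = PatchSelfAttention()
features = block(torch.randn(1, 3, 224, 224))       # -> (1, 196, 192)
```

A single attention step already gives every patch token a global receptive field; practical vision transformers stack many such blocks together with MLPs, residual connections, and positional information.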

Topics include but are not limited to:

  1. Transformer architectures for object detection and recognition
  2. Transformer-based face applications, such as face recognition systems and multiple-object tracking systems
  3. Transformer-based approaches for image segmentation
  4. Transformer-based models for video analysis and action recognition
  5. Transformer-based models for 3D object detection
  6. Transformer-based models for panoptic segmentation
  7. Multi-modal transformer models for image–text understanding
  8. Transformer-based models for image synthesis and style transfer
  9. Transformer-based models for image super-resolution
  10. Transformer-based models for image captioning and text-to-image synthesis
  11. Transformer-based models for video synthesis and action generation

Prof. Dr. Ren-Hung Hwang
Dr. Chen-Kuo Chiang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • deep learning
  • transformer

Published Papers (1 paper)


Research

22 pages, 6404 KiB  
Article
PLG-ViT: Vision Transformer with Parallel Local and Global Self-Attention
by Nikolas Ebert, Didier Stricker and Oliver Wasenmüller
Sensors 2023, 23(7), 3447; https://doi.org/10.3390/s23073447 - 25 Mar 2023
Cited by 3 | Viewed by 3337
Abstract
Recently, transformer architectures have shown superior performance compared to their CNN counterparts in many computer vision tasks. The self-attention mechanism enables transformer networks to connect visual dependencies over short as well as long distances, thus generating a large, sometimes even a global receptive field. In this paper, we propose our Parallel Local-Global Vision Transformer (PLG-ViT), a general backbone model that fuses local window self-attention with global self-attention. By merging these local and global features, short- and long-range spatial interactions can be effectively and efficiently represented without the need for costly computational operations such as shifted windows. In a comprehensive evaluation, we demonstrate that our PLG-ViT outperforms CNN-based as well as state-of-the-art transformer-based architectures in image classification and in complex downstream tasks such as object detection, instance segmentation, and semantic segmentation. In particular, our PLG-ViT models outperformed similarly sized networks like ConvNeXt and Swin Transformer, achieving Top-1 accuracy values of 83.4%, 84.0%, and 84.5% on ImageNet-1K with 27M, 52M, and 91M parameters, respectively. Full article
(This article belongs to the Special Issue Transformers in Computer Vision)
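As a rough illustration of the parallel local/global idea described in the abstract above, the sketch below runs window-restricted self-attention and a pooled global cross-attention side by side and merges their outputs. It assumes PyTorch; the module name, the pooling-based global branch, and the concatenation-plus-linear fusion are assumptions made for illustration, not the authors' actual PLG-ViT implementation.

```python
# Rough sketch of parallel local (windowed) and global self-attention.
# NOT the authors' PLG-ViT code; the fusion scheme and global branch are assumptions.
import torch
import torch.nn as nn

class ParallelLocalGlobalAttention(nn.Module):
    def __init__(self, dim=96, num_heads=4, window=7, pooled=7):
        super().__init__()
        self.window, self.pooled = window, pooled
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # merge local and global features

    def forward(self, x):                            # x: (B, C, H, W), H and W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # Local branch: self-attention restricted to non-overlapping windows.
        win = x.reshape(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1)
        win = win.reshape(-1, w * w, C)               # (B * num_windows, w*w, C)
        local, _ = self.local_attn(win, win, win)
        local = local.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        local = local.reshape(B, C, H, W)
        # Global branch: every token attends to a coarse, pooled summary of the whole map.
        tokens = x.flatten(2).transpose(1, 2)         # (B, H*W, C)
        summary = nn.functional.adaptive_avg_pool2d(x, self.pooled)
        summary = summary.flatten(2).transpose(1, 2)  # (B, pooled*pooled, C)
        glob, _ = self.global_attn(tokens, summary, summary)
        glob = glob.transpose(1, 2).reshape(B, C, H, W)
        # Fuse the two branches channel-wise.
        merged = torch.cat([local, glob], dim=1).permute(0, 2, 3, 1)  # (B, H, W, 2C)
        return self.fuse(merged).permute(0, 3, 1, 2)                  # (B, C, H, W)

# Usage on a 56x56 feature map with 96 channels:
block = ParallelLocalGlobalAttention()
y = block(torch.randn(2, 96, 56, 56))                # -> (2, 96, 56, 56)
```

In this sketch the local branch captures short-range structure within each window, while the global branch gives every position access to image-wide context in the same layer, which is one way to realize the short- and long-range interactions the abstract describes without shifted windows.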
