New Trends in Image Processing III

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 August 2023) | Viewed by 14117

Special Issue Editors


Prof. Dr. Hyeonjoon Moon
Guest Editor

Dr. Irfan Mehmood
Guest Editor
Faculty of Engineering & Informatics, School of Media, Design and Technology, University of Bradford, Bradford, UK
Interests: visual surveillance; image and video processing; computer vision; artificial intelligence; deep learning; medical image processing; visual attention modeling; information summarization and retrieval

Special Issue Information

Dear Colleagues,

In the last few years, the methodologies and technologies related to video processing, image processing, computer graphics, 3D modelling, and multimedia have been widely adopted in the field of computer vision. The continuous development of these technologies has led to the introduction of new methodologies and applications in this field. Moreover, recent image-processing machine learning algorithms, especially deep learning, provide an opportunity to efficiently process massive datasets in order to extract information and develop new analysis procedures.

The aim of this Special Issue is two-fold: firstly, to present novel applications of modern devices for data acquisition and visualization (e.g., CCTV videos, 3D scanners, VR glasses, robots), and, secondly, to propose new methodologies for processing huge datasets using modern pattern recognition and machine learning approaches (e.g., deep learning and hypergraph learning).

This Special Issue, entitled “New Trends in Image Processing III”, includes (but is not limited to) the following topics:

  • 3D model processing;
  • Augmented and virtual reality applications;
  • Robotic applications;
  • RGBD analysis;
  • Advanced image enhancement;
  • De-noising and low-light enhancement;
  • Advanced image classification and retrieval;
  • Semantic segmentation;
  • Image processing.

Prof. Dr. Hyeonjoon Moon
Dr. Irfan Mehmood
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)

Editorial

4 pages, 214 KiB  
Editorial
Editorial on the Special Issue: New Trends in Image Processing III
by Hyeonjoon Moon and Irfan Mehmood
Appl. Sci. 2023, 13(22), 12430; https://doi.org/10.3390/app132212430 - 17 Nov 2023
Viewed by 1295
Abstract
The image processing field is undergoing a significant transformation owing to rapid advancements in deep learning, computer vision, and artificial intelligence [...]
(This article belongs to the Special Issue New Trends in Image Processing III)

Research

20 pages, 12604 KiB  
Article
SAFP-YOLO: Enhanced Object Detection Speed Using Spatial Attention-Based Filter Pruning
by Hanse Ahn, Seungwook Son, Jaehyeon Roh, Hwapyeong Baek, Sungju Lee, Yongwha Chung and Daihee Park
Appl. Sci. 2023, 13(20), 11237; https://doi.org/10.3390/app132011237 - 12 Oct 2023
Cited by 1 | Viewed by 1143
Abstract
Because object detection accuracy has significantly improved with advancements in deep learning techniques, many real-time applications have applied one-stage detectors, such as You Only Look Once (YOLO), owing to their fast execution speed and accuracy. However, for a practical deployment, the deployment cost should be considered. In this paper, a method for pruning the unimportant filters of YOLO is proposed to satisfy the real-time requirements of a low-cost embedded board. Attention mechanisms have been widely used to improve the accuracy of deep learning models; in contrast, the proposed method uses spatial attention to improve the execution speed of YOLO by evaluating the importance of each YOLO filter. The feature maps before and after spatial attention are compared, and the unimportant filters of YOLO can then be pruned based on this comparison. To the best of our knowledge, this is the first report considering both accuracy and speed with Spatial Attention-based Filter Pruning (SAFP) for lightweight object detectors. To demonstrate the effectiveness of the proposed method, it was applied to the YOLOv4 and YOLOv7 baseline models. With the pig (baseline YOLOv4 84.4%@3.9FPS vs. proposed SAFP-YOLO 78.6%@20.9FPS) and vehicle (baseline YOLOv7 81.8%@3.8FPS vs. proposed SAFP-YOLO 75.7%@20.0FPS) datasets, the proposed method significantly improved the execution speed of YOLOv4 and YOLOv7 (i.e., by a factor of five) on a low-cost embedded board, TX-2, with acceptable accuracy.
(This article belongs to the Special Issue New Trends in Image Processing III)
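
As a rough illustration of the pruning criterion summarized in the abstract, the sketch below scores each convolutional filter by comparing its feature maps before and after spatial attention. The CBAM-style attention module, the L1 scoring rule, and the keep ratio are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (an assumed stand-in for the paper's module):
    pool over channels, then a small conv followed by a sigmoid gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)               # (N, 1, H, W)
        mx = x.max(dim=1, keepdim=True).values          # (N, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

def filter_importance(feat, attention):
    """Compare feature maps before and after spatial attention and score each
    filter by how much of its response the attention preserves (assumed L1
    criterion); low-scoring filters become pruning candidates."""
    attended = feat * attention(feat)                   # feature maps after attention
    return attended.abs().mean(dim=(0, 2, 3))           # one score per filter: (C,)

# Toy usage: keep the 75% highest-scoring filters of a 64-filter layer.
feat = torch.randn(8, 64, 32, 32)                       # a batch of feature maps
scores = filter_importance(feat, SpatialAttention())
keep = torch.topk(scores, k=int(0.75 * scores.numel())).indices
print(f"pruning {feat.shape[1] - keep.numel()} of {feat.shape[1]} filters")
```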

22 pages, 7819 KiB  
Article
Who Cares about the Weather? Inferring Weather Conditions for Weather-Aware Object Detection in Thermal Images
by Anders Skaarup Johansen, Kamal Nasrollahi, Sergio Escalera and Thomas B. Moeslund
Appl. Sci. 2023, 13(18), 10295; https://doi.org/10.3390/app131810295 - 14 Sep 2023
Viewed by 1130
Abstract
Deployments of real-world object detection systems often experience a degradation in performance over time due to concept drift. Systems that leverage thermal cameras are especially susceptible because the respective thermal signatures of objects and their surroundings are highly sensitive to environmental changes. In this study, two types of weather-aware latent conditioning methods are investigated. The proposed method aims to guide two object detectors (YOLOv5 and Deformable DETR) to become weather-aware. This is achieved by leveraging an auxiliary branch that predicts weather-related information while conditioning intermediate layers of the object detector. While the proposed conditioning methods do not directly improve the accuracy of the baseline detectors, it can be observed that conditioned networks manage to extract a weather-related signal from the thermal images, resulting in a decreased miss rate at the cost of increased false positives. The extracted signal appears noisy and is thus challenging to regress accurately. This is most likely a result of the qualitative nature of the thermal sensor; thus, further work is needed to identify an ideal method for optimizing the conditioning branch, as well as to further improve the accuracy of the system.
(This article belongs to the Special Issue New Trends in Image Processing III)
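
A minimal sketch of what an auxiliary weather-conditioning branch could look like, assuming a FiLM-style formulation in PyTorch; the weather variables, the head architecture, and the modulation form are illustrative assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

class WeatherConditionedBlock(nn.Module):
    """Sketch of latent weather conditioning (assumed FiLM-style form).

    An auxiliary branch regresses weather variables from the features, and the
    prediction produces per-channel scale/shift parameters that modulate the
    detector's intermediate feature maps."""
    def __init__(self, channels, n_weather_vars=3):
        super().__init__()
        self.weather_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, n_weather_vars),  # e.g., temperature, wind, rain
        )
        self.film = nn.Linear(n_weather_vars, 2 * channels)

    def forward(self, feat):
        weather = self.weather_head(feat)                   # (N, n_vars)
        gamma, beta = self.film(weather).chunk(2, dim=1)    # (N, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + gamma) + beta, weather           # conditioned features

# Training would add an auxiliary loss, e.g. MSE(weather, weather_labels),
# alongside the detection loss so the branch learns a weather signal.
feat = torch.randn(2, 256, 40, 40)
conditioned, weather_pred = WeatherConditionedBlock(256)(feat)
```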

14 pages, 4143 KiB  
Article
Fast and Accurate Facial Expression Image Classification and Regression Method Based on Knowledge Distillation
by Kunyoung Lee, Seunghyun Kim and Eui Chul Lee
Appl. Sci. 2023, 13(11), 6409; https://doi.org/10.3390/app13116409 - 24 May 2023
Cited by 5 | Viewed by 1453
Abstract
As emotional states are diverse, simply classifying them through discrete facial expressions has its limitations. Therefore, to create a facial expression recognition system for practical applications, not only must facial expressions be classified, but emotional changes must also be measured as continuous values. Based on the knowledge distillation structure and the teacher-bounded loss function, we propose a method to maximize the synergistic effect of jointly learning discrete and continuous emotional states of eight expression classes, valences, and arousal levels. The proposed knowledge distillation model uses Emonet, a state-of-the-art continuous estimation method, as the teacher model, and a lightweight network as the student model. It was confirmed that performance degradation can be minimized even though the student models require multiply-accumulate operations of approximately 3.9 G and 0.3 G when using EfficientFormer and MobileNetV2, respectively, which is much less than the computation required by the teacher model (16.99 G). Together with the significant improvements in computational efficiency (by 4.35 and 56.63 times using EfficientFormer and MobileNetV2, respectively), the decreases in facial expression classification accuracy were only approximately 1.35% and 1.64%, respectively. Therefore, the proposed method is optimized for application-level interaction systems in terms of both the computation required and accuracy.
(This article belongs to the Special Issue New Trends in Image Processing III)
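
The following PyTorch sketch shows one plausible reading of the teacher-bounded distillation loss described in the abstract: the student's valence/arousal regression is penalized only where it is worse than the teacher's. The model signatures (returning a logits/valence-arousal pair) and the unweighted loss sum are assumptions.

```python
import torch
import torch.nn.functional as F

def teacher_bounded_regression_loss(student_out, teacher_out, target):
    """Assumed form of a teacher-bounded loss: penalize the student's
    valence/arousal regression only where it exceeds the teacher's error."""
    s_err = F.mse_loss(student_out, target, reduction="none")
    t_err = F.mse_loss(teacher_out, target, reduction="none")
    return torch.where(s_err > t_err, s_err, torch.zeros_like(s_err)).mean()

def distillation_step(student, teacher, images, expr_labels, va_labels):
    """Joint loss over 8 expression classes plus valence/arousal values."""
    with torch.no_grad():
        _, t_va = teacher(images)             # teacher valence/arousal estimate
    s_logits, s_va = student(images)          # student predictions
    cls_loss = F.cross_entropy(s_logits, expr_labels)
    va_loss = teacher_bounded_regression_loss(s_va, t_va, va_labels)
    return cls_loss + va_loss

# Toy check with stand-in models returning (logits, valence/arousal) pairs.
toy = lambda x: (torch.randn(x.shape[0], 8), torch.randn(x.shape[0], 2))
loss = distillation_step(toy, toy, torch.randn(4, 3, 224, 224),
                         torch.randint(0, 8, (4,)), torch.randn(4, 2))
```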

31 pages, 13718 KiB  
Article
A Comparative Analysis of Feature Detectors and Descriptors for Image Stitching
by Surendra Kumar Sharma, Kamal Jain and Anoop Kumar Shukla
Appl. Sci. 2023, 13(10), 6015; https://doi.org/10.3390/app13106015 - 13 May 2023
Cited by 13 | Viewed by 2584
Abstract
Image stitching is a technique that is often employed in image processing and computer vision applications. The feature points in an image provide a significant amount of key information, and image stitching requires accurate extraction of these features, since this may decrease misalignment flaws in the final stitched image. In recent years, a variety of feature detectors and descriptors that may be utilized for image stitching have been presented. However, the computational cost and correctness of feature matching restrict the utilization of these techniques. To date, no work has compared feature detectors and descriptors for image stitching applications, i.e., none has considered the effect of detectors and descriptors on the generated final stitched image. This paper presents a detailed comparative analysis of commonly used feature detectors and descriptors proposed previously, contributing a general comparison of these methods for image stitching applications. The detectors and descriptors are compared in terms of the number of matched points, time taken, and quality of the stitched image. After analyzing the obtained results, it was observed that the combination of the AKAZE detector with the AKAZE descriptor is preferable in almost all situations.
(This article belongs to the Special Issue New Trends in Image Processing III)
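
Since the abstract reports that AKAZE-with-AKAZE performed best, here is a minimal OpenCV sketch of the stitching pipeline such a comparison evaluates: detect and describe with AKAZE, match with a Hamming-distance brute-force matcher and Lowe's ratio test, estimate a RANSAC homography, and warp. The ratio threshold and output canvas size are illustrative choices.

```python
import cv2
import numpy as np

def stitch_pair(img1, img2, ratio=0.75):
    """Minimal AKAZE-based stitching sketch: detect/describe, match, warp."""
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)

    # AKAZE descriptors are binary -> Hamming distance; the ratio test prunes
    # ambiguous matches before homography estimation.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp img1 into img2's frame and paste img2 over the overlap region.
    h, w = img2.shape[:2]
    out = cv2.warpPerspective(img1, H, (w * 2, h))
    out[0:h, 0:w] = img2
    return out

# Example (hypothetical filenames):
# pano = stitch_pair(cv2.imread("left.jpg"), cv2.imread("right.jpg"))
```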

17 pages, 6211 KiB  
Article
Convolved Feature Vector Based Adaptive Fuzzy Filter for Image De-Noising
by Muhammad Habib, Ayyaz Hussain, Eid Rehman, Syeda Mariam Muzammal, Benmao Cheng, Muhammad Aslam and Syeda Fizzah Jilani
Appl. Sci. 2023, 13(8), 4861; https://doi.org/10.3390/app13084861 - 12 Apr 2023
Cited by 1 | Viewed by 1064
Abstract
In this paper, a convolved feature vector based adaptive fuzzy filter is proposed for impulse noise removal. The proposed filter follows a traditional approach, i.e., detection of noisy pixels based on certain criteria, followed by a filtering process. In the first step, the proposed noise detection mechanism selects a small layer of input image pixels and convolves it with a set of weighted kernels to form a convolved feature vector layer. This layer of features is then passed to a fuzzy inference system, where fuzzy membership degrees and a reduced set of fuzzy rules classify each pixel as noise-free, edge, or noisy. Noise-free pixels remain unaffected in the filtering phase, maximizing detail preservation, whereas noisy pixels are restored using the fuzzy filter. This process is carried out traditionally, starting from the top-left corner of the noisy image and proceeding to the bottom-right corner, with a stride of one for the small input layer and a stride of two during convolution. The convolved feature vector is very helpful in finding edge information and hidden patterns in the input image that are affected by noise. The performance of the proposed approach was tested on a large dataset using standard performance measures, and the proposed technique outperforms many existing state-of-the-art techniques with excellent detail preservation and effective noise-removal capabilities.
(This article belongs to the Special Issue New Trends in Image Processing III)
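
A greatly simplified sketch of the detection-then-filtering idea: per-pixel responses to a bank of kernels form the "convolved feature vector", and pixels are classified before restoration. Crisp thresholds stand in for the paper's fuzzy membership degrees and rule base, and the kernels are illustrative, so this is an assumed approximation rather than the proposed filter.

```python
import numpy as np
from scipy.ndimage import convolve, median_filter

# Illustrative directional kernels (assumed); the paper's weighted kernels differ.
KERNELS = [
    np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]) / 6.0,   # horizontal edge
    np.array([[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]) / 6.0,   # vertical edge
    np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]) / 6.0,   # diagonal edge
]

def denoise(img, edge_thresh=30.0, noise_thresh=90.0):
    """Detection-then-filtering sketch for impulse noise.

    Per-pixel kernel responses form the feature vector. An isolated impulse
    responds strongly in every direction (high minimum response), whereas an
    edge responds strongly in only one (high maximum, low minimum)."""
    img = img.astype(np.float64)
    responses = np.stack([np.abs(convolve(img, k)) for k in KERNELS])
    min_resp = responses.min(axis=0)
    max_resp = responses.max(axis=0)
    is_noisy = min_resp > noise_thresh                 # impulse in all directions
    is_edge = (max_resp > edge_thresh) & ~is_noisy     # preserved, not filtered
    restored = img.copy()
    med = median_filter(img, size=3)
    restored[is_noisy] = med[is_noisy]                 # restore only noisy pixels
    return restored.astype(np.uint8)

# Toy usage on a synthetic image:
noisy = (np.random.rand(64, 64) * 255).astype(np.uint8)
clean = denoise(noisy)
```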

17 pages, 2725 KiB  
Article
CNN Attention Enhanced ViT Network for Occluded Person Re-Identification
by Jing Wang, Peitong Li, Rongfeng Zhao, Ruyan Zhou and Yanling Han
Appl. Sci. 2023, 13(6), 3707; https://doi.org/10.3390/app13063707 - 14 Mar 2023
Cited by 4 | Viewed by 1731
Abstract
Person re-identification (ReID) is often affected by occlusion, which makes most of the features extracted by ReID models contain a great deal of identity-independent noise. Recently, the use of the Vision Transformer (ViT) has enabled significant progress in various visual artificial intelligence tasks. However, ViT suffers from insufficient local information extraction capability, which should be of concern to researchers in the field of occluded ReID. This paper exploits the potential of attention mechanisms to enhance ViT in ReID tasks. In this study, an Attention Enhanced ViT Network (AET-Net) is proposed for occluded ReID, with ViT as the backbone network for extracting image features. Because occlusion and outlier problems still exist in ReID, we incorporate a spatial attention mechanism into the ViT architecture to enhance the attention of ViT patch embedding vectors to important regions. In addition, we design a MultiFeature Training Module that optimizes the network by constructing multiple classification features and computing a multi-feature loss to enhance model performance. Finally, the effectiveness and superiority of the proposed method are demonstrated by broad experiments on both occluded and non-occluded datasets.
(This article belongs to the Special Issue New Trends in Image Processing III)
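
A minimal sketch of how patch-level spatial attention might reweight ViT token embeddings before the encoder, as the abstract describes; the MLP-plus-sigmoid gating form is an assumption, not AET-Net's actual module.

```python
import torch
import torch.nn as nn

class PatchSpatialAttention(nn.Module):
    """Sketch: reweight ViT patch embeddings by a learned spatial importance.

    A small MLP scores each patch token, and a sigmoid gate emphasizes body
    regions over occluders (the gating form is an assumption)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(),
                                   nn.Linear(dim // 4, 1))

    def forward(self, tokens):                      # tokens: (N, num_patches, dim)
        gate = torch.sigmoid(self.score(tokens))    # (N, num_patches, 1)
        return tokens * gate                        # attenuate occluded patches

tokens = torch.randn(4, 196, 768)               # e.g., 14x14 patches, ViT-Base dim
enhanced = PatchSpatialAttention(768)(tokens)   # then fed into the ViT encoder
```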

26 pages, 5322 KiB  
Article
A Real-Time Remote Respiration Measurement Method with Improved Robustness Based on a CNN Model
by Hyeonsang Hwang, Kunyoung Lee and Eui Chul Lee
Appl. Sci. 2022, 12(22), 11603; https://doi.org/10.3390/app122211603 - 15 Nov 2022
Cited by 3 | Viewed by 1878
Abstract
Human respiration reflects meaningful information, such as one’s health and psychological state. Respiration rate is an important indicator in medicine because it is directly related to life, death, and the onset of serious disease. In this study, we propose a noncontact method to measure respiration. Our approach uses a standard RGB camera and does not require any special equipment. Measurement is performed automatically by detecting body landmarks to identify regions of interest (RoIs). We adopt a learning model trained to measure motion and respiration by analyzing movement from RoI images, providing high robustness to background noise. We collected a remote respiration measurement dataset to train the proposed method and compared its measurement performance with that of representative existing methods. Experimentally, the proposed method showed performance similar to that of existing methods in a stable environment with restricted motion. However, owing to its robustness to motion noise, its performance was significantly better than that of existing methods: in an environment with partial occlusion and small body movements, the error of existing methods was 4–8 bpm, whereas the error of our proposed method was around 0.1 bpm. In addition, by measuring the time required to perform each step of the respiration measurement process, we confirmed that the proposed method can run in real time at over 30 FPS using only a standard CPU. Since the proposed approach shows state-of-the-art accuracy, with an error of 0.1 bpm in the wild, it can be extended to various applications, such as medicine, home healthcare, emotional marketing, forensic investigation, and fitness in future research.
(This article belongs to the Special Issue New Trends in Image Processing III)
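
For intuition, the sketch below estimates a respiration rate from a sequence of chest-RoI crops using classical signal processing (mean-intensity trace, band-pass filter, FFT peak). This is a stand-in for the paper's CNN-based motion analysis, and the frequency band and filter order are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def respiration_rate(roi_frames, fps=30.0):
    """Sketch of camera-based respiration estimation (assumed pipeline).

    roi_frames: (T, H, W) grayscale chest-RoI crops located via body landmarks.
    The mean RoI intensity over time carries the breathing motion; a band-pass
    filter keeps plausible respiration frequencies (0.1-0.5 Hz, i.e., 6-30 bpm),
    and the dominant FFT peak gives the rate."""
    signal = roi_frames.reshape(len(roi_frames), -1).mean(axis=1)
    signal = signal - signal.mean()
    b, a = butter(2, [0.1, 0.5], btype="band", fs=fps)
    filtered = filtfilt(b, a, signal)
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    return freqs[spectrum.argmax()] * 60.0       # breaths per minute

frames = np.random.rand(300, 64, 64)             # 10 s of synthetic frames
print(f"estimated rate: {respiration_rate(frames):.1f} bpm")
```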