Modern Computer Vision and Image Analysis

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 August 2024 | Viewed by 12721

Special Issue Editors


Guest Editor
Department of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710000, China
Interests: neural networks for image processing and pattern recognition; computer-aided diagnosis; image processing suggested by human visual systems

Guest Editor
School of Information Science and Technology, Aichi Prefectural University, Aichi 480-1198, Japan
Interests: pattern recognition; image processing; image analysis

Guest Editor
Institute for Integrated and Intelligent Systems, Griffith University, Brisbane 94005, Australia
Interests: domain adaptation; self-supervised learning; continual learning; few-shot learning

Guest Editor
School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Interests: 3D vision; image processing

Guest Editor
School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China
Interests: neural networks for image processing and pattern recognition; computer-aided diagnosis

Special Issue Information

Dear Colleagues,

This Special Issue aims to collect high-quality research papers on learning from limited samples and labeled data for computer vision and image processing applications (such as image classification, object detection, semantic segmentation, and instance segmentation), to publish new ideas, theories, solutions, and insights on the subject, and to showcase relevant applications.

Topics of interest in this Special Issue include but are not limited to:

  • Theory of computer vision and image processing;
  • Low-level visual understanding and image processing;
  • 3D vision and reconstruction;
  • Document analysis and identification;
  • Target detection, tracking, and recognition;
  • Behavior recognition;
  • Multimedia analysis and reasoning;
  • Medical image processing and analysis;
  • Remote sensing image interpretation;
  • Optimization and learning methods;
  • Multimodal information processing;
  • Performance measurement and benchmark databases;
  • Video analysis and understanding;
  • Visual applications and systems.

Prof. Dr. Zhenghao Shi
Prof. Dr. Lifeng He
Dr. Miaohua Zhang
Prof. Dr. Jihua Zhu
Prof. Dr. Feng Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • deep learning
  • medical image processing and analysis
  • target detection
  • tracking and recognition
  • remote sensing image interpretation
  • video analysis and understanding

Published Papers (12 papers)


Research

16 pages, 9375 KiB  
Article
AiPE: A Novel Transformer-Based Pose Estimation Method
by Kai Lu and Dugki Min
Electronics 2024, 13(5), 967; https://doi.org/10.3390/electronics13050967 - 02 Mar 2024
Viewed by 705
Abstract
Human pose estimation is an important problem in computer vision because it is the foundation for many advanced semantic tasks and downstream applications. Although some convolutional neural network-based pose estimation methods have achieved good results, these networks are still limited by restricted receptive fields and weak robustness, leading to poor detection performance in scenarios with blur or low resolution. Additionally, their highly parallelized strategy is likely to cause significant computational demands, requiring high computing power. In comparison to convolutional neural networks, transformer-based methods offer advantages such as flexible stacking, a global perspective, and parallel computation. Building on these benefits, a novel transformer-based human pose estimation method is developed, which employs multi-head self-attention mechanisms and offset windows to effectively suppress the rapid growth of computational complexity near human keypoints. Experimental results from detailed visual comparison and quantitative analysis demonstrate that the proposed method can efficiently deal with the pose estimation problem in challenging scenarios, such as blurry or occluded scenes. Furthermore, errors in human skeleton mapping caused by keypoint occlusion or omission can be effectively corrected, so the accuracy of the pose estimation results is greatly improved.
(This article belongs to the Special Issue Modern Computer Vision and Image Analysis)

11 pages, 4976 KiB  
Article
Image Division Using Threshold Schemes with Privileges
by Marek R. Ogiela and Lidia Ogiela
Electronics 2024, 13(5), 931; https://doi.org/10.3390/electronics13050931 - 29 Feb 2024
Viewed by 411
Abstract
Threshold schemes are among the cryptographic techniques used for splitting visual data. Such methods generate a number of secret shares, a certain number of which must be assembled in order to reconstruct the original image. Traditional techniques for partitioning secret information generate equal shares, i.e., each share has the same value when reconstructing the original secret. However, it turns out that it is possible to develop and use partitioning protocols that generate privileged shares, i.e., shares that allow the secret data to be reconstructed from even fewer of them. This paper therefore describes new information sharing protocols that create privileged shares and also use visual authorization codes, based on subject knowledge, to select privileged shares for secret restoration. For the protocols described, examples of their operation are presented, and their complexity and potential for practical application are assessed.
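The paper's privileged-share protocols are not reproduced here, but the underlying threshold idea can be sketched with a standard (k, n) Shamir scheme over a prime field; a "privileged" participant is then simply one who holds more than one share, so fewer partners are needed to reconstruct. The field size and all names below are illustrative, not taken from the paper.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is mod PRIME

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    # Share i is the degree-(k-1) polynomial evaluated at x = i (i = 1..n).
    return [(x, sum(c * pow(x, e, PRIME) for e, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat).
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total
```

With k = 3, a participant holding two shares needs only one other share-holder to restore the secret, which is the privileging effect the abstract describes.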

14 pages, 4480 KiB  
Article
A Tracking-Based Two-Stage Framework for Spatio-Temporal Action Detection
by Jing Luo, Yulin Yang, Rongkai Liu, Li Chen, Hongxiao Fei, Chao Hu, Ronghua Shi and You Zou
Electronics 2024, 13(3), 479; https://doi.org/10.3390/electronics13030479 - 23 Jan 2024
Viewed by 606
Abstract
Spatio-temporal action detection (STAD) is a task receiving widespread attention, with numerous application scenarios such as video surveillance and smart education. Current studies follow a localization-based two-stage detection paradigm, which exploits a person detector for action localization and a feature processing model with a classifier for action classification. However, many issues arise from the imbalance between task settings and model complexity in STAD. Firstly, the complexity of heavy offline person detectors adds to the inference overhead. Secondly, frame-level actor proposals are incompatible with the video-level feature aggregation and Region-of-Interest feature pooling used in action classification, which limits performance under diverse action motions and results in low detection accuracy. In this paper, we propose a tracking-based two-stage spatio-temporal action detection framework called TrAD. The key idea of TrAD is to build video-level consistency and reduce model complexity by generating action track proposals spanning multiple video frames instead of actor proposals in a single frame. In particular, we utilize tailored tracking to simulate the behavior of human cognitive actions and use the captured motion trajectories as video-level proposals. We then integrate a proposal scaling method and a feature aggregation module into action classification to enhance feature pooling for detected tracks. Evaluations on the AVA dataset demonstrate that TrAD achieves SOTA performance with 29.7 mAP while also reducing overall computation by 58% compared to SlowFast.

16 pages, 9079 KiB  
Article
Quaternion Chromaticity Contrast Preserving Decolorization Method Based on Adaptive Singular Value Weighting
by Zhiliang Zhu, Mengxi Gao, Xiaojun Huang, Xiaosheng Huang and Yuxiao Zhao
Electronics 2024, 13(1), 191; https://doi.org/10.3390/electronics13010191 - 01 Jan 2024
Viewed by 681
Abstract
Color image decolorization not only simplifies the complexity of image processing and analysis, improving computational efficiency, but also helps preserve the key information of the image, enhance visual effects, and meet various practical application requirements. However, existing decolorization methods find it difficult to simultaneously maintain the local detail features and global smoothness features of the image. To address this shortcoming, this paper utilizes singular value decomposition to obtain the hierarchical local features of the image and utilizes quaternion theory to overcome the limitation of existing color image processing methods that ignore the correlation between the three channels of a color image. On this basis, we propose a singular value adaptive weighted fusion quaternion chromaticity contrast preserving decolorization method. This method uses the low-rank matrix approximation principle to design a singular value adaptive weighted fusion strategy for the three channels of the color image and implements image decolorization based on singular value adaptive weighting. Because the decolorization result obtained in this step does not preserve global smoothness characteristics well, a contrast preserving decolorization algorithm based on quaternion chromaticity distance is further proposed, and the global weighting strategy obtained by this algorithm is integrated into the image decolorization based on singular value adaptive weighting. The experimental results show that the decolorization method proposed in this paper achieves excellent results in both subjective visual perception and objective evaluation metrics.
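As a toy illustration of singular-value-based channel weighting (a simplified stand-in, not the authors' algorithm, which additionally involves quaternion chromaticity and a global weighting stage), one can weight each RGB channel by the energy of its leading singular values; the top-5 cutoff is an arbitrary illustrative choice:

```python
import numpy as np

def svd_weighted_decolor(rgb):
    """Grayscale via per-channel singular-value energy weights.

    Each RGB channel is weighted by the energy of its leading
    singular values (low-rank approximation principle), then the
    channels are combined as a convex combination.
    """
    weights = []
    for c in range(3):
        s = np.linalg.svd(rgb[..., c].astype(float), compute_uv=False)
        weights.append(s[:5].sum())          # energy of the top-5 singular values
    w = np.array(weights) / sum(weights)     # normalize so weights sum to 1
    return rgb.astype(float) @ w             # (H, W, 3) @ (3,) -> (H, W)
```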

18 pages, 7286 KiB  
Article
SAR Image Ship Target Detection Based on Receptive Field Enhancement Module and Cross-Layer Feature Fusion
by Haokun Zheng, Xiaorong Xue, Run Yue, Cong Liu and Zheyu Liu
Electronics 2024, 13(1), 167; https://doi.org/10.3390/electronics13010167 - 29 Dec 2023
Viewed by 677
Abstract
The interference of natural factors on the sea surface often results in a blurred background in Synthetic Aperture Radar (SAR) ship images, and the detection difficulty is further increased when different types of ships are densely docked together in nearshore scenes. To tackle these hurdles, this paper proposes a target detection model based on YOLOv5s, named YOLO-CLF. Initially, we constructed a Receptive Field Enhancement Module (RFEM) to improve the model's performance on blurred background images. Subsequently, considering scenes of densely packed multi-size ships, we designed a Cross-Layer Fusion Feature Pyramid Network (CLF-FPN) to aggregate multi-scale features, thereby enhancing detection accuracy. Finally, we introduce a Normalized Wasserstein Distance (NWD) metric to replace the commonly used Intersection over Union (IoU) metric, aiming to improve the detection of small targets. Experimental findings show that the enhanced algorithm attains an Average Precision (AP50) of 98.2% and 90.4% on the SSDD and HRSID datasets, respectively, an increase of 1.3% and 2.2% over the baseline model YOLOv5s. It also achieves a significant performance advantage over several other models.
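The NWD metric referenced here follows the normalized Gaussian Wasserstein distance formulation from the tiny-object-detection literature: each box is modeled as a 2D Gaussian, the Wasserstein-2 distance between two such Gaussians has a closed form, and it is mapped to (0, 1] with an exponential. A minimal sketch, with the scale constant C as a dataset-dependent placeholder (not a value from this paper):

```python
import math

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    A box is modeled as the Gaussian N([cx, cy], diag(w^2/4, h^2/4));
    w2_sq below is the closed-form squared Wasserstein-2 distance
    between the two Gaussians. C is a tuning constant.
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / C)
```

Unlike IoU, this similarity stays informative for small, non-overlapping boxes, which is why it helps small-target detection.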

14 pages, 10092 KiB  
Article
The Influence of the Skin Phenomenon on the Impedance of Thin Conductive Layers
by Stanisław Pawłowski, Jolanta Plewako, Ewa Korzeniewska and Dariusz Sobczyński
Electronics 2023, 12(23), 4834; https://doi.org/10.3390/electronics12234834 - 30 Nov 2023
Viewed by 597
Abstract
This paper analyzes the influence of the skin effect and the proximity effect on the inductance and impedance of thin conductive layers. The motivation for taking up this topic is an initial assessment of the possibility of using conductive layers deposited with the PVD technique on textile materials as strip or planar transmission lines for high-frequency signals (e.g., for transmitting images). This work pursues two goals. The first is to develop and test a numerical procedure for calculating the electromagnetic field distribution in problems of this type, based on the fundamental solution method (FSM). The second is to examine the impact of the skin phenomenon on the resistance, inductance, and impedance of thin conductive paths. The correctness and effectiveness of FSM for the analysis of harmonic electromagnetic fields in systems containing thin conductive layers were confirmed. Based on the performed simulations, it was found that in the frequency range above 10 MHz, the dependence of resistance and impedance on frequency is a power function with an exponent independent of the path width. Moreover, it was found that for paths with a width at least several times greater than their thickness, the phase shift between current and voltage as a function of frequency is practically independent of the path width.
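For intuition on why behavior changes above 10 MHz, the classical skin-depth formula δ = √(2/(ωμσ)) (standard electromagnetics, not the paper's FSM computation) can be evaluated directly; the copper conductivity used below is a textbook value, not a figure from the paper:

```python
import math

MU0 = 4 * math.pi * 1e-7  # vacuum permeability, H/m

def skin_depth(freq_hz, sigma, mu_r=1.0):
    """Classical skin depth delta = sqrt(2 / (omega * mu * sigma)), in metres."""
    omega = 2 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * MU0 * mu_r * sigma))

# Copper (sigma ~ 5.8e7 S/m) at 10 MHz gives delta ~ 21 micrometres,
# already comparable to the thickness of a PVD-deposited layer, so
# current crowding strongly affects resistance and impedance there.
```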

20 pages, 10740 KiB  
Article
An Improved Vibe Algorithm Based on Adaptive Thresholding and the Deep Learning-Driven Frame Difference Method
by Huilin Liu, Huazhang Wei, Gaoming Yang, Chenxing Xia and Shenghui Zhao
Electronics 2023, 12(16), 3481; https://doi.org/10.3390/electronics12163481 - 17 Aug 2023
Viewed by 1061
Abstract
Foreground detection is the main way to identify regions of interest, and its effectiveness determines the accuracy of subsequent behavior analysis. To enhance the detection effect and address the problem of low accuracy, this paper proposes an improved Vibe algorithm combining the frame difference method and adaptive thresholding. First, we adopt a shallow convolutional layer of VGG16 to extract the lower-level features of the image. Feature images with high correlation are fused into a new image. Second, adaptive factors based on the spatio-temporal domain are introduced to divide the foreground and background. Finally, we construct an inter-frame average speed value to measure the moving speed of the foreground, which resolves the mismatch between the background change rate and the model update rate. Experimental results show that our algorithm can effectively address the drawbacks of the traditional method and prevent the background model from being contaminated. It suppresses the generation of ghosting, significantly improves detection accuracy, and reduces the false detection rate.
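The frame-difference component with an adaptive threshold can be sketched as follows; the mean-plus-k·std threshold statistic and the constant k are illustrative assumptions, not the paper's exact spatio-temporal formulation:

```python
import numpy as np

def frame_diff_mask(prev, curr, k=2.5):
    """Foreground mask by frame differencing with an adaptive threshold.

    The threshold adapts to each frame pair as mean + k * std of the
    absolute per-pixel difference, so it rises in noisy or fast-changing
    scenes instead of being fixed in advance.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    thresh = diff.mean() + k * diff.std()
    return diff > thresh  # boolean foreground mask
```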

14 pages, 2778 KiB  
Article
Multiscale Local and Global Feature Fusion for the Detection of Steel Surface Defects
by Li Zhang, Zhipeng Fu, Huaping Guo, Yange Sun, Xirui Li and Mingliang Xu
Electronics 2023, 12(14), 3090; https://doi.org/10.3390/electronics12143090 - 16 Jul 2023
Cited by 2 | Viewed by 943
Abstract
Steel surface defects have a significant impact on the quality and performance of many industrial products and cause huge economic losses, so it is valuable to detect them in real time. To improve detection performance on steel surface defects with variable scales and complex backgrounds, this paper proposes a novel method for detecting steel surface defects through a multiscale local and global feature fusion mechanism. The proposed method uses a convolution operation with a downsampling mechanism in the convolutional neural network model to obtain rough multiscale feature maps. Then, a context-extraction block (CEB) is proposed that applies self-attention learning to the feature maps extracted by the convolution operation at each scale to obtain multiscale global context information, making up for the shortcomings of convolutional neural networks (CNNs) and thus forming a novel multiscale self-attention mechanism. Afterwards, using the feature pyramid structure, the multiscale feature maps are fused to improve multiscale object detection. Finally, the channel and spatial attention module and the WIOU (Wise Intersection over Union) loss function are introduced. The model achieved 78.2% and 71.9% mAP on the NEU-DET and GC10-DET datasets, respectively. Compared to algorithms such as Faster RCNN and EDDN, this method effectively improves the detection performance for steel surface defects.

14 pages, 2742 KiB  
Article
Video Object Segmentation Using Multi-Scale Attention-Based Siamese Network
by Zhiliang Zhu, Leiningxin Qiu, Jiaxin Wang, Jinquan Xiong and Hua Peng
Electronics 2023, 12(13), 2890; https://doi.org/10.3390/electronics12132890 - 30 Jun 2023
Viewed by 1053
Abstract
Video object segmentation is a fundamental problem in computer vision that aims to segment targets from a background by learning their appearance and motion information. In this study, a video object segmentation network based on the Siamese structure is proposed. The network has two inputs: the current video frame, used as the main input, and the adjacent frame, used as the auxiliary input. The processing modules for the two inputs share the same structure, optimization strategy, and encoder weights. The input is encoded to obtain features at different resolutions, from which good target appearance features can be obtained. After the encoding layer, the motion features of the target are learned using a multi-scale feature fusion decoder based on an attention mechanism. The final predicted segmentation results are computed from a layer of decoded features. The proposed framework achieved optimal results on CDNet2014 and FBMS-3D, with scores of 78.36 and 86.71, respectively, outperforming the second-ranked method by 4.3 on the CDNet2014 dataset and by 0.77 on the FBMS-3D dataset. Suboptimal results were achieved on the video primary target segmentation datasets SegTrackV2 and DAVIS2016, with scores of 60.57 and 81.08, respectively.

17 pages, 5911 KiB  
Article
Medical Image Fusion Using SKWGF and SWF in Framelet Transform Domain
by Weiwei Kong, Yiwen Li and Yang Lei
Electronics 2023, 12(12), 2659; https://doi.org/10.3390/electronics12122659 - 13 Jun 2023
Cited by 1 | Viewed by 879
Abstract
Accurately localizing and describing patients' lesions has long been considered a crucial aspect of clinical diagnosis. The fusion of multimodal medical images provides a feasible solution to this problem. Unfortunately, the trade-off between fusion performance and heavy computational overhead remains a challenge. In this paper, a novel and effective fusion method for multimodal medical images is proposed. Firstly, the framelet transform (FT) is introduced to decompose the source images into a series of low- and high-frequency sub-images. Next, we utilize the benefits of both steering kernel weighted guided filtering and side window filtering to fuse the sub-images. Finally, the inverse FT is employed to reconstruct the final fused image. To verify the effectiveness of the proposed fusion method, we fused several pairs of medical images covering different modalities in simulation experiments. The experimental results demonstrate that the proposed method yields better performance than current representative methods in terms of both visual quality and quantitative evaluation.

24 pages, 29514 KiB  
Article
Low-Rank and Total Variation Regularization with ℓ0 Data Fidelity Constraint for Image Deblurring under Impulse Noise
by Yuting Wang, Yuchao Tang and Shirong Deng
Electronics 2023, 12(11), 2432; https://doi.org/10.3390/electronics12112432 - 27 May 2023
Viewed by 914
Abstract
Impulse noise removal is an important problem in the field of image processing. Although many methods exist to remove impulse noise, there is still room for improvement. This paper proposes a new method for removing impulse noise that combines the nuclear norm with the ℓ0TV-with-detection model, exploiting the low-rank structure commonly found in visual images. The nuclear norm maintains this structure, while the ℓ0TV criterion promotes sparsity in the gradient domain, effectively removing impulse noise while preserving edges and other vital features. To solve the non-convex and non-smooth optimization problem, we transform it into a mathematical program with equilibrium constraints (MPEC). Subsequently, a proximal alternating direction method of multipliers is used to solve the transformed problem. The convergence of the algorithm is proven under mild conditions. Numerical experiments on denoising and deblurring show that, for low-rank images, the proposed method outperforms ℓ1TV with detection, ℓ0TV, and ℓ0OGSTV.

14 pages, 595 KiB  
Article
FASS: Face Anti-Spoofing System Using Image Quality Features and Deep Learning
by Enoch Solomon and Krzysztof J. Cios
Electronics 2023, 12(10), 2199; https://doi.org/10.3390/electronics12102199 - 12 May 2023
Cited by 8 | Viewed by 2995
Abstract
Face recognition technology has been widely used due to the convenience it provides. However, face recognition is vulnerable to spoofing attacks, which limits its use in sensitive application areas. This work introduces a novel face anti-spoofing system, FASS, that fuses the results of two classifiers. The first, a random forest, uses seven no-reference image quality features that we identified, derived from face images; its results are fused with those of a deep learning classifier that takes entire face images as input. Extensive experiments were performed to compare FASS with state-of-the-art anti-spoofing systems on five benchmark datasets: Replay-Attack, CASIA-MFSD, MSU-MFSD, OULU-NPU and SiW. The results show that FASS outperforms all face anti-spoofing systems based on image quality features and is also more accurate than many state-of-the-art systems based on deep learning.
