Modern Computer Vision and Image Analysis

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 August 2024 | Viewed by 12721

Special Issue Editors


Guest Editor
Department of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710000, China
Interests: neural networks for image processing and pattern recognition; computer-aided diagnosis; image processing suggested by human visual systems

Guest Editor
School of Information Science and Technology, Aichi Prefectural University, Aichi 480-1198, Japan
Interests: pattern recognition; image processing; image analysis

Guest Editor
Institute for Integrated and Intelligent Systems, Griffith University, Brisbane 94005, Australia
Interests: domain adaptation; self-supervised learning; continual learning; few-shot learning

Guest Editor
School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Interests: 3D vision; image processing

Guest Editor
School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China
Interests: neural networks for image processing and pattern recognition; computer-aided diagnosis

Special Issue Information

Dear Colleagues,

This Special Issue aims to collect high-quality research papers on learning from limited samples and labeled data for computer vision and image processing applications (such as image classification, object detection, semantic segmentation, and instance segmentation), to publish new ideas, theories, solutions, and insights on the subject, and to showcase relevant applications.

Topics of interest in this Special Issue include but are not limited to:

  • Theory of computer vision and image processing;
  • Low-level visual understanding and image processing;
  • 3D vision and reconstruction;
  • Document analysis and identification;
  • Target detection, tracking, and recognition;
  • Behavior recognition;
  • Multimedia analysis and reasoning;
  • Medical image processing and analysis;
  • Remote sensing image interpretation;
  • Optimization and learning methods;
  • Multimodal information processing;
  • Performance measurement and benchmark databases;
  • Video analysis and understanding;
  • Visual applications and systems.

Prof. Dr. Zhenghao Shi
Prof. Dr. Lifeng He
Dr. Miaohua Zhang
Prof. Dr. Jihua Zhu
Prof. Dr. Feng Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • deep learning
  • medical image processing and analysis
  • target detection
  • tracking and recognition
  • remote sensing image interpretation
  • video analysis and understanding

Published Papers (12 papers)


Research

16 pages, 9375 KiB  
Article
AiPE: A Novel Transformer-Based Pose Estimation Method
by Kai Lu and Dugki Min
Electronics 2024, 13(5), 967; https://doi.org/10.3390/electronics13050967 - 02 Mar 2024
Viewed by 705
Abstract
Human pose estimation is an important problem in computer vision because it is the foundation for many advanced semantic tasks and downstream applications. Although some convolutional neural network-based pose estimation methods have achieved good results, these networks are still limited by restricted receptive fields and weak robustness, leading to poor detection performance in scenarios with blur or low resolution. Additionally, their highly parallelized strategy is likely to cause significant computational demands, requiring high computing power. In comparison to convolutional neural networks, transformer-based methods offer advantages such as flexible stacking, a global perspective, and parallel computation. Building on these benefits, a novel transformer-based human pose estimation method is developed, which employs multi-head self-attention mechanisms and offset windows to effectively suppress the rapid growth of computational complexity near human keypoints. Experimental results from detailed visual comparison and quantitative analysis demonstrate that the proposed method can efficiently deal with the pose estimation problem in challenging scenarios, such as blurry or occluded scenes. Furthermore, errors in human skeleton mapping caused by keypoint occlusion or omission can be effectively corrected, so the accuracy of the pose estimation results is greatly improved.
(This article belongs to the Special Issue Modern Computer Vision and Image Analysis)

11 pages, 4976 KiB  
Article
Image Division Using Threshold Schemes with Privileges
by Marek R. Ogiela and Lidia Ogiela
Electronics 2024, 13(5), 931; https://doi.org/10.3390/electronics13050931 - 29 Feb 2024
Viewed by 411
Abstract
Threshold schemes are among the cryptographic techniques used for splitting visual data. Such methods generate a number of secret shares, a certain number of which must be assembled in order to reconstruct the original image. Traditional techniques for partitioning secret information generate equal shares, i.e., each share has the same value when reconstructing the original secret. However, it turns out that it is possible to develop and use partitioning protocols that generate privileged shares, i.e., shares that allow the secret data to be reconstructed from even fewer of them. This paper therefore describes new information sharing protocols that create privileged shares and also use visual authorization codes, based on subject knowledge, to select privileged shares for secret restoration. For the protocols described, examples of their operation are presented, and their complexity and potential for practical application are assessed.
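The paper's privileged-share protocols are not reproduced here, but the underlying threshold idea can be sketched with a standard (k, n) Shamir scheme over a prime field; a "privileged" participant is then simply one who holds more than one share, so fewer partners are needed to reconstruct. The field size and all names below are illustrative, not taken from the paper.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is mod PRIME

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    # Share i is the degree-(k-1) polynomial evaluated at x = i (i = 1..n).
    return [(x, sum(c * pow(x, e, PRIME) for e, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat).
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total
```

With k = 3, a participant holding two shares needs only one other share-holder to restore the secret, which is the privileging effect the abstract describes.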

14 pages, 4480 KiB  
Article
A Tracking-Based Two-Stage Framework for Spatio-Temporal Action Detection
by Jing Luo, Yulin Yang, Rongkai Liu, Li Chen, Hongxiao Fei, Chao Hu, Ronghua Shi and You Zou
Electronics 2024, 13(3), 479; https://doi.org/10.3390/electronics13030479 - 23 Jan 2024
Viewed by 606
Abstract
Spatio-temporal action detection (STAD) is a task receiving widespread attention, with numerous application scenarios such as video surveillance and smart education. Current studies follow a localization-based two-stage detection paradigm, which exploits a person detector for action localization and a feature processing model with a classifier for action classification. However, many issues arise from the imbalance between task settings and model complexity in STAD. Firstly, the complexity of heavy offline person detectors adds to the inference overhead. Secondly, frame-level actor proposals are incompatible with the video-level feature aggregation and Region-of-Interest feature pooling used in action classification, which limits performance under diverse action motions and results in low detection accuracy. In this paper, we propose a tracking-based two-stage spatio-temporal action detection framework called TrAD. The key idea of TrAD is to build video-level consistency and reduce model complexity by generating action track proposals spanning multiple video frames instead of actor proposals in a single frame. In particular, we utilize tailored tracking to simulate the behavior of human cognitive actions and use the captured motion trajectories as video-level proposals. We then integrate a proposal scaling method and a feature aggregation module into action classification to enhance feature pooling for detected tracks. Evaluations on the AVA dataset demonstrate that TrAD achieves SOTA performance with 29.7 mAP while also reducing overall computation by 58% compared to SlowFast.

16 pages, 9079 KiB  
Article
Quaternion Chromaticity Contrast Preserving Decolorization Method Based on Adaptive Singular Value Weighting
by Zhiliang Zhu, Mengxi Gao, Xiaojun Huang, Xiaosheng Huang and Yuxiao Zhao
Electronics 2024, 13(1), 191; https://doi.org/10.3390/electronics13010191 - 01 Jan 2024
Viewed by 681
Abstract
Color image decolorization not only simplifies the complexity of image processing and analysis, improving computational efficiency, but also helps preserve the key information of the image, enhance visual effects, and meet various practical application requirements. However, existing decolorization methods find it difficult to simultaneously maintain the local detail features and global smoothness features of the image. To address this shortcoming, this paper utilizes singular value decomposition to obtain the hierarchical local features of the image and utilizes quaternion theory to overcome the limitation of existing color image processing methods that ignore the correlation between the three channels of a color image. On this basis, we propose a singular value adaptive weighted fusion quaternion chromaticity contrast preserving decolorization method. This method uses the low-rank matrix approximation principle to design a singular value adaptive weighted fusion strategy for the three channels of the color image and implements image decolorization based on singular value adaptive weighting. Because the decolorization result obtained in this step does not preserve global smoothness characteristics well, a contrast preserving decolorization algorithm based on quaternion chromaticity distance is further proposed, and the global weighting strategy obtained by this algorithm is integrated into the image decolorization based on singular value adaptive weighting. The experimental results show that the decolorization method proposed in this paper achieves excellent results in both subjective visual perception and objective evaluation metrics.
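As a toy illustration of singular-value-based channel weighting (a simplified stand-in, not the authors' algorithm, which additionally involves quaternion chromaticity and a global weighting stage), one can weight each RGB channel by the energy of its leading singular values; the top-5 cutoff is an arbitrary illustrative choice:

```python
import numpy as np

def svd_weighted_decolor(rgb):
    """Grayscale via per-channel singular-value energy weights.

    Each RGB channel is weighted by the energy of its leading
    singular values (low-rank approximation principle), then the
    channels are combined as a convex combination.
    """
    weights = []
    for c in range(3):
        s = np.linalg.svd(rgb[..., c].astype(float), compute_uv=False)
        weights.append(s[:5].sum())          # energy of the top-5 singular values
    w = np.array(weights) / sum(weights)     # normalize so weights sum to 1
    return rgb.astype(float) @ w             # (H, W, 3) @ (3,) -> (H, W)
```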

18 pages, 7286 KiB  
Article
SAR Image Ship Target Detection Based on Receptive Field Enhancement Module and Cross-Layer Feature Fusion
by Haokun Zheng, Xiaorong Xue, Run Yue, Cong Liu and Zheyu Liu
Electronics 2024, 13(1), 167; https://doi.org/10.3390/electronics13010167 - 29 Dec 2023
Viewed by 677
Abstract
The interference of natural factors on the sea surface often results in a blurred background in Synthetic Aperture Radar (SAR) ship images, and the detection difficulty is further increased when different types of ships are densely docked together in nearshore scenes. To tackle these hurdles, this paper proposes a target detection model based on YOLOv5s, named YOLO-CLF. Initially, we constructed a Receptive Field Enhancement Module (RFEM) to improve the model's performance on blurred background images. Subsequently, considering scenes of densely packed multi-size ships, we designed a Cross-Layer Fusion Feature Pyramid Network (CLF-FPN) to aggregate multi-scale features, thereby enhancing detection accuracy. Finally, we introduce a Normalized Wasserstein Distance (NWD) metric to replace the commonly used Intersection over Union (IoU) metric, aiming to improve the detection of small targets. Experimental findings show that the enhanced algorithm attains an Average Precision (AP50) of 98.2% and 90.4% on the SSDD and HRSID datasets, respectively, an increase of 1.3% and 2.2% over the baseline model YOLOv5s. It also achieves a significant performance advantage over several other models.
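The NWD metric referenced here follows the normalized Gaussian Wasserstein distance formulation from the tiny-object-detection literature: each box is modeled as a 2D Gaussian, the Wasserstein-2 distance between two such Gaussians has a closed form, and it is mapped to (0, 1] with an exponential. A minimal sketch, with the scale constant C as a dataset-dependent placeholder (not a value from this paper):

```python
import math

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    A box is modeled as the Gaussian N([cx, cy], diag(w^2/4, h^2/4));
    w2_sq below is the closed-form squared Wasserstein-2 distance
    between the two Gaussians. C is a tuning constant.
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / C)
```

Unlike IoU, this similarity stays informative for small, non-overlapping boxes, which is why it helps small-target detection.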

14 pages, 10092 KiB  
Article
The Influence of the Skin Phenomenon on the Impedance of Thin Conductive Layers
by Stanisław Pawłowski, Jolanta Plewako, Ewa Korzeniewska and Dariusz Sobczyński
Electronics 2023, 12(23), 4834; https://doi.org/10.3390/electronics12234834 - 30 Nov 2023
Viewed by 597
Abstract
This paper analyzes the influence of the skin effect and the proximity effect on the inductance and impedance of thin conductive layers. The motivation for taking up this topic is an initial assessment of the possibility of using conductive layers deposited with the PVD technique on textile materials as strip or planar transmission lines for high-frequency signals (e.g., for transmitting images). This work pursues two goals. The first is to develop and test a numerical procedure for calculating the electromagnetic field distribution in problems of this type, based on the fundamental solution method (FSM). The second is to examine the impact of the skin phenomenon on the resistance, inductance, and impedance of thin conductive paths. The correctness and effectiveness of FSM for the analysis of harmonic electromagnetic fields in systems containing thin conductive layers were confirmed. Based on the performed simulations, it was found that in the frequency range above 10 MHz, the dependence of resistance and impedance on frequency is a power function with an exponent independent of the path width. Moreover, it was found that for paths with a width at least several times greater than their thickness, the phase shift between current and voltage as a function of frequency is practically independent of the path width.
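For intuition on why behavior changes above 10 MHz, the classical skin-depth formula δ = √(2/(ωμσ)) (standard electromagnetics, not the paper's FSM computation) can be evaluated directly; the copper conductivity used below is a textbook value, not a figure from the paper:

```python
import math

MU0 = 4 * math.pi * 1e-7  # vacuum permeability, H/m

def skin_depth(freq_hz, sigma, mu_r=1.0):
    """Classical skin depth delta = sqrt(2 / (omega * mu * sigma)), in metres."""
    omega = 2 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * MU0 * mu_r * sigma))

# Copper (sigma ~ 5.8e7 S/m) at 10 MHz gives delta ~ 21 micrometres,
# already comparable to the thickness of a PVD-deposited layer, so
# current crowding strongly affects resistance and impedance there.
```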

20 pages, 10740 KiB  
Article
An Improved Vibe Algorithm Based on Adaptive Thresholding and the Deep Learning-Driven Frame Difference Method
by Huilin Liu, Huazhang Wei, Gaoming Yang, Chenxing Xia and Shenghui Zhao
Electronics 2023, 12(16), 3481; https://doi.org/10.3390/electronics12163481 - 17 Aug 2023
Viewed by 1061
Abstract
Foreground detection is the main way to identify regions of interest, and its effectiveness determines the accuracy of subsequent behavior analysis. To enhance the detection effect and address the problem of low accuracy, this paper proposes an improved Vibe algorithm combining the frame difference method and adaptive thresholding. First, we adopt a shallow convolutional layer of VGG16 to extract the lower-level features of the image. Feature images with high correlation are fused into a new image. Second, adaptive factors based on the spatio-temporal domain are introduced to divide the foreground and background. Finally, we construct an inter-frame average speed value to measure the moving speed of the foreground, which resolves the mismatch between the background change rate and the model update rate. Experimental results show that our algorithm can effectively address the drawbacks of the traditional method and prevent the background model from being contaminated. It suppresses the generation of ghosting, significantly improves detection accuracy, and reduces the false detection rate.
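The frame-difference component with an adaptive threshold can be sketched as follows; the mean-plus-k·std threshold statistic and the constant k are illustrative assumptions, not the paper's exact spatio-temporal formulation:

```python
import numpy as np

def frame_diff_mask(prev, curr, k=2.5):
    """Foreground mask by frame differencing with an adaptive threshold.

    The threshold adapts to each frame pair as mean + k * std of the
    absolute per-pixel difference, so it rises in noisy or fast-changing
    scenes instead of being fixed in advance.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    thresh = diff.mean() + k * diff.std()
    return diff > thresh  # boolean foreground mask
```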

14 pages, 2778 KiB  
Article
Multiscale Local and Global Feature Fusion for the Detection of Steel Surface Defects
by Li Zhang, Zhipeng Fu, Huaping Guo, Yange Sun, Xirui Li and Mingliang Xu
Electronics 2023, 12(14), 3090; https://doi.org/10.3390/electronics12143090 - 16 Jul 2023
Cited by 2 | Viewed by 943
Abstract
Steel surface defects have a significant impact on the quality and performance of many industrial products and cause huge economic losses, so it is valuable to detect them in real time. To improve detection performance on steel surface defects with variable scales and complex backgrounds, this paper proposes a novel method for detecting steel surface defects through a multiscale local and global feature fusion mechanism. The proposed method uses a convolution operation with a downsampling mechanism in the convolutional neural network model to obtain rough multiscale feature maps. Then, a context-extraction block (CEB) is proposed that applies self-attention learning to the feature maps extracted by the convolution operation at each scale to obtain multiscale global context information, making up for the shortcomings of convolutional neural networks (CNNs) and thus forming a novel multiscale self-attention mechanism. Afterwards, using the feature pyramid structure, the multiscale feature maps are fused to improve multiscale object detection. Finally, the channel and spatial attention module and the WIOU (Wise Intersection over Union) loss function are introduced. The model achieved 78.2% and 71.9% mAP on the NEU-DET and GC10-DET datasets, respectively. Compared to algorithms such as Faster RCNN and EDDN, this method effectively improves the detection performance for steel surface defects.

14 pages, 2742 KiB  
Article
Video Object Segmentation Using Multi-Scale Attention-Based Siamese Network
by Zhiliang Zhu, Leiningxin Qiu, Jiaxin Wang, Jinquan Xiong and Hua Peng
Electronics 2023, 12(13), 2890; https://doi.org/10.3390/electronics12132890 - 30 Jun 2023
Viewed by 1053
Abstract
Video object segmentation is a fundamental problem in computer vision that aims to segment targets from a background by learning their appearance and motion information. In this study, a video object segmentation network based on the Siamese structure is proposed. The network has two inputs: the current video frame, used as the main input, and the adjacent frame, used as the auxiliary input. The processing modules for the two inputs share the same structure, optimization strategy, and encoder weights. The input is encoded to obtain features at different resolutions, from which good target appearance features can be obtained. After the encoding layer, the motion features of the target are learned using a multi-scale feature fusion decoder based on an attention mechanism. The final predicted segmentation results are computed from a layer of decoded features. The proposed framework achieved optimal results on CDNet2014 and FBMS-3D, with scores of 78.36 and 86.71, respectively, outperforming the second-ranked method by 4.3 on the CDNet2014 dataset and by 0.77 on the FBMS-3D dataset. Suboptimal results were achieved on the video primary target segmentation datasets SegTrackV2 and DAVIS2016, with scores of 60.57 and 81.08, respectively.

17 pages, 5911 KiB  
Article
Medical Image Fusion Using SKWGF and SWF in Framelet Transform Domain
by Weiwei Kong, Yiwen Li and Yang Lei
Electronics 2023, 12(12), 2659; https://doi.org/10.3390/electronics12122659 - 13 Jun 2023
Cited by 1 | Viewed by 879
Abstract
Accurately localizing and describing patients' lesions has long been considered a crucial aspect of clinical diagnosis. The fusion of multimodal medical images provides a feasible solution to this problem. Unfortunately, the trade-off between fusion performance and heavy computational overhead remains a challenge. In this paper, a novel and effective fusion method for multimodal medical images is proposed. Firstly, the framelet transform (FT) is introduced to decompose the source images into a series of low- and high-frequency sub-images. Next, we utilize the benefits of both steering kernel weighted guided filtering and side window filtering to fuse the sub-images. Finally, the inverse FT is employed to reconstruct the final fused image. To verify the effectiveness of the proposed fusion method, we fused several pairs of medical images covering different modalities in simulation experiments. The experimental results demonstrate that the proposed method yields better performance than current representative methods in terms of both visual quality and quantitative evaluation.

24 pages, 29514 KiB  
Article
Low-Rank and Total Variation Regularization with ℓ0 Data Fidelity Constraint for Image Deblurring under Impulse Noise
by Yuting Wang, Yuchao Tang and Shirong Deng
Electronics 2023, 12(11), 2432; https://doi.org/10.3390/electronics12112432 - 27 May 2023
Viewed by 914
Abstract
Impulse noise removal is an important problem in the field of image processing. Although many methods exist to remove impulse noise, there is still room for improvement. This paper proposes a new method for removing impulse noise that combines the nuclear norm with the ℓ0TV-with-detection model, exploiting the low-rank structure commonly found in visual images. The nuclear norm maintains this structure, while the ℓ0TV criterion promotes sparsity in the gradient domain, effectively removing impulse noise while preserving edges and other vital features. To solve the non-convex and non-smooth optimization problem, we transform it into a mathematical program with equilibrium constraints (MPEC). Subsequently, a proximal alternating direction method of multipliers is used to solve the transformed problem. The convergence of the algorithm is proven under mild conditions. Numerical experiments on denoising and deblurring show that, for low-rank images, the proposed method outperforms ℓ1TV with detection, ℓ0TV, and ℓ0OGSTV.

14 pages, 595 KiB  
Article
FASS: Face Anti-Spoofing System Using Image Quality Features and Deep Learning
by Enoch Solomon and Krzysztof J. Cios
Electronics 2023, 12(10), 2199; https://doi.org/10.3390/electronics12102199 - 12 May 2023
Cited by 8 | Viewed by 2995
Abstract
Face recognition technology has been widely used due to the convenience it provides. However, face recognition is vulnerable to spoofing attacks, which limits its use in sensitive application areas. This work introduces a novel face anti-spoofing system, FASS, that fuses the results of two classifiers. The first, a random forest, uses seven no-reference image quality features that we identified, derived from face images; its results are fused with those of a deep learning classifier that takes entire face images as input. Extensive experiments were performed to compare FASS with state-of-the-art anti-spoofing systems on five benchmark datasets: Replay-Attack, CASIA-MFSD, MSU-MFSD, OULU-NPU and SiW. The results show that FASS outperforms all face anti-spoofing systems based on image quality features and is also more accurate than many state-of-the-art systems based on deep learning.
