Topic Editors

Department of Engineering (DING), University of Sannio, Benevento, Italy

Computer Vision and Image Processing

Abstract submission deadline
closed (31 March 2023)
Manuscript submission deadline
closed (30 June 2023)
Viewed by
179205

Topic Information

Dear Colleagues,

Computer vision is a scientific discipline that aims at developing models for understanding our 3D environment using cameras, while image processing can be understood as the body of techniques that extract useful information directly from images or prepare them for optimal subsequent analysis. The two are closely related fields that together form a working area for almost any research that uses cameras or other image sensors to acquire information from scenes or working environments. The main aim of this Topic is therefore to cover the relevant areas where computer vision and image processing are applied, including but not limited to:

  • Three-dimensional image acquisition, processing, and visualization
  • Scene understanding
  • Greyscale, color, and multispectral image processing
  • Multimodal sensor fusion
  • Industrial inspection
  • Robotics
  • Surveillance
  • Airborne and satellite on-board image acquisition platforms
  • Computational models of vision
  • Imaging psychophysics

Prof. Dr. Silvia Liberata Ullo
Topic Editor

Keywords

  • 3D acquisition, processing, and visualization
  • scene understanding
  • multimodal sensor processing and fusion
  • multispectral, color, and greyscale image processing
  • industrial quality inspection
  • computer vision for robotics
  • computer vision for surveillance
  • airborne and satellite on-board image acquisition platforms
  • computational models of vision
  • imaging psychophysics

Participating Journals

Journal Name        Abbreviation  Impact Factor  CiteScore  Launched Year  First Decision (median)  APC
Applied Sciences    applsci       2.7            4.5        2011           16.9 days                CHF 2400
Electronics         electronics   2.9            4.7        2012           15.6 days                CHF 2400
Modelling           modelling     -              -          2020           15.8 days                CHF 1000
Journal of Imaging  jimaging      3.2            4.4        2015           21.7 days                CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service, dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of the following benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from misappropriation with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (101 papers)

16 pages, 500 KiB  
Article
CL3: Generalization of Contrastive Loss for Lifelong Learning
by Kaushik Roy, Christian Simon, Peyman Moghadam and Mehrtash Harandi
J. Imaging 2023, 9(12), 259; https://doi.org/10.3390/jimaging9120259 - 23 Nov 2023
Viewed by 1441
Abstract
Lifelong learning portrays learning gradually in nonstationary environments and emulates the process of human learning, which is efficient, robust, and able to learn new concepts incrementally from sequential experience. To equip neural networks with such a capability, one needs to overcome the problem of catastrophic forgetting, the phenomenon of forgetting past knowledge while learning new concepts. In this work, we propose a novel knowledge distillation algorithm that makes use of contrastive learning to help a neural network preserve its past knowledge while learning from a series of tasks. Our proposed generalized contrastive distillation strategy tackles catastrophic forgetting of old knowledge, minimizes semantic drift by maintaining a similar embedding space, and ensures compactness in the feature distribution to accommodate novel tasks in the current model. Our comprehensive study shows that our method achieves improved performance in challenging class-incremental, task-incremental, and domain-incremental supervised learning scenarios.
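
For orientation, the following is a minimal sketch of how a contrastive distillation term can couple a current model's embeddings to those of a frozen previous model; it is a generic PyTorch illustration, not the authors' exact CL3 loss, and the temperature and batch-similarity formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(z_new, z_old, temperature=0.1):
    """Encourage the current model's embeddings (z_new) to preserve the
    pairwise similarity structure of the previous model's embeddings (z_old).
    z_new, z_old: (batch, dim) feature tensors; z_old comes from a frozen copy."""
    z_new = F.normalize(z_new, dim=1)
    z_old = F.normalize(z_old, dim=1)
    n = z_new.size(0)
    # Keep only off-diagonal similarities (no self-similarity terms).
    mask = ~torch.eye(n, dtype=torch.bool, device=z_new.device)
    logits_new = (z_new @ z_new.t() / temperature)[mask].view(n, n - 1)
    logits_old = (z_old @ z_old.t() / temperature)[mask].view(n, n - 1)
    p_old = logits_old.softmax(dim=1)          # old similarity distribution
    log_p_new = logits_new.log_softmax(dim=1)  # new similarity distribution
    # Cross-entropy between old and new distributions: matching them limits
    # semantic drift of the embedding space across tasks.
    return -(p_old * log_p_new).sum(dim=1).mean()
```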

30 pages, 9579 KiB  
Review
A Review for the Euler Number Computing Problem
by Bin Yao, Haochen He, Shiying Kang, Yuyan Chao and Lifeng He
Electronics 2023, 12(21), 4406; https://doi.org/10.3390/electronics12214406 - 25 Oct 2023
Cited by 1 | Viewed by 873
Abstract
In a binary image, the Euler number is a crucial topological feature that holds immense significance in image understanding and image analysis owing to its invariance under scaling, rotation, or any arbitrary rubber-sheet transformation of the image. This paper focuses on the Euler number computing problem in a binary image. The state-of-the-art Euler number computing algorithms are reviewed; they obtain the Euler number through different techniques, such as the definition itself, features of binary images, and special data structures for representing binary images, and we explain the main principles and strategies of the algorithms in detail. Afterwards, we present experimental results that rank the prevailing Euler number computing algorithms in the 8-connectivity case. Then, we discuss both parallel and hardware implementations of algorithms for calculating the Euler number, and we present the extension of the algorithms to 3D image Euler number computation. Lastly, we outline forthcoming efforts concerning the computation of the Euler number.
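
For reference, one classical technique such a review covers is bit-quad counting (Gray's method), which computes the Euler number from counts of 2×2 patterns; a minimal NumPy sketch:

```python
import numpy as np

def euler_number(img, connectivity=8):
    """Euler number of a binary image via 2x2 bit-quad counting (Gray's method).
    img: 2D array of 0/1. Returns (#connected components) - (#holes)."""
    img = np.pad(img.astype(np.uint8), 1)  # pad so border quads are counted
    # Encode each 2x2 neighbourhood as a 4-bit pattern (0..15).
    q = (img[:-1, :-1] << 3) | (img[:-1, 1:] << 2) | (img[1:, :-1] << 1) | img[1:, 1:]
    counts = np.bincount(q.ravel(), minlength=16)
    n1 = counts[[1, 2, 4, 8]].sum()     # quads with one foreground pixel
    n3 = counts[[7, 11, 13, 14]].sum()  # quads with three foreground pixels
    nd = counts[[6, 9]].sum()           # diagonal quads
    if connectivity == 8:
        return (n1 - n3 - 2 * nd) // 4
    return (n1 - n3 + 2 * nd) // 4      # 4-connectivity variant
```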

20 pages, 1410 KiB  
Article
Hypergraph Learning-Based Semi-Supervised Multi-View Spectral Clustering
by Geng Yang, Qin Li, Yu Yun, Yu Lei and Jane You
Electronics 2023, 12(19), 4083; https://doi.org/10.3390/electronics12194083 - 29 Sep 2023
Viewed by 714
Abstract
Graph-based semi-supervised multi-view clustering has demonstrated promising performance and gained significant attention due to its capability to handle sample spaces with arbitrary shapes. Nevertheless, the ordinary graph employed by most existing semi-supervised multi-view clustering methods only captures the pairwise relationships between samples and cannot fully explore the higher-order information and complex structure among multiple sample points. Additionally, most existing methods do not make full use of the complementary information and spatial structure contained in multi-view data, which are crucial to the clustering results. We propose a novel hypergraph learning-based semi-supervised multi-view spectral clustering approach to overcome these limitations. Specifically, the proposed method fully considers the relationships among multiple sample points and utilizes hypergraph-induced hyper-Laplacian matrices to preserve the high-order geometrical structure in the data. Based on the principle of complementarity and consistency between views, this method simultaneously learns the indicator matrices of all views and harnesses the tensor Schatten p-norm to extract both complementary information and low-rank spatial structure within these views. Furthermore, we introduce an auto-weighted strategy to address the discrepancy between singular values, enhancing the robustness and stability of the algorithm. Detailed experimental results on various datasets demonstrate that our approach surpasses existing state-of-the-art semi-supervised multi-view clustering methods.
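
For background, hypergraph-based methods of this kind start from the standard normalized hypergraph Laplacian (Zhou et al.'s construction); a minimal NumPy sketch, where the incidence matrix H is an assumed input built from the data:

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian L = I - Dv^-1/2 H W De^-1 H^T Dv^-1/2.
    H: (n_vertices, n_edges) incidence matrix (H[v, e] = 1 if v is in edge e);
    w: optional hyperedge weights. Assumes every vertex lies in some edge."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, float)
    dv = H @ w                 # vertex degrees
    de = H.sum(axis=0)         # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    theta = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.eye(n) - theta   # eigenvectors of L drive spectral clustering
```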

11 pages, 1516 KiB  
Article
Creating Digital Watermarks in Bitmap Images Using Lagrange Interpolation and Bezier Curves
by Aigerim Yerimbetova, Elmira Daiyrbayeva, Ekaterina Merzlyakova, Andrey Fionov, Nazerke Baisholan, Mussa Turdalyuly, Nurzhan Mukazhanov and Almas Turganbayev
J. Imaging 2023, 9(10), 206; https://doi.org/10.3390/jimaging9100206 - 29 Sep 2023
Viewed by 1086
Abstract
This article addresses the embedding of digital watermarks, which form the basis of copyright protection systems. Methods in this area aim to embed hidden markers that are resistant to various container transformations. This paper proposes a method for embedding a digital watermark into bitmap images using Lagrange interpolation and the Bezier curve formula for five points, called Lagrange Interpolation along the Bezier Curve 5 (LIBC5). For steganalysis, the RS method was used, which relies on sensitive dual statistics derived from spatial correlations in images; its output is an estimate of the length of the message hidden in the image under study. The resistance of the developed LIBC5 method to detection by the RS method was determined experimentally, and the method proved to be resistant to RS analysis. A study of the LIBC5 method showed an improvement in quilting resistance compared with that of the INMI embedding method, which also uses Lagrange interpolation. Thus, the LIBC5 stegosystem can be successfully used to protect confidential data and copyrights.
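
The two curve primitives that LIBC5 builds on are standard; below is a minimal sketch of evaluating a Lagrange interpolant and a five-control-point (degree-4) Bezier curve. How the watermark bits are mapped onto these curves is the paper's contribution and is not reproduced here.

```python
import numpy as np
from math import comb

def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)  # basis polynomial l_i(x)
        total += yi * li
    return total

def bezier5(points, t):
    """Point at parameter t in [0, 1] on the degree-4 Bezier curve defined
    by five control points (Bernstein-basis evaluation)."""
    points = np.asarray(points, float)
    n = len(points) - 1  # degree 4 for five control points
    coeffs = np.array([comb(n, k) * (1 - t) ** (n - k) * t ** k
                       for k in range(n + 1)])
    return coeffs @ points
```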

14 pages, 6430 KiB  
Article
Noise-Robust Pulse Wave Estimation from Near-Infrared Face Video Images Using the Wiener Estimation Method
by Yuta Hino, Koichi Ashida, Keiko Ogawa-Ochiai and Norimichi Tsumura
J. Imaging 2023, 9(10), 202; https://doi.org/10.3390/jimaging9100202 - 28 Sep 2023
Viewed by 1158
Abstract
In this paper, we propose a noise-robust method for estimating pulse waves from near-infrared face video images. Pulse wave estimation in a near-infrared environment is expected to be applied to non-contact monitoring in dark areas. The conventional method cannot take noise into account when performing the estimation, so its accuracy in noisy environments is not very high, which may adversely affect the heart rate and other data derived from the pulse wave signal. The objective of this study is therefore to perform pulse wave estimation that is robust to noise. The Wiener estimation method, a simple linear computation that can account for noise, was used in this study. Experimental results showed that the combination of the proposed method and signal processing (detrending and bandpass filtering) increased the signal-to-noise ratio (SNR) by more than 2.5 dB compared with the conventional method with the same signal processing. The correlation coefficient between the pulse wave signal measured with a pulse wave meter and the estimated pulse wave signal was on average 0.30 larger for the proposed method. Furthermore, the absolute error rate (AER) between the heart rate measured with the pulse wave meter and the heart rate derived from the estimated pulse wave was 0.82% on average for the proposed method, lower than that of the conventional method (12.53% on average). These results show that the proposed method is more robust to noise than the conventional method for pulse wave estimation.
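
For illustration, Wiener estimation reduces to a linear estimator learned from training correlations; a minimal sketch, in which the regularizing noise-variance term is an assumption about one way the noise could be modeled:

```python
import numpy as np

def wiener_estimator(X, S, noise_var=0.0):
    """Learn W minimizing E||s - W x||^2 from training data.
    X: (n_samples, n_obs) observed signals (e.g., NIR pixel values);
    S: (n_samples, n_src) target signals (e.g., reference pulse wave)."""
    Rxx = X.T @ X / len(X)   # autocorrelation of the observations
    Rsx = S.T @ X / len(X)   # cross-correlation between target and observations
    # Adding noise variance on the diagonal makes the estimator noise-aware.
    W = Rsx @ np.linalg.inv(Rxx + noise_var * np.eye(Rxx.shape[0]))
    return W                 # estimate a new signal with: s_hat = x @ W.T
```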

28 pages, 2717 KiB  
Article
Automatic Jordanian License Plate Detection and Recognition System Using Deep Learning Techniques
by Tharaa Aqaileh and Faisal Alkhateeb
J. Imaging 2023, 9(10), 201; https://doi.org/10.3390/jimaging9100201 - 28 Sep 2023
Viewed by 1899
Abstract
Recently, the number of vehicles on the road, especially in urban centres, has increased dramatically due to the increasing trend towards urbanisation. As a result, manual detection and recognition of vehicles (i.e., license plates and vehicle manufacturers) has become an arduous task beyond human capabilities. In this paper, we have developed a system using transfer learning-based deep learning (DL) techniques to identify Jordanian vehicles automatically. The YOLOv3 (You Only Look Once) model was re-trained using transfer learning to accomplish license plate detection, character recognition, and vehicle logo detection, while the VGG16 (Visual Geometry Group) model was re-trained to accomplish vehicle logo recognition. To train and test these models, four datasets were collected. The first consists of 7035 Jordanian vehicle images, the second of 7176 Jordanian license plates, and the third of 8271 Jordanian vehicle images; these were used to train and test the YOLOv3 model for license plate detection, character recognition, and vehicle logo detection. The fourth dataset consists of 158,230 vehicle logo images used to train and test the VGG16 model for vehicle logo recognition. Text measures were used to evaluate the performance of the developed system, and the mean average precision (mAP) measure was used to assess the YOLOv3 model on the detection tasks (i.e., license plate detection and vehicle logo detection). For license plate detection, the precision, recall, F-measure, and mAP were 99.6%, 100%, 99.8%, and 99.9%, respectively; for character recognition, the precision, recall, and F-measure were 100%, 99.9%, and 99.95%, respectively. Evaluating these two sub-stages as a sequence, the license plate recognition stage achieved a precision, recall, and F-measure of 99.8% each. Furthermore, for vehicle logo detection, the precision, recall, F-measure, and mAP were 99%, 99.6%, 99.3%, and 99.1%, respectively, while for vehicle logo recognition, the precision, recall, and F-measure were each 98%. Evaluating these two sub-stages as a sequence, the vehicle logo recognition stage achieved a precision, recall, and F-measure of 95.3%, 99.5%, and 97.4%, respectively.

21 pages, 15819 KiB  
Article
Few-Shot Object Detection with Local Feature Enhancement and Feature Interrelation
by Hefeng Lai and Peng Zhang
Electronics 2023, 12(19), 4036; https://doi.org/10.3390/electronics12194036 - 25 Sep 2023
Viewed by 1110
Abstract
Few-shot object detection (FSOD) aims at designing models that can accurately detect targets of novel classes in a scarce data regime. Existing research has improved detection performance with meta-learning-based models. However, existing methods still exhibit certain imperfections: (1) interacting only the global features of query and support images ignores critical local features, leading to imprecise localization of objects from new categories; (2) convolutional neural networks (CNNs) have difficulty learning diverse pose features from the exceedingly limited labeled samples of unseen classes; and (3) local context information is not fully utilized in global attention mechanisms, so the attention modules need to be improved. As a result, the detection performance on novel-class objects is compromised. To overcome these challenges, a few-shot object detection network is proposed with a local feature enhancement module and an intrinsic feature transformation module. The local feature enhancement module (LFEM) is designed to raise the importance of intrinsic features of the novel-class samples, while the Intrinsic Feature Transform Module (IFTM) enhances the feature representation of novel-class samples and thus enriches the feature space of novel classes. Finally, a more effective cross-attention module, the Global Cross-Attention Network (GCAN), which fully aggregates local and global context information between query and support images, is proposed. Our model extracts the crucial features of novel-class objects effectively before fusing the features of query and support images. The proposed method improves detection performance by 0.93 nAP on average over previous models on the PASCAL VOC FSOD benchmark. Extensive experiments demonstrate the effectiveness of our modules under various experimental settings.

18 pages, 5801 KiB  
Article
Attention-Mechanism-Based Models for Unconstrained Face Recognition with Mask Occlusion
by Mengya Zhang, Yuan Zhang and Qinghui Zhang
Electronics 2023, 12(18), 3916; https://doi.org/10.3390/electronics12183916 - 17 Sep 2023
Cited by 1 | Viewed by 845
Abstract
Masks cover most areas of the face, resulting in a serious loss of facial identity information; thus, how to alleviate or eliminate the negative impact of occlusion is a significant problem in the field of unconstrained face recognition. Inspired by the successful application of attention mechanisms and capsule networks in computer vision, we propose ECA-Inception-Resnet-Caps, a novel framework based on Inception-Resnet-v1 for learning discriminative face features in unconstrained mask-wearing conditions. Firstly, Squeeze-and-Excitation (SE) modules and Efficient Channel Attention (ECA) modules are applied to Inception-Resnet-v1 to increase attention on unoccluded face areas and eliminate the negative impact of occlusion during feature extraction. Secondly, the effects of the two attention mechanisms on the different modules in Inception-Resnet-v1 are compared and analyzed, which forms the foundation for constructing the ECA-Inception-Resnet-Caps framework. Finally, ECA-Inception-Resnet-Caps is obtained by augmenting Inception-Resnet-v1 with capsule modules, which increase the interpretability and generalization of the model after the negative impact of occlusion has been reduced. The experimental results demonstrate that both attention mechanisms and the capsule network can effectively enhance the performance of Inception-Resnet-v1 for face recognition under occlusion, with the ECA-Inception-Resnet-Caps model being the most effective, achieving an accuracy of 94.32%, which is 1.42% better than the baseline model.
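
For reference, an ECA block of the kind applied here is small enough to sketch in full; this follows the published ECA-Net design with the kernel size fixed to 3 for simplicity, not necessarily the authors' exact configuration:

```python
import torch
import torch.nn as nn

class ECABlock(nn.Module):
    """Efficient Channel Attention: per-channel weights from a 1D convolution
    over the pooled channel descriptor, avoiding SE's dimensionality reduction."""
    def __init__(self, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                  # x: (B, C, H, W)
        y = self.avg_pool(x)               # (B, C, 1, 1) channel descriptor
        y = y.squeeze(-1).transpose(1, 2)  # (B, 1, C)
        y = self.conv(y)                   # local cross-channel interaction
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                       # reweight channels
```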

14 pages, 2828 KiB  
Article
Efficient Dehazing with Recursive Gated Convolution in U-Net: A Novel Approach for Image Dehazing
by Zhibo Wang, Jia Jia, Peng Lyu and Jeongik Min
J. Imaging 2023, 9(9), 183; https://doi.org/10.3390/jimaging9090183 - 11 Sep 2023
Cited by 1 | Viewed by 1281
Abstract
Image dehazing, a fundamental problem in computer vision, involves the recovery of clear visual cues from images marred by haze. In recent years, deep learning paradigms have spurred significant strides in image dehazing tasks. However, many dehazing networks pursue performance by adopting intricate network architectures, complicating training, inference, and deployment. This study proposes an end-to-end U-Net dehazing network with recursive gated convolution and attention mechanisms to improve performance while maintaining a lean network structure. In our approach, we leverage an improved recursive gated convolution mechanism in residual blocks that replace the original U-Net's convolution blocks, and we apply the SK fusion module to revamp the skip-connection method. We designate this novel U-Net variant the Dehaze Recursive Gated U-Net (DRGNet). Comprehensive testing across public datasets demonstrates the DRGNet's superior performance in dehazing quality, detail retrieval, and objective evaluation metrics. Ablation studies further confirm the effectiveness of the key design elements.

17 pages, 7398 KiB  
Article
Feature Point Identification in Fillet Weld Joints Using an Improved CPDA Method
by Yang Huang, Shaolei Xu, Xingyu Gao, Chuannen Wei, Yang Zhang and Mingfeng Li
Appl. Sci. 2023, 13(18), 10108; https://doi.org/10.3390/app131810108 - 07 Sep 2023
Viewed by 684
Abstract
An intelligent, vision-guided welding robot is highly desired in machinery manufacturing, the shipbuilding industry, and vehicle engineering. The performance of such a system greatly depends on the effective identification of weld seam features and the three-dimensional (3D) reconstruction of the weld seam position in a complex industrial environment. In this paper, a 3D visual sensing system with a structured laser projector and a CCD camera is developed to obtain the geometry of fillet weld seams in robot welding. Accounting for the inclination characteristics of the laser stripe in fillet welding, a Gaussian-weighted PCA-based laser center line extraction method is proposed, which yields smoother laser centerlines at large inclination angles. Furthermore, an improved chord-to-point distance accumulation (CPDA) method with polygon approximation is proposed to identify feature corner locations in the center line images. The proposed method is validated numerically with simulated piece-wise linear laser stripes and experimentally with automated robot welding. Compared with the grayscale gravity method, the Hessian-matrix-based method, and the conventional CPDA method, the improved CPDA method with PCA center extraction is shown to have high accuracy and robustness in noisy welding environments. The proposed method meets the needs of vision-aided automated welding robots by achieving greater than 95% accuracy in corner feature point identification in fillet welding.
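
The CPDA step at the heart of the corner identification accumulates, for each curve point, its perpendicular distance to chords of a fixed length sliding across it; a minimal single-chord-length sketch (the paper's polygon-approximation refinement and multi-chord normalization are omitted):

```python
import numpy as np

def cpda(curve, L=10):
    """Chord-to-point distance accumulation for one chord length L.
    curve: (n, 2) array of ordered points on an extracted laser center line.
    Returns an accumulated-distance profile; local maxima are corner candidates."""
    n = len(curve)
    acc = np.zeros(n)
    for i in range(L, n - L):
        p = curve[i]
        total = 0.0
        for k in range(1, L):               # slide the chord across point i
            a, b = curve[i - k], curve[i - k + L]
            chord = b - a
            d = p - a
            # Perpendicular distance from p to the chord through (a, b).
            total += abs(chord[0] * d[1] - chord[1] * d[0]) / (np.linalg.norm(chord) + 1e-12)
        acc[i] = total
    return acc
```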

20 pages, 4961 KiB  
Review
Research Progress on the Aesthetic Quality Assessment of Complex Layout Images Based on Deep Learning
by Yumei Pu, Danfei Liu, Siyuan Chen and Yunfei Zhong
Appl. Sci. 2023, 13(17), 9763; https://doi.org/10.3390/app13179763 - 29 Aug 2023
Viewed by 1090
Abstract
With the development of the information age, layout images are no longer simple combinations of text and graphics: complex layout images are composed of text, graphics, images, and other layout elements through processes of artistic design, pre-press processing, typesetting, and so on. At present, the field of aesthetic-quality assessment mainly focuses on photographic images, and the aesthetic-quality assessment of complex layout images is rarely reported, even though the design of complex layout images such as posters, packaging labels, and advertisements cannot be separated from the evaluation of aesthetic quality. In this paper, layout analysis is performed on complex layout images, and traditional and deep-learning-based methods for image layout analysis and aesthetic-quality assessment are reviewed and analyzed. Finally, the features, advantages, and applications of common image aesthetic-quality assessment datasets and layout analysis datasets are compared and analyzed. Limitations and future perspectives of the aesthetic assessment of complex layout images are discussed in relation to layout analysis and aesthetic characteristics.

15 pages, 13578 KiB  
Article
PP-JPEG: A Privacy-Preserving JPEG Image-Tampering Localization
by Riyanka Jena, Priyanka Singh and Manoranjan Mohanty
J. Imaging 2023, 9(9), 172; https://doi.org/10.3390/jimaging9090172 - 27 Aug 2023
Viewed by 1209
Abstract
The widespread availability of digital image-processing software has given rise to various forms of image manipulation and forgery, which pose significant challenges in fields such as law enforcement and journalism and can also raise privacy concerns. We argue that a privacy-preserving framework that encrypts images before processing them is vital for maintaining the privacy and confidentiality of sensitive images, especially those used for investigations. To address these challenges, we propose a novel solution that detects image forgeries while preserving the privacy of the images: images are encrypted before processing, making it difficult for unauthorized individuals to access them. The proposed method applies compression-quality analysis in the encrypted domain to detect forgeries, determining whether the forged portion (dummy image) has a compression quality different from that of the original image (featured image). This approach effectively localizes the tampered portions of the image in the encrypted domain, even for pixel blocks as small as 10×10. Furthermore, the method identifies the featured image's JPEG quality using the first minimum in the energy graph.
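
The quality-identification idea is easiest to see in the plain (unencrypted) domain: recompress the image at successive JPEG qualities and look for the first minimum of the recompression error. The sketch below illustrates that energy-curve idea with Pillow; the encrypted-domain analysis is the paper's actual contribution and is not reproduced here.

```python
import io
import numpy as np
from PIL import Image

def estimate_jpeg_quality(img, qualities=range(50, 101)):
    """Recompress a PIL image at each candidate quality and measure the error;
    the first local minimum of the error curve indicates the original quality."""
    ref = np.asarray(img.convert('L'), dtype=np.float64)
    errors = []
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format='JPEG', quality=q)
        rec = np.asarray(Image.open(buf).convert('L'), dtype=np.float64)
        errors.append(np.mean((ref - rec) ** 2))
    # First local minimum of the "energy" curve.
    for i in range(1, len(errors) - 1):
        if errors[i] < errors[i - 1] and errors[i] <= errors[i + 1]:
            return list(qualities)[i]
    return list(qualities)[int(np.argmin(errors))]
```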

12 pages, 4182 KiB  
Article
Self-Supervised Sound Promotion Method of Sound Localization from Video
by Yang Li, Xiaoli Zhao and Zhuoyao Zhang
Electronics 2023, 12(17), 3558; https://doi.org/10.3390/electronics12173558 - 23 Aug 2023
Viewed by 587
Abstract
Compared to traditional unimodal methods, multimodal audio-visual correspondence learning has many advantages in the field of video understanding, but it also faces significant challenges. To fully utilize the feature information from both modalities, we need to ensure accurate alignment of the semantic information from each modality rather than simply concatenating them together, which requires careful design of the fusion network. Current algorithms rely heavily on the network's output for sound-object localization while neglecting the potential suppression of feature information caused by the network's internal structure. We therefore propose the sound promotion method (SPM), a self-supervised framework that increases the contribution of voices to improve audio-visual learning. We first cluster the audio separately to generate pseudo-labels and then use these clusters to train the audio backbone. Finally, we evaluate the impact of our method on several existing approaches using the MUSIC dataset, and the results show that our proposed method yields better performance.

15 pages, 645 KiB  
Article
Thangka Image Captioning Based on Semantic Concept Prompt and Multimodal Feature Optimization
by Wenjin Hu, Lang Qiao, Wendong Kang and Xinyue Shi
J. Imaging 2023, 9(8), 162; https://doi.org/10.3390/jimaging9080162 - 16 Aug 2023
Viewed by 1110
Abstract
Thangka images exhibit a high level of diversity and richness, yet existing deep learning-based image captioning methods generate Chinese captions for Thangka images with poor accuracy and richness. To address this issue, this paper proposes a Semantic Concept Prompt and Multimodal Feature Optimization network (SCAMF-Net). The Semantic Concept Prompt (SCP) module is introduced in the text encoding stage to obtain more semantic information about the Thangka through contextual prompts, thus enhancing the richness of the description content. The Multimodal Feature Optimization (MFO) module is proposed to optimize the correlation between Thangka images and text: it strengthens the correlation between image features and text features through the Captioner and Filter to more accurately describe the visual concept features of the Thangka. The experimental results demonstrate that our proposed method outperforms baseline models on the Thangka dataset in terms of BLEU-4, METEOR, ROUGE, CIDEr, and SPICE by 8.7%, 7.9%, 8.2%, 76.6%, and 5.7%, respectively. Furthermore, the method also exhibits superior performance compared with state-of-the-art methods on the public MSCOCO dataset.

21 pages, 2476 KiB  
Article
Eff-PCNet: An Efficient Pure CNN Network for Medical Image Classification
by Wenwen Yue, Shiwei Liu and Yongming Li
Appl. Sci. 2023, 13(16), 9226; https://doi.org/10.3390/app13169226 - 14 Aug 2023
Cited by 2 | Viewed by 1124
Abstract
With the development of deep learning, convolutional neural networks (CNNs) and Transformer-based methods have become key techniques for medical image classification tasks. However, many current neural network models suffer from high complexity, large numbers of parameters, and large model sizes; such models obtain higher classification accuracy at the expense of lightness, and their scale poses a great challenge for practical clinical applications. Meanwhile, Transformer and multi-layer perceptron (MLP) methods have shortcomings in local modeling capability and model complexity and need large datasets to show good performance, which makes them difficult to utilize in clinical medicine. On this basis, we propose a lightweight and efficient pure CNN network for medical image classification (Eff-PCNet). On the one hand, we propose a multi-branch multi-scale CNN (M2C) module, which splits the feature map into four parallel branches along the channel dimension by a certain scale factor and carries out deep convolution operations with kernels of different scales. This multi-branch multi-scale operation effectively replaces large-kernel convolution, reducing the computational cost of the module while fusing feature information between different channels and thus obtaining richer feature information; the four feature maps are then concatenated along the channel dimension to fuse multi-scale and multi-dimensional feature information. On the other hand, we introduce the structural reparameterization technique and propose the structurally reparameterized CNN (Rep-C) module. Specifically, it utilizes multiple linear operators to generate different feature maps during training and fuses all branches into one operator through parameter fusion, achieving fast inference while providing a more effective solution for feature reuse. Experimental results show that our Eff-PCNet performs better than current CNN-, Transformer-, and MLP-based methods in the classification of three publicly available medical image datasets: we achieve 87.4% accuracy on the HAM10000 dataset, 91.06% on the SkinCancer dataset, and 97.03% on the Chest-Xray dataset. Meanwhile, our approach achieves a better trade-off between the number of parameters, computation, and other performance metrics.
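
Structural reparameterization of the kind the Rep-C module relies on fuses parallel linear branches into a single convolution after training; a minimal RepVGG-style sketch fusing a 3×3 branch, a 1×1 branch, and an identity branch (BatchNorm folding, which real implementations also perform, is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def fuse_branches(w3x3, b3x3, w1x1, b1x1, channels):
    """Fuse conv3x3 + conv1x1 + identity into one 3x3 convolution.
    Returns (w, b) such that conv(x, w, b) equals the sum of the three branches."""
    # Pad the 1x1 kernel to 3x3 (placed at the center tap).
    w1x1_as_3x3 = F.pad(w1x1, [1, 1, 1, 1])
    # Identity as a 3x3 kernel: 1 at the center of each channel's own filter.
    w_id = torch.zeros(channels, channels, 3, 3)
    for c in range(channels):
        w_id[c, c, 1, 1] = 1.0
    return w3x3 + w1x1_as_3x3 + w_id, b3x3 + b1x1

# Quick check that the fused kernel matches the multi-branch output.
C = 8
x = torch.randn(1, C, 16, 16)
w3, b3 = torch.randn(C, C, 3, 3), torch.randn(C)
w1, b1 = torch.randn(C, C, 1, 1), torch.randn(C)
multi = F.conv2d(x, w3, b3, padding=1) + F.conv2d(x, w1, b1) + x
wf, bf = fuse_branches(w3, b3, w1, b1, C)
assert torch.allclose(multi, F.conv2d(x, wf, bf, padding=1), atol=1e-5)
```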

15 pages, 12839 KiB  
Article
Improved Traffic Small Object Detection via Cross-Layer Feature Fusion and Channel Attention
by Qinliang Chuai, Xiaowei He and Yi Li
Electronics 2023, 12(16), 3421; https://doi.org/10.3390/electronics12163421 - 12 Aug 2023
Viewed by 900
Abstract
Small object detection has long been one of the most formidable challenges in computer vision due to small objects' poor visual features and the high noise of their surroundings. Small targets in traffic scenes, such as traffic signs, traffic lights, and pedestrians, carry crucial information while being embedded among a multitude of complex visual interfering factors. Given the inherent difficulties faced by generic models in addressing these issues, we conduct a comprehensive investigation of small target detection in this application scenario. In this work, we present a Cross-Layer Feature Fusion and Channel Attention algorithm based on a lightweight YOLOv5s design for traffic small target detection, named CFA-YOLO. To enhance the sensitivity of the model toward vital features, we embed the channel-guided Squeeze-and-Excitation (SE) block in the deep layers of the backbone. The core innovation of our work is an effective cross-layer feature fusion method, which maintains robust feature fusion and information interaction capabilities while pruning redundant parameters compared with the baseline model. To align with the output features of the neck network, we reduced the detection heads from three to two. Furthermore, we applied decoupled detection heads for the classification and bounding box regression tasks. This approach not only meets real-time detection standards but also improves the overall training results in a parameter-friendly manner. The CFA-YOLO model pays close attention to the detail features of small targets, giving it a clear advantage in traffic small target detection. Extensive experiments have validated the efficiency and effectiveness of our proposed method: compared with the latest lightweight detectors, such as YOLOv7-Tiny and YOLOv8s, our method consistently achieves superior performance in terms of both accuracy and model complexity.

20 pages, 10776 KiB  
Article
Vision-Based System for Black Rubber Roller Surface Inspection
by Thanh-Hung Nguyen, Huu-Long Nguyen, Ngoc-Tam Bui, Trung-Hieu Bui, Van-Ban Vu, Hoai-Nam Duong and Hong-Hai Hoang
Appl. Sci. 2023, 13(15), 8999; https://doi.org/10.3390/app13158999 - 06 Aug 2023
Cited by 1 | Viewed by 1287
Abstract
This paper proposes a machine vision system for the surface inspection of black rubber rollers in manufacturing processes. The system aims to enhance the surface quality of the rollers by detecting and classifying defects, with a lighting system installed to highlight surface defects. Two algorithms are proposed for defect detection: a traditional method and a deep learning-based method. The former is fast but limited to detecting surface defects, while the latter is slower but capable of both detecting and classifying defects. The accuracy of the algorithms is verified through experiments: the traditional method achieves an accuracy of approximately 98% for defect detection, while the deep learning-based method achieves approximately 95.2% for defect detection and 96% for defect classification. The proposed machine vision system can significantly improve the surface inspection of black rubber rollers, thereby helping to ensure high-quality production.
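
A traditional pipeline of the kind described might look as follows; the filter sizes and thresholds here are illustrative assumptions, since the paper's exact parameters are not given in the abstract:

```python
import cv2

def detect_defects(image_path, min_area=50):
    """Classical surface-defect detection: smooth, threshold, find contours.
    On a dark rubber surface under directional lighting, defects appear as
    locally brighter or darker blobs."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    # Adaptive thresholding copes with uneven illumination along the roller.
    mask = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY_INV, 31, 10)
    # Morphological opening removes isolated speckle noise.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Return bounding boxes of blobs large enough to be real defects.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```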

23 pages, 3912 KiB  
Article
Image Sampling Based on Dominant Color Component for Computer Vision
by Saisai Wang, Jiashuai Cui, Fan Li and Liejun Wang
Electronics 2023, 12(15), 3360; https://doi.org/10.3390/electronics12153360 - 06 Aug 2023
Viewed by 958
Abstract
Image sampling is a fundamental technique for image compression, which greatly improves the efficiency of image storage, transmission, and applications. However, existing sampling algorithms primarily consider human visual perception and discard irrelevant information based on subjective preferences. Unfortunately, these methods may not adequately meet the demands of computer vision tasks and can even introduce redundancy because humans and computers prefer different information. To tackle this issue, this paper investigates the key features required by computer vision and, based on our findings, proposes an image sampling method based on the dominant color component (ISDCC). In this method, we utilize a grayscale image to preserve the structural information essential for computer vision and construct a concise color feature map based on the dominant channel of each pixel, providing the relevant color information for computer vision tasks. We conducted experimental evaluations on well-known benchmark datasets. The results demonstrate that ISDCC adapts effectively to computer vision requirements, significantly reducing the amount of data needed, and has minimal impact on the performance of mainstream computer vision algorithms across various tasks. Compared with other sampling approaches, our proposed method achieves superior results with less data.
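
A simplified reading of the two components ISDCC keeps per pixel (not the authors' exact encoding) can be sketched directly:

```python
import numpy as np

def isdcc_sample(rgb):
    """Sample an RGB image into (grayscale structure, dominant-channel map).
    rgb: (H, W, 3) uint8 array. The grayscale image preserves structure for
    the vision task; the per-pixel argmax channel is a compact color cue."""
    rgb = rgb.astype(np.float32)
    # Standard luminance weights preserve structural information.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Dominant channel index per pixel: 0 = R, 1 = G, 2 = B (2 bits/pixel).
    dominant = rgb.argmax(axis=-1).astype(np.uint8)
    return gray.astype(np.uint8), dominant
```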

21 pages, 3618 KiB  
Review
Pseudo Labels for Unsupervised Domain Adaptation: A Review
by Yundong Li, Longxia Guo and Yizheng Ge
Electronics 2023, 12(15), 3325; https://doi.org/10.3390/electronics12153325 - 03 Aug 2023
Cited by 2 | Viewed by 2164
Abstract
Conventional machine learning relies on two presumptions: (1) the training and testing datasets follow the same independent distribution, and (2) an adequate quantity of samples is essential for achieving optimal model performance during training. Nevertheless, meeting these two assumptions can be challenging in real-world scenarios. Domain adaptation (DA) is a subfield of transfer learning that focuses on reducing the distribution difference between the source domain (Ds) and the target domain (Dt) and subsequently applying the knowledge gained from the Ds task to the Dt task. The majority of current DA methods aim to achieve domain invariance by aligning the marginal probability distributions of the Ds and Dt. Recent studies have pointed out that aligning marginal probability distributions alone is not sufficient and that alignment of conditional probability distributions is equally important for knowledge migration. Nonetheless, unsupervised DA presents a greater difficulty in aligning conditional probability distributions because labels for the Dt are unavailable. In response to this issue, researchers have proposed several methods, including pseudo-labeling, which offer novel solutions to this problem. In this paper, we systematically analyze various pseudo-labeling algorithms and their applications in unsupervised DA. First, we summarize pseudo-label generation methods based on single and multiple classifiers, together with measures for dealing with imbalanced samples. Second, we investigate the application of pseudo-labeling in category-level feature alignment and in improving feature discrimination. Finally, we point out the challenges and trends of pseudo-labeling algorithms. To the best of our knowledge, this article is the first review of pseudo-labeling techniques for unsupervised DA.
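
The single-classifier pseudo-labeling recipe that many of the surveyed methods build on can be sketched generically; the model, optimizer, and confidence threshold here are assumptions:

```python
import torch

def pseudo_label_step(model, target_batch, optimizer, threshold=0.95):
    """One self-training step on unlabeled target-domain data: keep only
    predictions whose confidence clears the threshold as pseudo-labels."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(target_batch), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf >= threshold  # guard against noisy pseudo-labels
    if keep.sum() == 0:
        return 0.0
    model.train()
    loss = torch.nn.functional.cross_entropy(model(target_batch[keep]),
                                             pseudo[keep])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```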

32 pages, 14809 KiB  
Article
A Comparison of Monocular Visual SLAM and Visual Odometry Methods Applied to 3D Reconstruction
by Erick P. Herrera-Granda, Juan C. Torres-Cantero, Andrés Rosales and Diego H. Peluffo-Ordóñez
Appl. Sci. 2023, 13(15), 8837; https://doi.org/10.3390/app13158837 - 31 Jul 2023
Cited by 1 | Viewed by 2476
Abstract
Pure monocular 3D reconstruction is a complex problem that has attracted the research community's interest due to the affordability and availability of RGB sensors. SLAM, VO, and SFM are disciplines formulated to solve the 3D reconstruction problem and estimate the camera's ego-motion, and many methods have been proposed. However, most of these methods have not been evaluated on large datasets and under various motion patterns, have not been tested under the same metrics, and have not been evaluated following a taxonomy, making their comparison and selection difficult. In this research, we compared ten publicly available SLAM and VO methods following a taxonomy, including one method for each category of the primary taxonomy, three machine-learning-based methods, and two updates of the best methods, to identify the advantages and limitations of each category and to test whether adding machine learning or updating those methods improves them significantly. We evaluated each algorithm using the TUM-Mono dataset and benchmark, and we performed an inferential statistical analysis to identify significant differences in its metrics. The results determined that the sparse-direct methods significantly outperformed the rest of the taxonomy and that fusing them with machine learning techniques significantly enhanced the performance of geometric-based methods from different perspectives.

22 pages, 5939 KiB  
Article
Risevi: A Disease Risk Prediction Model Based on Vision Transformer Applied to Nursing Homes
by Feng Zhou, Shijing Hu, Xiaoli Wan, Zhihui Lu and Jie Wu
Electronics 2023, 12(15), 3206; https://doi.org/10.3390/electronics12153206 - 25 Jul 2023
Cited by 1 | Viewed by 1085
Abstract
The intensification of population aging has put pressure on public medical care. To reduce this pressure, we apply computer-vision image classification methods to audio data, which are easy to collect in nursing homes. Based on MelGAN, transfer learning, and the Vision Transformer, we propose Risevi (a disease risk prediction model based on the Vision Transformer), a disease risk prediction model for nursing homes. We first design a sample generation method based on MelGAN, then draw on the Mel-frequency cepstral coefficients and the Wav2vec2 model to design the sample feature extraction method, performing floating-point operations on the tensor of extracted features and then converting it into a waveform. We then design a sample feature classification method based on transfer learning and the Vision Transformer, thereby obtaining the Risevi model. In this paper, we use public datasets and subject data as sample data. The experimental results show that the Risevi model achieves an accuracy rate of 98.5%, a precision rate of 96.38%, a recall rate of 98.17%, and an F1 score of 97.15%, demonstrating that it can provide practical support for reducing the pressure on public medical care.

25 pages, 30967 KiB  
Article
A Decoupled Semantic–Detail Learning Network for Remote Sensing Object Detection in Complex Backgrounds
by Hao Ruan, Wenbin Qian, Zhihong Zheng and Yingqiong Peng
Electronics 2023, 12(14), 3201; https://doi.org/10.3390/electronics12143201 - 24 Jul 2023
Cited by 1 | Viewed by 973
Abstract
Detecting multi-scale objects in complex backgrounds is a crucial challenge in remote sensing, where the localization and identification of objects can be inaccurate. To address this issue, a decoupled semantic–detail learning network (DSDL-Net) is proposed. Our approach comprises two components. First, we introduce a multi-receptive field feature fusion and detail mining (MRF-DM) module, which learns higher semantic-level representations by fusing multi-scale receptive fields and subsequently uses multi-scale pooling to preserve detail texture information at different scales. Second, we present an adaptive cross-level semantic–detail fusion (CSDF) network that leverages a feature pyramid, fusing detailed features extracted from the backbone network with high-level semantic features obtained from the topmost layer of the pyramid. The fusion is accomplished through two rounds of parallel global–local contextual feature extraction, with global context information shared between the two rounds. Furthermore, to enhance both the fine-grained texture features conducive to object localization and the features conducive to semantic recognition, we adopt and improve two attention-based enhancement modules, making them simpler and more lightweight. Our experimental results demonstrate that our approach outperforms 12 benchmark models on three publicly available remote sensing datasets (DIOR, HRRSD, and RSOD) in terms of average precision (AP) at small, medium, and large scales. On the DIOR dataset, our model achieved a 2.19% improvement in mAP@0.5 over the baseline model, with a 14.07% reduction in parameters.

19 pages, 5584 KiB  
Article
FURSformer: Semantic Segmentation Network for Remote Sensing Images with Fused Heterogeneous Features
by Zehua Zhang, Bailin Liu and Yani Li
Electronics 2023, 12(14), 3113; https://doi.org/10.3390/electronics12143113 - 18 Jul 2023
Cited by 1 | Viewed by 827
Abstract
Semantic segmentation of remote sensing images poses a formidable challenge. Our investigation commences with a pilot study scrutinizing the advantages and disadvantages of Transformer and CNN architectures for remote sensing imagery (RSI), with the objective of substantiating the indispensability of both local and global information for RSI analysis. In this article, we harness the Transformer model to establish global contextual understanding while incorporating an additional convolution module for localized perception. Nonetheless, a direct fusion of these heterogeneous information sources often yields subpar outcomes. To address this limitation, we propose an innovative hierarchical feature-fusion module that fuses Transformer and CNN features using an ensemble-to-set approach, thereby enhancing information compatibility. Our proposed model, named FURSformer, amalgamates the strengths of the Transformer architecture and CNNs. The experimental results clearly demonstrate the effectiveness of this approach: notably, our model achieved an accuracy of 90.78% (mAccuracy) on the DLRSD dataset.

17 pages, 2651 KiB  
Article
D-UAP: Initially Diversified Universal Adversarial Patch Generation Method
by Lei Sun, Xiaoqin Wang, Youhuan Yang and Xiuqing Mao
Electronics 2023, 12(14), 3080; https://doi.org/10.3390/electronics12143080 - 14 Jul 2023
Viewed by 865
Abstract
With the rapid development of adversarial example technologies, the concept of the adversarial patch has been proposed; such patches can transfer adversarial attacks to the real world and fool intelligent object detection systems. However, the real-world environment is complex and changeable, and adversarial patch attacks are susceptible to real-world factors, resulting in a decrease in the attack success rate. Existing adversarial-patch-generation algorithms initialize the patch in a single direction and do not fully consider the impact of initial diversification on the upper limit of the patch's attack performance. Therefore, this paper proposes an initially diversified adversarial patch generation technique to improve the effectiveness of adversarial patch attacks on underlying algorithms in the real world. The method uses YOLOv4 as the attack model, and the experimental results show that the attack success rate of the proposed adversarial-patch-attack method is 8.46% higher than the baseline, with a stronger attack effect and fewer training rounds.
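
Independent of the initialization strategy the paper contributes, the core patch-optimization loop is gradient ascent on the detector's loss; a minimal sketch in which the detector_loss callable and the fixed corner placement are assumptions:

```python
import torch

def optimize_patch(detector_loss, images, patch_size=64, steps=200, lr=0.03):
    """Optimize an adversarial patch to maximize a detector's loss.
    detector_loss(batch) should return the detection loss to attack;
    images: (B, 3, H, W) tensor in [0, 1]. The patch is pasted at a fixed
    corner here, a simplification of real placement/transform pipelines."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        patched = images.clone()
        patched[:, :, :patch_size, :patch_size] = patch.clamp(0, 1)
        loss = -detector_loss(patched)  # ascend the attack objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```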

13 pages, 2622 KiB  
Article
A Super-Resolution Reconstruction Network of Space Target Images Based on Dual Regression and Deformable Convolutional Attention Mechanism
by Yan Shi, Chun Jiang, Changhua Liu, Wenhan Li and Zhiyong Wu
Electronics 2023, 12(13), 2995; https://doi.org/10.3390/electronics12132995 - 07 Jul 2023
Cited by 1 | Viewed by 641
Abstract
High-quality space target images are important for space surveillance and space attack–defense confrontation. To obtain space target images with higher resolution and sharpness, this paper proposes an image super-resolution reconstruction network based on dual regression and a deformable convolutional attention mechanism (DCAM). Firstly, the mapping space is constrained by dual regression; secondly, deformable convolution is used to expand the receptive field and extract the high-frequency features of the image; finally, a convolutional attention mechanism computes the saliency of the channel and spatial domains of the image to enhance useful features and suppress useless feature responses. The experimental results show that the method outperforms current mainstream image super-resolution algorithms on the space target image dataset in both objective quality metrics and localization accuracy.
(This article belongs to the Topic Computer Vision and Image Processing)
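
The saliency weighting described above is in the spirit of CBAM-style channel and spatial attention. The sketch below shows that general mechanism in PyTorch; the paper's DCAM additionally couples it with deformable convolution and dual regression, which are omitted here.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    """CBAM-style channel + spatial attention (a sketch, not the exact DCAM)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # channel saliency from average pool
        mx = self.mlp(x.amax(dim=(2, 3)))    # ... and from max pool
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))  # spatial saliency map

out = ConvAttention(64)(torch.randn(1, 64, 32, 32))  # same shape, re-weighted
```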

13 pages, 2070 KiB  
Article
Color Mura Defect Detection Method Based on Channel Contrast Sensitivity Function Filtering
by Zhixi Wang, Huaixin Chen, Wenqiang Xie and Haoyu Wang
Electronics 2023, 12(13), 2965; https://doi.org/10.3390/electronics12132965 - 05 Jul 2023
Cited by 1 | Viewed by 904
Abstract
To address the low detection accuracy caused by the low contrast of color Mura defects, this paper proposes a color Mura defect detection method based on channel contrast sensitivity function (CSF) filtering. The captured RGB image of the liquid crystal display (LCD) is converted to the Lab color space, and Weber contrast feature maps are computed for the Lab channel images. Frequency-domain filtering is then performed using the CSF to obtain visually sensitive Lab feature maps. Color Mura defects are detected by applying adaptive segmentation thresholds to the fused feature maps of the L channel and the ab channels, and a color Mura evaluation criterion is used to quantitatively assess the detection results. Experimental results demonstrate that the proposed method achieves an accuracy of 87% in color Mura defect detection, outperforming existing mainstream detection methods. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
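
A rough sketch of CSF-based frequency filtering on the L channel. The CSF used here is the classic Mannos–Sakrison approximation and the Weber contrast is computed globally; the paper's per-channel CSFs, local contrast maps, and threshold choices may differ, and the file path is hypothetical.

```python
import cv2
import numpy as np

def csf_filter(channel, cycles_per_degree=60.0):
    """Weight the channel's spectrum by the Mannos-Sakrison CSF approximation."""
    h, w = channel.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fx**2 + fy**2) * cycles_per_degree   # radial spatial frequency
    csf = 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * csf))

bgr = cv2.imread("lcd_capture.png")                  # hypothetical capture
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
L = lab[..., 0]
weber = (L - L.mean()) / (L.mean() + 1e-6)           # global Weber contrast
sensitive = csf_filter(weber)                        # visually sensitive map
thr = np.abs(sensitive).mean() + 3 * np.abs(sensitive).std()
mask = np.abs(sensitive) > thr                       # adaptive defect mask
```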

15 pages, 8102 KiB  
Article
Ambiguity in Solving Imaging Inverse Problems with Deep-Learning-Based Operators
by Davide Evangelista, Elena Morotti, Elena Loli Piccolomini and James Nagy
J. Imaging 2023, 9(7), 133; https://doi.org/10.3390/jimaging9070133 - 30 Jun 2023
Cited by 1 | Viewed by 1318
Abstract
In recent years, large convolutional neural networks have been widely used as tools for image deblurring because of their ability to restore images very precisely. It is well known that image deblurring is mathematically modeled as an ill-posed inverse problem whose solution is difficult to approximate when noise affects the data. Indeed, one limitation of neural networks for deblurring is their sensitivity to noise and other perturbations, which can lead to instability and produce poor reconstructions. In addition, networks do not necessarily take into account the numerical formulation of the underlying imaging problem when trained end-to-end. In this paper, we propose some strategies to improve stability without losing too much accuracy when deblurring images with deep-learning-based methods. First, we suggest a very small neural architecture, which reduces the execution time for training, satisfying a green AI need, and does not extremely amplify noise in the computed image. Second, we introduce a unified framework where a pre-processing step balances the lack of stability of the subsequent neural-network-based step. Two different pre-processors are presented: the former implements a strong parameter-free denoiser, and the latter is a variational-model-based regularized formulation of the latent imaging problem. This framework is also formally characterized by mathematical analysis. Numerical experiments verify the accuracy and stability of the proposed approaches for image deblurring when unknown or unquantified noise is present; the results confirm that they improve network stability with respect to noise. In particular, the model-based framework represents the most reliable trade-off between visual precision and robustness. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

18 pages, 14748 KiB  
Article
A Joint De-Rain and De-Mist Network Based on the Atmospheric Scattering Model
by Linyun Gu, Huahu Xu and Xiaojin Ma
J. Imaging 2023, 9(7), 129; https://doi.org/10.3390/jimaging9070129 - 26 Jun 2023
Viewed by 1007
Abstract
Rain can have a detrimental effect on optical components, leading to the appearance of streaks and halos in images captured in rainy conditions. These visual distortions caused by rain and mist introduce significant noise that can compromise image quality. In this paper, we propose a novel approach for simultaneously removing both streaks and halos from an image to produce clear results. First, based on the principle of atmospheric scattering, a rain-and-mist model is proposed to initially remove the streaks and halos by reconstructing the image. The Deep Memory Block (DMB) selectively extracts the rain-layer and mist-layer transfer spectra from the rainy image to separate these layers. Then, the Multi-scale Convolution Block (MCB) receives the reconstructed images and extracts both structural and detailed features to improve the overall accuracy and robustness of the model. Extensive results demonstrate that our proposed model, JDDN (Joint De-rain and De-mist Network), outperforms current state-of-the-art deep learning methods on both synthetic and real-world datasets, with an average improvement of 0.29 dB on the heavy-rain image dataset. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
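
For reference, the atmospheric scattering model underlying the reconstruction is I = J·t + A·(1 − t). The sketch below shows the inversion given a transmission map t and airlight A (both stand-ins here); estimating those quantities is what the DMB/MCB network does.

```python
import numpy as np

def recover_scene(I, t, A, t_min=0.1):
    """Invert I = J*t + A*(1-t). I: hazy image in [0,1]; t: transmission map;
    A: global airlight per channel. Clamping t avoids amplifying noise."""
    t = np.clip(t, t_min, 1.0)[..., None]
    return np.clip((I - A * (1.0 - t)) / t, 0.0, 1.0)

I = np.random.rand(240, 320, 3)                 # stand-in for a rainy frame
t = np.full((240, 320), 0.7)                    # stand-in transmission estimate
J = recover_scene(I, t, A=np.array([0.9, 0.9, 0.9]))
```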

12 pages, 1692 KiB  
Article
Quantifying the Displacement of Data Matrix Code Modules: A Comparative Study of Different Approximation Approaches for Predictive Maintenance of Drop-on-Demand Printing Systems
by Peter Bischoff, André V. Carreiro, Christiane Schuster and Thomas Härtling
J. Imaging 2023, 9(7), 125; https://doi.org/10.3390/jimaging9070125 - 21 Jun 2023
Viewed by 1203
Abstract
Drop-on-demand printing using colloidal or pigmented inks is prone to clogging of the printing nozzles, which can lead to positional deviations and inconsistently printed patterns (e.g., data matrix codes, DMCs). However, if such deviations are detected early, they can be used to assess the state of the print head and to plan maintenance operations before the printed DMCs become unreadable. To realize this predictive maintenance approach, it is necessary to accurately quantify the positional deviation of individually printed dots from their target positions. Here, we present a comparison of different methods, based on affine transformations and clustering algorithms, for calculating the target positions from the printed positions and, subsequently, the deviation between the two for complete DMCs. Our method therefore focuses on evaluating print quality, not on decoding DMCs. We compare our results to a state-of-the-art decoding algorithm, adapted to return the target grid positions, and find that we can determine the occurring deviations with significantly higher accuracy, especially when the printed DMCs are of low quality. The results enable the development of decision systems for predictive maintenance and, subsequently, the optimization of printing systems. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
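
One plausible instance of the approximation approaches compared: fit an affine transformation from the ideal module grid to the detected dot centers by least squares and read the per-dot displacement off the residuals. The grid size and noise level below are illustrative, not the paper's data.

```python
import numpy as np

def dot_deviation(ideal, printed):
    """ideal, printed: (N, 2) matched grid/dot coordinates. Fit the affine map
    ideal -> printed by least squares; residuals are per-dot displacements."""
    X = np.hstack([ideal, np.ones((len(ideal), 1))])   # homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, printed, rcond=None)    # (3, 2) affine parameters
    return np.linalg.norm(printed - X @ M, axis=1)

ideal = np.stack(np.meshgrid(np.arange(10), np.arange(10)), -1).reshape(-1, 2).astype(float)
printed = ideal + np.random.normal(0, 0.05, ideal.shape)   # simulated nozzle jitter
print(dot_deviation(ideal, printed).mean())
```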

25 pages, 13042 KiB  
Article
A Novel Image Encryption Scheme Using Chaotic Maps and Fuzzy Numbers for Secure Transmission of Information
by Dani Elias Mfungo, Xianping Fu, Yongjin Xian and Xingyuan Wang
Appl. Sci. 2023, 13(12), 7113; https://doi.org/10.3390/app13127113 - 14 Jun 2023
Cited by 7 | Viewed by 2318
Abstract
The complexity of a chaotic system, if used in information encryption, can determine the level of security. This paper proposes a novel image encryption scheme that uses chaotic maps and fuzzy numbers for the secure transmission of information. The encryption method combines the logistic and sine maps into a logistic-sine map, and combines the fuzzy concept with the Hénon map into a fuzzy Hénon map; these maps are used to generate secure secret keys. Additionally, a fuzzy triangular membership function is used to modify the initial conditions of the maps during the diffusion process. The encryption process involves scrambling the image pixels, summing adjacent row values, and XORing the result with randomly generated numbers from the chaotic maps. The proposed method is tested against various attacks, including statistical attack analysis, local entropy analysis, differential attack analysis, signal-to-noise ratio, signal-to-noise distortion ratio, mean square error, brute-force attack analysis, and information entropy analysis, while randomness is evaluated using the NIST test suite. The scheme also has high key sensitivity, meaning that a small change in the secret keys results in a significant change in the encrypted image. The results demonstrate the effectiveness of the proposed scheme in ensuring the secure transmission of information. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
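
A sketch of how a combined logistic-sine map can serve as a keystream for the XOR step; the map form below is one common formulation, and the seed and parameter values, as well as the paper's fuzzy Hénon branch and scrambling stage, are not reproduced.

```python
import numpy as np

def logistic_sine_keystream(x0, r, n):
    """One common combined logistic-sine map; returns n key bytes."""
    x, out = x0, np.empty(n)
    for i in range(n):
        x = (r * x * (1 - x) + (4 - r) * np.sin(np.pi * x) / 4) % 1.0
        out[i] = x
    return (np.floor(out * 256).astype(np.int64) % 256).astype(np.uint8)

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image
key = logistic_sine_keystream(x0=0.37, r=3.99, n=img.size).reshape(img.shape)
cipher = img ^ key                     # XOR diffusion
assert ((cipher ^ key) == img).all()   # the same keystream decrypts
```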

13 pages, 9527 KiB  
Article
SimoSet: A 3D Object Detection Dataset Collected from Vehicle Hybrid Solid-State LiDAR
by Xinyu Sun, Lisheng Jin, Yang He, Huanhuan Wang, Zhen Huo and Yewei Shi
Electronics 2023, 12(11), 2424; https://doi.org/10.3390/electronics12112424 - 26 May 2023
Cited by 1 | Viewed by 1519
Abstract
Three-dimensional (3D) object detection based on point cloud data plays a critical role in the perception systems of autonomous driving. However, this task is challenging to implement in practice due to the absence of point cloud data from automotive-grade hybrid solid-state LiDAR, as well as the limited generalization ability of data-driven deep learning methods. In this paper, we introduce SimoSet, the first vehicle-view 3D object detection dataset composed of automotive-grade hybrid solid-state LiDAR data. The dataset was collected on a university campus, contains 52 scenes, each of which is 8 s long, and provides three types of labels for typical traffic participants. We analyze how the installation height and angle of the LiDAR affect the scanning effect and provide a reference workflow for the collection, annotation, and format conversion of LiDAR data. Finally, we provide baselines for LiDAR-only 3D object detection. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

17 pages, 2657 KiB  
Article
Combining CNNs and Markov-like Models for Facial Landmark Detection with Spatial Consistency Estimates
by Ahmed Gdoura, Markus Degünther, Birgit Lorenz and Alexander Effland
J. Imaging 2023, 9(5), 104; https://doi.org/10.3390/jimaging9050104 - 22 May 2023
Cited by 1 | Viewed by 1673
Abstract
The accurate localization of facial landmarks is essential for several tasks, including face recognition, head pose estimation, facial region extraction, and emotion detection. Although the number of required landmarks is task-specific, models are typically trained on all available landmarks in the datasets, limiting efficiency. Furthermore, model performance is strongly influenced by scale-dependent local appearance information around landmarks and the global shape information generated by them. To account for this, we propose a lightweight hybrid model for facial landmark detection designed specifically for pupil region extraction. Our design combines a convolutional neural network (CNN) with a Markov random field (MRF)-like process trained on only 17 carefully selected landmarks. The advantage of our model is the ability to run different image scales on the same convolutional layers, resulting in a significant reduction in model size. In addition, we employ an approximation of the MRF that is run on a subset of landmarks to validate the spatial consistency of the generated shape. This validation is performed against a learned conditional distribution, expressing the location of one landmark relative to its neighbor. Experimental results on popular facial landmark localization datasets such as 300-W, WFLW, and HELEN demonstrate the accuracy of our proposed model. Furthermore, our model achieves state-of-the-art performance on a well-defined robustness metric. In conclusion, the results demonstrate the ability of our lightweight model to filter out spatially inconsistent predictions, even with significantly fewer training landmarks. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

18 pages, 6962 KiB  
Article
A Hierarchical Clustering Obstacle Detection Method Applied to RGB-D Cameras
by Chunyang Liu, Saibao Xie, Xiqiang Ma, Yan Huang, Xin Sui, Nan Guo, Fang Yang and Xiaokang Yang
Electronics 2023, 12(10), 2316; https://doi.org/10.3390/electronics12102316 - 21 May 2023
Cited by 1 | Viewed by 1125
Abstract
Environment perception is a key part of robot self-controlled motion. When vision is used for obstacle detection, deep learning methods struggle to detect all obstacles due to complex environments and the limitations of vision, while traditional methods struggle to meet real-time requirements on embedded platforms. In this paper, a fast obstacle-detection process for RGB-D cameras is proposed. The process has three main steps: feature point extraction, noise removal, and obstacle clustering. The Canny and Shi–Tomasi algorithms perform the pre-processing and feature point extraction; noise is filtered on geometric grounds; obstacles at different depths are grouped based on the principle that feature points on the same object contour must be continuous or lie at the same depth in the RGB-D camera’s view; and a further segmentation in the horizontal direction completes the obstacle clustering. The method omits the iterative computation required by traditional methods and greatly reduces memory and time overhead. Experimental verification shows that the proposed method achieves a comprehensive recognition accuracy of 82.41%, which is 4.13% and 19.34% higher than the RSC and traditional methods, respectively, and a recognition accuracy of 91.72% under normal illumination, with a recognition speed of more than 20 FPS on the embedded platform. All detections are achieved within 1 m under normal illumination, and the detection error is no more than 2 cm within 3 m. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
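
A compressed sketch of the extraction-and-grouping idea with OpenCV: Shi–Tomasi corners on a Canny edge map, then grouping points whose depths are near-continuous. The file paths and the 50 mm depth tolerance are hypothetical, and the paper's geometric noise filtering and horizontal segmentation are omitted.

```python
import cv2
import numpy as np

# Hypothetical input paths; depth is assumed to be in millimeters.
gray = cv2.imread("frame_rgb.png", cv2.IMREAD_GRAYSCALE)
depth = cv2.imread("frame_depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

edges = cv2.Canny(gray, 50, 150)                       # pre-processing
corners = cv2.goodFeaturesToTrack(edges, maxCorners=500,
                                  qualityLevel=0.01, minDistance=5)
pts = corners.reshape(-1, 2).astype(int)               # Shi-Tomasi feature points

# Group points into obstacles: sort by depth and cut where consecutive
# depths jump by more than the tolerance.
d = depth[pts[:, 1], pts[:, 0]]
order = np.argsort(d)
cuts = np.where(np.diff(d[order]) > 50.0)[0] + 1
clusters = np.split(pts[order], cuts)                  # one point set per depth layer
```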

17 pages, 12715 KiB  
Article
An Infrared and Visible Image Fusion Algorithm Method Based on a Dual Bilateral Least Squares Hybrid Filter
by Quan Lu, Zhuangding Han, Likun Hu and Feiyu Tian
Electronics 2023, 12(10), 2292; https://doi.org/10.3390/electronics12102292 - 18 May 2023
Cited by 3 | Viewed by 1041
Abstract
Infrared and visible images of the same scene are fused to produce an image with richer information. However, most current image-fusion algorithms suffer from insufficient edge-information retention, weak feature representation, poor contrast, halos, and artifacts, and can only be applied to a single scene. To address these issues, we propose a novel infrared and visible image fusion algorithm (DBLSF) based on a dual bilateral–least-squares hybrid filter built on the bilateral filter and least-squares (BLF-LS) hybrid model. The algorithm uses the residual network ResNet50 and an adaptive fusion strategy based on the structure tensor to fuse the base and detail layers of the filter decomposition, respectively. Experiments on 32 image pairs from the TNO image-fusion dataset show that, although our algorithm sacrifices overall time efficiency, the proposed Combination 1 better preserves image edge information and image integrity, reduces the loss of source-image features, and suppresses artifacts and halos, outperforming other algorithms in structural similarity, feature similarity, multiscale structural similarity, root mean square error, peak signal-to-noise ratio, and correlation coefficient by at least 2.71%, 1.86%, 0.09%, 0.46%, 0.24%, and 0.07%, respectively. The proposed Combination 2 effectively improves the contrast and edge features of the fused image and enriches the image detail information, with average improvements of 37.42%, 26.40%, and 26.60% in average gradient, edge intensity, and spatial frequency, respectively, compared with other algorithms. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

21 pages, 522 KiB  
Article
An Efficient Strategy for Catastrophic Forgetting Reduction in Incremental Learning
by Huong-Giang Doan, Hong-Quan Luong, Thi-Oanh Ha and Thi Thanh Thuy Pham
Electronics 2023, 12(10), 2265; https://doi.org/10.3390/electronics12102265 - 17 May 2023
Cited by 1 | Viewed by 1738
Abstract
Deep neural networks (DNNs) have made outstanding achievements in a wide variety of domains. For deep learning tasks, sufficiently large datasets are required for training efficient DNN models. However, big datasets are not always available, and they are costly to build. Therefore, balanced solutions for DNN model efficiency and training data size have recently caught the attention of researchers. Transfer learning techniques are the most common answer: a DNN model is pre-trained on a large enough dataset and then applied to a new task with modest data. This fine-tuning process raises another challenge, known as catastrophic forgetting, which can be reduced with a reasonable strategy for data augmentation in incremental learning. In this paper, we propose an efficient solution for the random selection of samples from the old task to be incrementally stored for learning a sequence of new tasks. In addition, a loss combination strategy is proposed for optimizing incremental learning. The proposed solutions are evaluated on standard datasets with two scenarios of incremental fine-tuning: (1) a New Class (NC) dataset and (2) a New Class and new Instance (NCI) dataset. The experimental results show that our proposed solution outperforms other SOTA rehearsal methods, as well as traditional fine-tuning solutions, by 1% to 16% in recognition accuracy. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
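
The mechanics of random exemplar selection for rehearsal can be sketched with reservoir sampling, which keeps a uniform random subset of old-task samples in a fixed-size buffer; the paper's own selection rule and loss weights are its own and are not specified here.

```python
import random

class RehearsalBuffer:
    """Fixed-size uniform random sample of old-task data (reservoir sampling)."""
    def __init__(self, capacity=2000, seed=0):
        self.capacity, self.data = capacity, []
        self.rng, self.seen = random.Random(seed), 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = self.rng.randrange(self.seen)   # keep with probability capacity/seen
            if j < self.capacity:
                self.data[j] = sample

    def replay(self, k):
        return self.rng.sample(self.data, min(k, len(self.data)))

# During new-task training, each batch mixes new samples with buffer.replay(k),
# and the total loss combines both terms, e.g. loss_new + lam * loss_replay.
```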

11 pages, 12061 KiB  
Article
Big-Volume SliceGAN for Improving a Synthetic 3D Microstructure Image of Additive-Manufactured TYPE 316L Steel
by Keiya Sugiura, Toshio Ogawa, Yoshitaka Adachi, Fei Sun, Asuka Suzuki, Akinori Yamanaka, Nobuo Nakada, Takuya Ishimoto, Takayoshi Nakano and Yuichiro Koizumi
J. Imaging 2023, 9(5), 90; https://doi.org/10.3390/jimaging9050090 - 29 Apr 2023
Cited by 1 | Viewed by 1985
Abstract
A modified SliceGAN architecture was proposed to generate high-quality synthetic three-dimensional (3D) microstructure images of additively manufactured TYPE 316L steel. The quality of the resulting 3D image was evaluated using an auto-correlation function, and it was found that maintaining high resolution while doubling the training image size was crucial for creating a more realistic synthetic 3D image. To meet this requirement, a modified 3D image generator and critic architecture were developed within the SliceGAN framework. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

19 pages, 35019 KiB  
Article
Overcoming Adverse Conditions in Rescue Scenarios: A Deep Learning and Image Processing Approach
by Alberto Di Maro, Izar Azpiroz, Xabier Oregui Biain, Giuseppe Longo and Igor Garcia Olaizola
Appl. Sci. 2023, 13(9), 5499; https://doi.org/10.3390/app13095499 - 28 Apr 2023
Cited by 1 | Viewed by 1243
Abstract
This paper presents a Deep Learning (DL) and Image-Processing (IP) pipeline that addresses exposure recovery in challenging lighting conditions for enhancing First Responders’ (FRs) Situational Awareness (SA) during rescue operations. The method aims to improve the quality of images captured by FRs, particularly in overexposed and underexposed environments while providing a response time suitable for rescue scenarios. The paper describes the technical details of the pipeline, including exposure correction, segmentation, and fusion techniques. Our results demonstrate that the pipeline effectively recovers details in challenging lighting conditions, improves object detection, and is efficient in high-stress, fast-paced rescue situations. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

26 pages, 12105 KiB  
Article
Detection of Targets in Road Scene Images Enhanced Using Conditional GAN-Based Dehazing Model
by Tsz-Yeung Chow, King-Hung Lee and Kwok-Leung Chan
Appl. Sci. 2023, 13(9), 5326; https://doi.org/10.3390/app13095326 - 24 Apr 2023
Cited by 1 | Viewed by 1438
Abstract
Object detection is a classic image processing problem. For instance, in autonomous driving applications, targets such as cars and pedestrians are detected in road scene video. Many image-based object detection methods utilizing hand-crafted features have been proposed, and recent research has increasingly adopted deep learning approaches. Object detectors rely on useful features, such as an object’s boundary, which are extracted by analyzing the image pixels. However, images captured in an outdoor environment may be degraded by bad weather such as haze and fog. One possible remedy is to recover the image radiance through a pre-processing step such as image dehazing. We propose a dehazing model for image enhancement whose framework is based on the conditional generative adversarial network (cGAN), improved with two modifications. Various image dehazing datasets were employed for comparative analysis. Our proposed model outperformed other hand-crafted and deep-learning-based image dehazing methods by 2 dB or more in PSNR. Moreover, we utilized the dehazed images for target detection using the object detector YOLO. In the experiments, images were degraded by two weather conditions, rain and fog. We demonstrate that object detection in images enhanced by our proposed dehazing model is significantly improved over detection in the degraded images. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

25 pages, 15325 KiB  
Article
Unsupervised Image Enhancement Method Based on Attention Map Network Guidance and Attention Mechanism
by Mengfei Wu, Taiji Lan, Xucheng Xue and Xinwei Xu
Electronics 2023, 12(8), 1887; https://doi.org/10.3390/electronics12081887 - 17 Apr 2023
Cited by 1 | Viewed by 1370
Abstract
Low-light image enhancement is a crucial preprocessing task in complex vision tasks, directly impacting the outcomes of object detection, image segmentation, and image recognition. In recent years, with the continuous development of deep learning techniques, an increasing number of image enhancement methods based on deep learning have emerged. However, due to the high cost of data collection and the limited content of supervised learning datasets, more and more scholars have shifted their focus to unsupervised image enhancement. Unsupervised methods do not require paired images of the same scene during training, which greatly lowers the barrier to network training. Nevertheless, current unsupervised methods still suffer from issues such as unstable enhancement effects and limited generalization ability. To address these problems, we propose an improved low-light image enhancement method. The proposed method employs LSGAN as the training architecture and utilizes an attention map network to dynamically generate the attention maps that best fit the enhancement task, which effectively improves the generalization ability and enhancement performance of the network. We also adopt an attention mechanism to enhance the subtle details of the image features. Regarding network training, considering that a traditional convolutional neural network discriminator may not provide effective guidance to the generator in the early stages of training, we propose an improved discriminator structure. The experimental results demonstrate that our method achieves good enhancement performance on different datasets and has practical value. Although our method has advantages in enhancing low-light images, it also has certain limitations, such as a network size that does not meet lightweight-model requirements and room for further improvement under extremely low-light conditions; we will address these issues in future research. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

20 pages, 7701 KiB  
Article
Research on Identification and Location of Charging Ports of Multiple Electric Vehicles Based on SFLDLC-CBAM-YOLOV7-Tinp-CTMA
by Pengkun Quan, Ya’nan Lou, Haoyu Lin, Zhuo Liang, Dongbo Wei and Shichun Di
Electronics 2023, 12(8), 1855; https://doi.org/10.3390/electronics12081855 - 14 Apr 2023
Cited by 2 | Viewed by 1373
Abstract
With the gradual maturation of autonomous driving and automatic parking technology, electric vehicle charging is moving towards automation. The charging port (CP) location is an important basis for realizing automatic charging. Existing CP identification algorithms are only suitable for a single vehicle model and generalize poorly. Therefore, this paper proposes a set of methods that can identify the CPs of various vehicle types. The recognition process is divided into a rough positioning stage (RPS) and a precise positioning stage (PPS). In this study, datasets corresponding to four types of vehicle CPs under different environments are established. In the RPS, the characteristic information of the CP is obtained by combining a convolutional block attention module (CBAM) with YOLOV7-tinp, and its position information is calculated using a similar projection relationship. For the PPS, this paper proposes a data enhancement method based on similar feature location to determine the label category (SFLDLC). The CBAM-YOLOV7-tinp identifies the feature location information, the cluster template matching algorithm (CTMA) obtains the accurate feature location and tag type, and the EPnP algorithm calculates the location and posture (LP) information. The LP solution provides the position coordinates of the CP relative to the robot base. Finally, the AUBO-i10 robot is used to complete the experimental test. The results show that the average positioning errors (x, y, z, rx, ry, and rz) of the CP are 0.64 mm, 0.88 mm, 1.24 mm, 1.19 degrees, 1.00 degrees, and 0.57 degrees, respectively, and the integrated insertion success rate is 94.25%. Therefore, the proposed algorithm can efficiently and accurately identify and locate various types of CPs and meets the actual plugging requirements. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
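
The pose step can be illustrated with OpenCV's EPnP solver: given the 3D model coordinates of CP keypoints and their detected 2D pixel locations, it recovers the port's location and posture relative to the camera. All point values and the camera matrix below are made-up stand-ins, and the detection and template-matching stages are not shown.

```python
import cv2
import numpy as np

# Made-up CP keypoints (mm, model frame) and their detected pixel locations.
object_pts = np.array([[0, 0, 0], [30, 0, 0], [0, 30, 0],
                       [30, 30, 0], [15, 15, 5]], dtype=np.float32)
image_pts = np.array([[320, 240], [400, 238], [322, 318],
                      [401, 317], [361, 278]], dtype=np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix; tvec gives the CP position
```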

13 pages, 4183 KiB  
Article
Thangka Sketch Colorization Based on Multi-Level Adaptive-Instance-Normalized Color Fusion and Skip Connection Attention
by Hang Li, Jie Fang, Ying Jia, Liqi Ji, Xin Chen and Nianyi Wang
Electronics 2023, 12(7), 1745; https://doi.org/10.3390/electronics12071745 - 06 Apr 2023
Cited by 1 | Viewed by 1386
Abstract
Thangka is an important intangible cultural heritage of Tibet. Due to the complexity and time-consuming nature of the Thangka painting technique, this art form is currently at risk of being lost, so it is important to preserve it through digital painting methods. Machine-learning-based auto-sketch colorization is one of the vital steps for digital Thangka painting. However, existing learning-based sketch colorization methods face two challenges when colorizing Thangka: (1) the extremely rich colors of Thangka make accurate colorization difficult for existing algorithms, and (2) the line density of Thangka makes it extremely challenging for algorithms to determine what semantic information the lines imply. To resolve these problems, we propose a Thangka sketch colorization method based on multi-level adaptive-instance-normalized color fusion (MACF) and skip connection attention (SCA). The proposed method consists of two parts: (1) multi-level adaptive-instance-normalized color fusion (MACF) to fuse sketch features and color features, and (2) a skip connection attention (SCA) mechanism to distinguish the semantic information implied by the sketch lines. Experiments on colorizing Thangka sketches show that our method works well on two small datasets, the Danbooru 2019 dataset and the Thangka dataset, and can generate exquisite Thangka images. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
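
Adaptive instance normalization, the building block behind the MACF fusion, re-normalizes sketch features to adopt the channel-wise statistics of the color features. A minimal single-level sketch follows; the multi-level wiring and the SCA mechanism are omitted.

```python
import torch

def adain(content, style, eps=1e-5):
    """Re-normalize content features to the channel statistics of style features."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu

sketch_feat = torch.randn(1, 128, 64, 64)   # encoder output for the sketch
color_feat = torch.randn(1, 128, 64, 64)    # encoder output for the color reference
fused = adain(sketch_feat, color_feat)      # sketch structure, color statistics
```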

19 pages, 6335 KiB  
Article
Research on Improved Multi-Channel Image Stitching Technology Based on Fast Algorithms
by Han Gao, Zhangqin Huang, Huapeng Yang, Xiaobo Zhang and Chen Cen
Electronics 2023, 12(7), 1700; https://doi.org/10.3390/electronics12071700 - 03 Apr 2023
Cited by 3 | Viewed by 2400
Abstract
The image registration and fusion stages of image stitching algorithms entail significant computational costs, which limits the use of robust, high-performing stitching algorithms in real-time applications on PCs (personal computers) and embedded systems. Fast image registration and fusion algorithms, in turn, suffer from problems such as ghosting and dashed lines, resulting in suboptimal stitching results. Consequently, this study proposes a multi-channel image stitching approach based on fast image registration and fusion algorithms that enhances the stitching effect while preserving their potential for deployment in real-time applications. First, in the image registration stage, a gridded Binary Robust Invariant Scalable Keypoints (BRISK) method is used to improve the matching efficiency of feature points, and the Grid-based Motion Statistics (GMS) algorithm with a bidirectional rough matching method is used to improve the matching accuracy of feature points. Then, the optimal seam algorithm is used in the image fusion stage to obtain the seam line and construct the fusion area. The seam and transition areas are fused using a fade-in/fade-out weighting algorithm to obtain smooth, high-quality stitched images. The experimental results demonstrate the performance of our proposed method through an improvement in image registration and fusion metrics. Compared with both the original algorithm and other existing methods, our approach achieves significant improvements in eliminating stitching artifacts such as ghosting and discontinuities while maintaining the efficiency of fast algorithms. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
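
A simplified sketch of the registration stage using OpenCV: BRISK features, brute-force Hamming matching with cross-checking as a stand-in for the bidirectional rough matching, and GMS filtering. Note that cv2.xfeatures2d.matchGMS requires opencv-contrib-python, the file paths are hypothetical, and the gridded BRISK variant is not reproduced.

```python
import cv2

img1 = cv2.imread("cam1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frames
img2 = cv2.imread("cam2.png", cv2.IMREAD_GRAYSCALE)

brisk = cv2.BRISK_create()
kp1, des1 = brisk.detectAndCompute(img1, None)
kp2, des2 = brisk.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # bidirectional check
raw = matcher.match(des1, des2)

# GMS keeps matches supported by their neighborhoods (opencv-contrib module).
good = cv2.xfeatures2d.matchGMS(img1.shape[::-1], img2.shape[::-1],
                                kp1, kp2, raw, withRotation=False,
                                withScale=False, thresholdFactor=6)
print(f"{len(good)} GMS-verified matches out of {len(raw)}")
```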

19 pages, 2382 KiB  
Article
A Self-Supervised Tree-Structured Framework for Fine-Grained Classification
by Qihang Cai, Lei Niu, Xibin Shang and Heng Ding
Appl. Sci. 2023, 13(7), 4453; https://doi.org/10.3390/app13074453 - 31 Mar 2023
Viewed by 1174
Abstract
In computer vision, fine-grained classification has become an important issue in recognizing objects with slight visual differences, and it is usually challenging for traditional convolutional neural networks to perform well on such problems. To improve the accuracy and training time of convolutional neural networks on fine-grained classification problems, this paper proposes a tree-structured framework that eliminates the effect of differences between clusters. The contributions of the proposed method include the following three aspects: (1) a self-supervised method that automatically creates a classification tree, eliminating the need for manual labeling; (2) a machine-learning matcher which determines the cluster to which an item belongs, minimizing the impact of inter-cluster variations on classification; and (3) a pruning criterion which filters the tree-structured classifier, retaining only the models with superior classification performance. The experimental evaluation demonstrates the framework's effectiveness in reducing training time and improving fine-grained classification accuracy across various datasets in comparison with conventional convolutional neural network models. Specifically, for the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, the proposed method reduces training time by 32.91%, 35.87%, and 14.48%, and improves fine-grained classification accuracy by 1.17%, 2.01%, and 0.59%, respectively. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

21 pages, 7780 KiB  
Article
Pixel-Coordinate-Induced Human Pose High-Precision Estimation Method
by Xuefei Sun, Mohammed Jajere Adamu, Ruifeng Zhang, Xin Guan and Qiang Li
Electronics 2023, 12(7), 1648; https://doi.org/10.3390/electronics12071648 - 31 Mar 2023
Viewed by 1425
Abstract
Accurately estimating human pose is crucial for providing feedback during exercises or musical performances, but the complex and flexible nature of human joints makes it challenging. Additionally, traditional methods often neglect pixel coordinates, which are naturally present in high-resolution images of the human body. To address this issue, we propose a novel human pose estimation method that directly incorporates pixel coordinates. Our method adds a coordinate channel to the convolution process and embeds pixel coordinates into the feature map, while also using coordinate attention to capture position- and structure-sensitive features. We further reduce the network parameters and computational cost by using small-scale convolution kernels and a smooth activation function in residual blocks. We evaluate our model on the MPII Human Pose and COCO Keypoint Detection datasets and demonstrate improved accuracy, highlighting the importance of directly incorporating coordinate location information in position-sensitive tasks. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
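
The idea of embedding pixel coordinates into the feature map can be sketched as a CoordConv-style layer that appends normalized x/y channels before the convolution; the paper's coordinate attention and residual-block changes are omitted here.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Convolution over features with normalized x/y coordinate channels appended."""
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))  # position-aware features

out = CoordConv2d(64, 128, kernel_size=3, padding=1)(torch.randn(2, 64, 32, 32))
```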

26 pages, 4206 KiB  
Article
A Genetic Algorithm Based One Class Support Vector Machine Model for Arabic Skilled Forgery Signature Verification
by Ansam A. Abdulhussien, Mohammad F. Nasrudin, Saad M. Darwish and Zaid Abdi Alkareem Alyasseri
J. Imaging 2023, 9(4), 79; https://doi.org/10.3390/jimaging9040079 - 29 Mar 2023
Cited by 2 | Viewed by 2731
Abstract
Recently, signature verification systems have been widely adopted for verifying individuals based on their handwritten signatures, especially in forensic and commercial transactions. Generally, feature extraction and classification tremendously impact the accuracy of system authentication. Feature extraction is challenging for signature verification systems due to the diverse forms of signatures and sampling conditions. Current signature verification techniques demonstrate promising results in identifying genuine and forged signatures, but the overall performance of skilled forgery detection remains unsatisfactory. Furthermore, most current techniques demand a large number of learning samples to increase verification accuracy. This is the primary disadvantage of using deep learning, as the number of signature samples is usually restricted in practical applications of signature verification systems. In addition, the system inputs are scanned signatures that comprise noisy pixels, a complicated background, blurriness, and contrast decay. The main challenge is attaining a balance between noise and data loss, since some essential information is lost during preprocessing, which may influence the subsequent stages of the system. This paper tackles these issues through four main steps: preprocessing, multifeature fusion, discriminant feature selection using a genetic algorithm based on a one-class support vector machine (OCSVM-GA), and a one-class learning strategy to address imbalanced signature data in practical applications of signature verification systems. The suggested method employs three signature databases: SID-Arabic handwritten signatures, CEDAR, and UTSIG. Experimental results show that the proposed approach outperforms current systems in terms of false acceptance rate (FAR), false rejection rate (FRR), and equal error rate (EER). Full article
(This article belongs to the Topic Computer Vision and Image Processing)
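
The one-class learning strategy can be sketched with scikit-learn's OneClassSVM, trained on genuine-signature features only. The fused features, the GA-selected subset, and the hyperparameters below are illustrative assumptions, not the paper's values.

```python
import numpy as np
from sklearn.svm import OneClassSVM

genuine = np.random.rand(200, 64)   # stand-in fused, GA-selected feature vectors
queries = np.random.rand(10, 64)    # signatures to verify

# Trained on genuine signatures only; no forgeries needed at training time.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(genuine)
decision = ocsvm.predict(queries)   # +1 = accepted as genuine, -1 = rejected

# A GA would wrap this: each chromosome encodes a feature-subset mask, and its
# fitness is the verification error of an OCSVM trained on that subset.
```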

15 pages, 5975 KiB  
Article
DFEN: Dual Feature Enhancement Network for Remote Sensing Image Caption
by Weihua Zhao, Wenzhong Yang, Danny Chen and Fuyuan Wei
Electronics 2023, 12(7), 1547; https://doi.org/10.3390/electronics12071547 - 25 Mar 2023
Cited by 4 | Viewed by 1073
Abstract
Remote sensing image captioning aims to describe ground objects and the semantic relationships between them. Existing remote sensing image captioning algorithms do not acquire enough ground object information from remote sensing images, resulting in inaccurate captions. This paper therefore proposes a codec-based Dual Feature Enhancement Network (“DFEN”) to enhance ground object information at both the image and text levels. At the image level, we build an Image-Enhancement module using the multiscale characteristics of remote sensing images, through which more discriminative image context features are obtained; a hierarchical attention mechanism aggregates multi-level features and supplements the ground object information ignored due to large scale differences. At the text level, we use the image’s latent visual features to guide the Text-Enhancement module, yielding text guidance features that correctly focus on ground object information. Experiment results show that the DFEN model enhances ground object information from both images and text. Specifically, the BLEU-1 index increased by 8.6% on UCM-caption, 2.3% on Sydney-caption, and 5.1% on RSICD. The DFEN model has advanced the exploration of high-level semantics in remote sensing images and facilitated the development of remote sensing image captioning. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

23 pages, 8705 KiB  
Article
Enhancing Image Encryption with the Kronecker xor Product, the Hill Cipher, and the Sigmoid Logistic Map
by Dani Elias Mfungo, Xianping Fu, Xingyuan Wang and Yongjin Xian
Appl. Sci. 2023, 13(6), 4034; https://doi.org/10.3390/app13064034 - 22 Mar 2023
Cited by 10 | Viewed by 2213
Abstract
In today’s digital age, it is crucial to secure the flow of information so that data are protected from being hacked during transmission or storage. To address this need, we present a new image encryption technique that combines the Kronecker xor product, the Hill cipher, and the sigmoid logistic map. The proposed algorithm begins by shifting the values in each row of the state matrix to the left by a predetermined number of positions and then encrypting the resulting image using the Hill cipher. The top value of each odd or even column is used to perform an xor operation with all values in the corresponding even or odd column, excluding the top value itself. The resulting image is then diffused using a sigmoid logistic map and subjected to the Kronecker xor product operation among the pixels to create a secure image, which is diffused again with further keys from the sigmoid logistic map for the final product. We compared our proposed method to recent work and found it to be safe and efficient in terms of performance after conducting statistical analysis, differential attack analysis, brute-force attack analysis, and information entropy analysis. The results demonstrate that our proposed method is robust, lightweight, and fast, meets the requirements for encryption and decryption, and is resistant to various attacks. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

19 pages, 4667 KiB  
Article
Fish Detection and Classification for Automatic Sorting System with an Optimized YOLO Algorithm
by Ari Kuswantori, Taweepol Suesut, Worapong Tangsrirat, Gerhard Schleining and Navaphattra Nunak
Appl. Sci. 2023, 13(6), 3812; https://doi.org/10.3390/app13063812 - 16 Mar 2023
Cited by 3 | Viewed by 4248
Abstract
Automatic fish recognition using deep learning and computer or machine vision is a key part of making the fish industry more productive through automation. An automatic sorting system will help to tackle the challenges of increasing food demand and the threat of food scarcity in the future, driven by the continuing growth of the world population and the impact of global warming and climate change. As far as the authors know, no work has yet been published on detecting and classifying moving fish for the fish culture industry, especially for automatic sorting based on fish species using deep learning and machine vision. This paper proposes an approach based on the recognition algorithm YOLOv4, optimized with a unique labeling technique. The proposed method was tested on videos of real fish, placed randomly in position and order, running on a conveyor at a speed of 505.08 m/h, and achieved an accuracy of 98.15%. This simple but effective method is expected to serve as a guide for automatically detecting, classifying, and sorting fish. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

12 pages, 1852 KiB  
Article
Color Constancy Based on Local Reflectance Differences
by Ming Yan, Yueli Hu and Haikun Zhang
Electronics 2023, 12(6), 1396; https://doi.org/10.3390/electronics12061396 - 15 Mar 2023
Cited by 1 | Viewed by 1496
Abstract
Color constancy is used to determine the actual surface color of the scene affected by illumination so that the captured image is more in line with the characteristics of human perception. The well-known Gray-Edge hypothesis states that the average edge difference in a scene is achromatic. Inspired by the Gray-Edge hypothesis, we propose a new illumination estimation method. Specifically, after analyzing three public datasets containing rich illumination conditions and scenes, we found that the ratio of the global sum of reflectance differences to the global sum of locally normalized reflectance differences is achromatic. Based on this hypothesis, we also propose an accurate color constancy method. The method was tested on four test datasets containing various illumination conditions (three datasets in a single-light environment and one dataset in a multi-light environment). The results show that the proposed method outperforms the state-of-the-art color constancy methods. Furthermore, we propose a new framework that can incorporate current mainstream statistics-based color constancy methods (Gray-World, Max-RGB, Gray-Edge, etc.) into the proposed framework. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
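
For context, the classical Gray-Edge estimate that the hypothesis builds on can be written in a few lines: the Minkowski norm of the per-channel image derivatives is taken as the illuminant color. This is the textbook method (van de Weijer et al.), not the paper's reflectance-difference ratio statistic.

```python
import numpy as np

def gray_edge_illuminant(img, p=6):
    """img: (H, W, 3) linear-RGB floats; unit-norm illuminant from edge statistics."""
    gy, gx = np.gradient(img, axis=(0, 1))
    mag = np.sqrt(gx**2 + gy**2)                    # per-channel edge strength
    e = (mag**p).mean(axis=(0, 1)) ** (1.0 / p)     # Minkowski p-norm per channel
    return e / np.linalg.norm(e)

img = np.random.rand(240, 320, 3)                   # stand-in image
est = gray_edge_illuminant(img)
corrected = np.clip(img / (est * np.sqrt(3)), 0, 1) # von Kries-style correction
```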

13 pages, 4498 KiB  
Article
Visual Attention Adversarial Networks for Chinese Font Translation
by Te Li, Fang Yang and Yao Song
Electronics 2023, 12(6), 1388; https://doi.org/10.3390/electronics12061388 - 14 Mar 2023
Cited by 1 | Viewed by 1635
Abstract
Currently, many Chinese font translation models adopt the method of dividing character components to improve the quality of generated font images. However, character components require a large amount of manual annotation to decompose characters and determine the composition of each character as input for training. In this paper, we establish a Chinese font translation model based on a generative adversarial network without decomposition. First, we improve the image enhancement method for Chinese character images, which helps the model learn the structural information of Chinese character strokes and generate font images with complete, accurate strokes. Second, we propose a visual attention adversarial network: using a visual attention block, the network captures global and local features for constructing the details of characters. Experiments demonstrate that our method generates high-quality Chinese character images with great style diversity, including calligraphy characters. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

30 pages, 14229 KiB  
Article
A Real-Time Registration Algorithm of UAV Aerial Images Based on Feature Matching
by Zhiwen Liu, Gen Xu, Jiangjian Xiao, Jingxiang Yang, Ziyang Wang and Siyuan Cheng
J. Imaging 2023, 9(3), 67; https://doi.org/10.3390/jimaging9030067 - 11 Mar 2023
Cited by 1 | Viewed by 2606
Abstract
This study aimed to achieve accurate, real-time geographic positioning of UAV aerial image targets. We verified a method of registering UAV camera images on a map (with geographic location) through feature matching. The UAV is usually in rapid motion with a changing camera head, while the map is high-resolution with sparse features. These conditions make it difficult for current feature-matching algorithms to accurately register the camera image and the map in real time, producing a large number of mismatches. To solve this problem, we used the SuperGlue algorithm, which offers better performance, to match the features. A layer-and-block strategy, combined with the prior data of the UAV, was introduced to improve the accuracy and speed of feature matching, and matching information obtained between frames was introduced to solve the problem of uneven registration. We also propose updating map features with UAV image features to enhance the robustness and applicability of UAV aerial image and map registration. Numerous experiments proved that the proposed method is feasible and can adapt to changes in the camera head, environment, etc. The UAV aerial image is stably and accurately registered on the map at a frame rate of 12 frames per second, which provides a basis for the geo-positioning of UAV aerial image targets. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

18 pages, 11024 KiB  
Article
Night Vision Anti-Halation Algorithm Based on Different-Source Image Fusion Combining Visual Saliency with YUV-FNSCT
by Quanmin Guo, Fan Yang and Hanlei Wang
Electronics 2023, 12(6), 1303; https://doi.org/10.3390/electronics12061303 - 09 Mar 2023
Viewed by 1608
Abstract
In order to address driver dazzle caused by the abuse of high beams when vehicles meet at night, a night vision anti-halation algorithm based on image fusion combining visual saliency with YUV-FNSCT is proposed. An improved frequency-tuned (FT) visual saliency detection method is proposed to quickly lock onto objects of interest, such as vehicles and pedestrians, so as to improve the salient features of fusion images. The high- and low-frequency sub-bands of infrared saliency images and visible luminance components can be obtained quickly using the fast non-subsampled contourlet transform (FNSCT), which has the characteristics of multi-direction, multi-scale, and shift-invariance. According to the degree of halation in the visible image, the nonlinear adaptive fusion strategy of low-frequency weights reasonably eliminates halation while retaining useful information from the original image to the maximum extent. The statistical matching feature fusion strategy distinguishes the common and unique edge information in the high-frequency sub-bands by mutual matching so as to obtain more effective details of the original images, such as edges and contours. Only the luminance Y obtained by the YUV transform is involved in image fusion, which both avoids color shift in the fusion image and reduces the amount of computation. Considering the night driving environment and the degree of halation, visible and infrared images were collected for anti-halation fusion in six typical halation scenes on three types of roads covering most night driving conditions. The fused images obtained by the proposed algorithm show complete halation elimination, rich color details, and obvious salient features, and achieve the best comprehensive index in each halation scene. The experimental results and analysis show that the proposed algorithm has advantages in halation elimination and visual saliency, generalizes well to different night vision halation scenes, helping drivers observe the road ahead and improving the safety of night driving, and is also applicable to rainy, foggy, smoggy, and other complex weather. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
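
In its classical form, frequency-tuned saliency reduces to measuring how far each (slightly blurred) pixel lies from the image's mean Lab colour. A minimal sketch of that classical FT detector with OpenCV, not the paper's improved variant:

```python
import cv2
import numpy as np

def ft_saliency(bgr):
    """Classical frequency-tuned saliency (Achanta-style)."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(axis=0)        # mean Lab colour of the image
    blur = cv2.GaussianBlur(lab, (5, 5), 0)       # suppress high-frequency noise
    sal = np.linalg.norm(blur - mean, axis=2)     # distance to the mean colour
    return cv2.normalize(sal, None, 0, 1, cv2.NORM_MINMAX)
```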

22 pages, 4672 KiB  
Article
DCTable: A Dilated CNN with Optimizing Anchors for Accurate Table Detection
by Takwa Kazdar, Wided Souidene Mseddi, Moulay A. Akhloufi, Ala Agrebi, Marwa Jmal and Rabah Attia
J. Imaging 2023, 9(3), 62; https://doi.org/10.3390/jimaging9030062 - 07 Mar 2023
Cited by 1 | Viewed by 1446
Abstract
With the widespread use of deep learning in leading systems, it has become the mainstream approach in the table detection field. Some tables are difficult to detect because of figure-like layouts or small sizes. As a solution to this problem, we propose a novel method, called DCTable, to improve Faster R-CNN for table detection. DCTable uses a backbone with dilated convolutions to extract more discriminative features and improve the quality of region proposals. Another main contribution of this paper is the optimization of anchors using the Intersection over Union (IoU)-balanced loss to train the RPN and reduce the false positive rate. This is followed by an RoI Align layer, instead of RoI pooling, which improves accuracy when mapping table proposal candidates by eliminating coarse misalignment and introducing bilinear interpolation. Training and testing on public datasets showed the effectiveness of the algorithm and a considerable improvement of the F1-score on the ICDAR2017-POD, ICDAR2019, Marmot, and RVL-CDIP datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

19 pages, 27878 KiB  
Article
Hybrid Classifiers for Spatio-Temporal Abnormal Behavior Detection, Tracking, and Recognition in Massive Hajj Crowds
by Tarik Alafif, Anas Hadi, Manal Allahyani, Bander Alzahrani, Areej Alhothali, Reem Alotaibi and Ahmed Barnawi
Electronics 2023, 12(5), 1165; https://doi.org/10.3390/electronics12051165 - 28 Feb 2023
Cited by 9 | Viewed by 2929
Abstract
Individual abnormal behaviors vary depending on crowd sizes, contexts, and scenes. Challenges such as partial occlusions, blurring, a large number of abnormal behaviors, and camera viewing angles arise in large-scale crowds when detecting, tracking, and recognizing individuals with abnormalities. In this paper, our contribution is two-fold. First, we introduce an annotated and labeled large-scale crowd abnormal behavior Hajj dataset, HAJJv2. Second, we propose two methods of hybrid convolutional neural networks (CNNs) and random forests (RFs) to detect and recognize spatio-temporal abnormal behaviors in small- and large-scale crowd videos. In small-scale crowd videos, a ResNet-50 pre-trained CNN model is fine-tuned to verify whether each frame is normal or abnormal in the spatial domain. If anomalous behaviors are observed, a motion-based individual detection method based on the magnitudes and orientations of Horn–Schunck optical flow is proposed to locate and track individuals with abnormal behaviors. A Kalman filter is employed in large-scale crowd videos to predict and track the detected individuals in the subsequent frames. Then, means and variances are computed as statistical features and fed to the RF classifier to classify individuals with abnormal behaviors in the temporal domain. In large-scale crowds, we fine-tune the ResNet-50 model using a YOLOv2 object detection technique to detect individuals with abnormal behaviors in the spatial domain. The proposed method achieves average area under the curve (AUC) values of 99.76% and 93.71% on two public benchmark small-scale crowd datasets, UMN and UCSD, respectively, while the large-scale crowd method achieves an average AUC of 76.08% on the HAJJv2 dataset. Our method outperforms state-of-the-art methods on the small-scale crowd datasets by margins of 1.66%, 6.06%, and 2.85% on UMN, UCSD Ped1, and UCSD Ped2, respectively. It also produces acceptable results in large-scale crowds. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
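
The motion-based localization step lends itself to a compact illustration: compute dense optical flow, threshold its magnitude, and extract connected blobs as candidate individuals to hand to a tracker. A rough sketch using OpenCV's Farneback flow as a stand-in for Horn–Schunck; the threshold rule and minimum blob area are assumptions:

```python
import cv2
import numpy as np

def abnormal_motion_candidates(prev_gray, gray, mag_thresh=None):
    """Locate fast-moving regions as candidates for abnormal behavior."""
    # Dense flow as a stand-in for Horn-Schunck (OpenCV ships Farneback)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    if mag_thresh is None:
        mag_thresh = mag.mean() + 3 * mag.std()   # assumed rule; tune per scene
    mask = (mag > mag_thresh).astype(np.uint8) * 255
    # connected components give candidate individuals for a Kalman tracker
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    boxes = [stats[i, :4] for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] > 50]
    return mask, boxes
```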

15 pages, 93947 KiB  
Article
KD-PatchMatch: A Self-Supervised Training Learning-Based PatchMatch
by Qingyu Tan, Zhijun Fang and Xiaoyan Jiang
Appl. Sci. 2023, 13(4), 2224; https://doi.org/10.3390/app13042224 - 09 Feb 2023
Viewed by 1137
Abstract
Traditional learning-based multi-view stereo (MVS) methods usually need to find the correct depth value from a large number of depth candidates, which leads to huge memory consumption and slow inference. To address these problems, we propose probabilistic depth sampling within the learning-based PatchMatch framework, i.e., sampling a small number of depth candidates from a single-view probability distribution, which saves computational resources. Furthermore, to overcome the difficulty of obtaining ground-truth depth for outdoor large-scale scenes, we also propose a self-supervised training pipeline based on knowledge distillation, which involves self-supervised teacher training followed by student training via knowledge distillation. Extensive experiments show that our approach outperforms other recent learning-based MVS methods on the DTU, Tanks and Temples, and ETH3D datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
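
The core of the sampling idea is easy to state: instead of sweeping a dense set of depth planes, draw a handful of hypotheses per pixel from a predicted single-view distribution. A toy NumPy sketch assuming a per-pixel Gaussian; the mean and standard deviation maps are assumed inputs, not the paper's exact parameterization:

```python
import numpy as np

def sample_depth_candidates(mu, sigma, k=8, d_min=0.4, d_max=10.0):
    """Draw k depth hypotheses per pixel from a per-pixel normal distribution.

    mu, sigma: (H, W) predicted depth mean / uncertainty from a single view.
    Returns a (k, H, W) array of candidates, clipped to the valid depth range.
    """
    eps = np.random.randn(k, *mu.shape).astype(np.float32)
    cands = mu[None] + eps * sigma[None]      # few samples replace a dense sweep
    return np.clip(cands, d_min, d_max)
```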

19 pages, 4687 KiB  
Article
Human Pose Estimation via Dynamic Information Transfer
by Yihang Li, Qingxuan Shi, Jingya Song and Fang Yang
Electronics 2023, 12(3), 695; https://doi.org/10.3390/electronics12030695 - 30 Jan 2023
Cited by 1 | Viewed by 2241
Abstract
This paper presents a multi-task learning framework, called the dynamic information transfer network (DITN). We mainly focus on improving pose estimation by exploiting the spatial relationships of adjacent joints. To benefit from explicit structural knowledge, we construct two branches with a shared backbone to localize the human joints and bones, respectively. Since related tasks share a high-level representation, we leverage the bone information to refine joint localization via dynamic information transfer. In detail, we extract dynamic parameters from the bone branch and use them to make the network learn constraint relationships via dynamic convolution. Moreover, attention blocks are added after the information transfer to balance information across different granularity levels and induce the network to focus on informative regions. The experimental results demonstrate the effectiveness of the DITN, which achieves 90.8% PCKh@0.5 on MPII and 75.0% AP on COCO. The qualitative results on the MPII and COCO datasets show that the DITN performs better, especially on heavily occluded or easily confusable joints. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
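
Dynamic convolution here means the kernels applied to the joint features are themselves predicted from the bone branch at run time. A minimal PyTorch sketch of that mechanism; it is an illustrative re-implementation rather than the authors' code, and pooling the bone features down to one depthwise kernel per channel is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicTransfer(nn.Module):
    """Predict per-sample depthwise conv kernels from the bone branch and
    apply them to the joint features (illustrative sketch)."""
    def __init__(self, ch, k=3):
        super().__init__()
        self.ch, self.k = ch, k
        self.param_head = nn.Linear(ch, ch * k * k)  # one kernel per channel

    def forward(self, joint_feat, bone_feat):
        b, c, h, w = joint_feat.shape
        ctx = F.adaptive_avg_pool2d(bone_feat, 1).flatten(1)       # (B, C)
        kernels = self.param_head(ctx).view(b * c, 1, self.k, self.k)
        x = joint_feat.reshape(1, b * c, h, w)    # grouped-conv trick: per-sample kernels
        out = F.conv2d(x, kernels, padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w) + joint_feat  # residual refinement of joints
```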

19 pages, 9271 KiB  
Article
Dynamic Multi-Attention Dehazing Network with Adaptive Feature Fusion
by Donghui Zhao, Bo Mo, Xiang Zhu, Jie Zhao, Heng Zhang, Yimeng Tao and Chunbo Zhao
Electronics 2023, 12(3), 529; https://doi.org/10.3390/electronics12030529 - 19 Jan 2023
Cited by 4 | Viewed by 1414
Abstract
This paper proposes a Dynamic Multi-Attention Dehazing Network (DMADN) for single-image dehazing. The proposed network consists of two key components: the Dynamic Feature Attention (DFA) module and the Adaptive Feature Fusion (AFF) module. The DFA module provides pixel-wise and channel-wise weights for input features, considering that the haze distribution in a degraded image is always uneven and that the values in each channel differ. We propose an AFF module based on an adaptive mixup operation to restore the missing spatial information from high-resolution layers. Most previous works have concentrated on increasing the scale of the model to improve dehazing performance, which makes deployment on edge devices difficult. We introduce contrastive learning into our training process, which leverages both positive and negative samples to optimize our network. The contrastive learning strategy effectively improves the quality of the output without increasing the model's complexity or inference time in the testing phase. Extensive experimental results on synthetic and real-world hazy images demonstrate that DMADN achieves state-of-the-art dehazing performance with a competitive number of parameters. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
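
Pixel-wise plus channel-wise reweighting of the kind the DFA module describes can be sketched in a few lines of PyTorch. The module below is a generic illustration in the spirit of feature-attention dehazing blocks, not the DFA module itself; the reduction ratio r is an assumption:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel-wise then pixel-wise reweighting of features."""
    def __init__(self, ch, r=8):
        super().__init__()
        self.ca = nn.Sequential(            # channel attention: global context -> per-channel weight
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
        self.pa = nn.Sequential(            # pixel attention: per-location weight (haze is uneven)
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, 1, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.pa(x)
```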

21 pages, 9966 KiB  
Article
Wildlife Object Detection Method Applying Segmentation Gradient Flow and Feature Dimensionality Reduction
by Mingyu Zhang, Fei Gao, Wuping Yang and Haoran Zhang
Electronics 2023, 12(2), 377; https://doi.org/10.3390/electronics12020377 - 11 Jan 2023
Cited by 8 | Viewed by 2342 | Correction
Abstract
This work proposes an enhanced animal detection algorithm for natural environments based on YOLOv5s, addressing the low detection accuracy and slow detection speed encountered when automatically detecting and classifying large animals in natural environments. To increase the detection speed of the model, the algorithm first enhances the SPP module by replacing the parallel connection of the original maximum pooling layers with a serial connection. It then expands the model's receptive field on the dataset used in this paper by stacking the feature pyramid network structure to enhance the feature fusion network. Secondly, it introduces the GSConv module, which combines standard convolution, depthwise separable convolution, and hybrid channels to reduce network parameters and computation, making the model lightweight and easier to deploy on endpoint devices. At the same time, a GS bottleneck is used to replace the Bottleneck module in C3: the input feature map is divided into two channels that are assigned different weights, and the two channels are then combined and concatenated according to the number of channels, which enhances the model's ability to express non-linear functions and alleviates the vanishing gradient problem. Wildlife images were obtained from the public Open Images dataset and real-life photographs. The experimental results show that, compared with the original algorithm, the improved YOLOv5s algorithm proposed in this paper reduces the computational cost of the model while improving both detection accuracy and speed, and it can be readily applied to the real-time detection of animals in natural environments. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
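
The parallel-to-serial pooling change is the same trick YOLOv5 uses in its SPPF block: chaining three small max-pools reproduces the receptive fields of the parallel 5/9/13 pools while reusing intermediate results. A PyTorch sketch of the serial form:

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Three chained 5x5 max-pools reproduce the 5/9/13 receptive fields of
    the parallel SPP while reusing intermediate results (hence faster)."""
    def __init__(self, k=5):
        super().__init__()
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x):
        y1 = self.pool(x)    # equivalent to a 5x5 pool
        y2 = self.pool(y1)   # equivalent to a 9x9 pool
        y3 = self.pool(y2)   # equivalent to a 13x13 pool
        return torch.cat([x, y1, y2, y3], dim=1)
```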

15 pages, 30832 KiB  
Article
Automatic Method for Vickers Hardness Estimation by Image Processing
by Jonatan D. Polanco, Carlos Jacanamejoy-Jamioy, Claudia L. Mambuscay, Jeferson F. Piamba and Manuel G. Forero
J. Imaging 2023, 9(1), 8; https://doi.org/10.3390/jimaging9010008 - 30 Dec 2022
Cited by 4 | Viewed by 2129
Abstract
Hardness is one of the most important mechanical properties of materials, since it is used to estimate their quality and to determine their suitability for a particular application. One method of determining hardness is the Vickers test, in which the resistance to plastic deformation at the surface of the material is measured after applying force with an indenter. The hardness is measured from the sample image, a tedious, time-consuming procedure that is prone to human error. Therefore, in this work, a new automatic method based on image processing techniques is proposed, which obtains results quickly and more accurately, even when the indentation mark is highly irregular. For the development and validation of the method, a set of microscopy images was used, comprising samples indented with applied forces of 5 N and 10 N on AISI D2 steel with and without quenching and tempering heat treatment, as well as samples coated with titanium niobium nitride (TiNbN). The proposed method was implemented as a plugin for the ImageJ program, yielding reproducible Vickers hardness results in an average time of 2.05 s with an accuracy of 98.3% and a maximum error of 4.5% with respect to the manually obtained values used as the gold standard. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
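
The quantity such a plugin ultimately computes follows directly from the indentation geometry: the Vickers number is the applied force divided by the sloped area of the impression, HV = 0.1891·F/d² with F in newtons and d the mean diagonal in millimetres (equivalently 1.8544·F/d² with F in kgf). A small worked example:

```python
def vickers_hardness(force_newtons, d1_mm, d2_mm):
    """HV from the two measured indentation diagonals.

    Standard relation: HV = 0.1891 * F / d^2, with F in newtons and
    d the mean diagonal in millimetres.
    """
    d = (d1_mm + d2_mm) / 2.0
    return 0.1891 * force_newtons / (d * d)

# e.g. a 10 N indentation with ~0.1 mm diagonals gives HV of roughly 189
print(vickers_hardness(10.0, 0.101, 0.099))
```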

15 pages, 983 KiB  
Article
Prototype-Based Self-Adaptive Distribution Calibration for Few-Shot Image Classification
by Wei Du, Xiaoping Hu, Xin Wei and Ke Zuo
Electronics 2023, 12(1), 134; https://doi.org/10.3390/electronics12010134 - 28 Dec 2022
Viewed by 1505
Abstract
Deep learning has flourished in large-scale supervised tasks. However, in many practical settings, plentiful labeled data are a luxury. Thus, few-shot learning (FSL), which can learn new classes from only a few labeled samples, has recently received growing interest and achieved significant progress. The advanced distribution calibration approach estimates the ground-truth distribution of few-shot classes by reusing the statistics of auxiliary data. However, there is still a significant discrepancy between the estimated and ground-truth distributions, and manually set hyperparameters cannot adapt to different application scenarios (i.e., datasets). This paper proposes a prototype-based self-adaptive distribution calibration framework for accurately estimating the ground-truth distribution and self-adaptively optimizing hyperparameters for different application scenarios. Specifically, the proposed method consists of two components. The prototype-based representative mechanism obtains and utilizes more global information about few-shot classes and improves classification performance. The self-adaptive hyperparameter optimization algorithm searches for robust distribution calibration hyperparameters for different application scenarios. Ablation studies verify the effectiveness of the various components of the proposed framework. Extensive experiments are conducted on three standard benchmarks: miniImageNet, CUB-200-2011, and CIFAR-FS. The competitive results and compelling visualizations indicate that the proposed framework achieves state-of-the-art performance. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
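
For readers unfamiliar with the baseline this builds on, distribution calibration transfers the mean and covariance of the nearest base classes to a few-shot class and then samples synthetic features from the calibrated Gaussian. A NumPy sketch of that baseline; k, alpha, and the sample count are illustrative fixed settings, which is precisely what the paper's self-adaptive search replaces:

```python
import numpy as np

def calibrate_and_sample(support_feat, base_means, base_covs,
                         k=2, alpha=0.21, n_samples=100):
    """Borrow statistics from the k nearest base classes to calibrate a
    few-shot class distribution, then sample extra features from it."""
    dists = np.linalg.norm(base_means - support_feat, axis=1)
    idx = np.argsort(dists)[:k]                       # k nearest base classes
    mean = (base_means[idx].sum(0) + support_feat) / (k + 1)
    cov = base_covs[idx].mean(0) + alpha              # alpha spreads the distribution
    return np.random.multivariate_normal(mean, cov, n_samples)
```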

21 pages, 12240 KiB  
Article
A Nonlinear Diffusion Model with Smoothed Background Estimation to Enhance Degraded Images for Defect Detection
by Tao Yang, Bingchao Xu, Bin Zhou and Wei Wei
Appl. Sci. 2023, 13(1), 211; https://doi.org/10.3390/app13010211 - 24 Dec 2022
Cited by 1 | Viewed by 1263
Abstract
It is important to detect defects in products efficiently in modern industrial manufacturing, and image processing is one of the common techniques for achieving defect detection successfully. To process images degraded by noise and low contrast in some scenes, this paper presents a new energy functional with background fitting and derives a novel model that estimates the smoothed background and performs nonlinear diffusion on the residual image. Noise removal and background correction can both be achieved while the defect features are preserved. Finally, the proposed method and several comparative methods are evaluated in experiments on classical degraded images. The numerical results and quantitative evaluation show the efficiency and advantages of the proposed method. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
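
To make the background-plus-residual idea concrete, the sketch below subtracts a smooth background estimate and runs classical Perona–Malik nonlinear diffusion on the residual. This shows the general flavour of such models, not the paper's specific energy functional; the Gaussian background, kappa, dt, and iteration count are all assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance(img, iters=20, kappa=0.1, dt=0.2, bg_sigma=15):
    """Smooth the residual with edge-preserving diffusion so noise fades
    while defect edges survive (Perona-Malik explicit scheme)."""
    bg = gaussian_filter(img, bg_sigma)   # stand-in for the fitted background
    u = img - bg                          # residual: defects + noise
    for _ in range(iters):
        # finite differences in the four grid directions
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u, 1, 1) - u
        # edge-stopping function g = exp(-(|grad|/kappa)^2) damps diffusion at edges
        u = u + dt * sum(np.exp(-(d / kappa) ** 2) * d for d in (dn, ds, de, dw))
    return u
```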

20 pages, 5767 KiB  
Article
Driver Emotion and Fatigue State Detection Based on Time Series Fusion
by Yucheng Shang, Mutian Yang, Jianwei Cui, Linwei Cui, Zizheng Huang and Xiang Li
Electronics 2023, 12(1), 26; https://doi.org/10.3390/electronics12010026 - 21 Dec 2022
Cited by 7 | Viewed by 2923
Abstract
Studies have shown that driver fatigue or unpleasant emotions significantly increase driving risks. Detecting driver emotions and fatigue states and providing timely warnings can effectively reduce the incidence of traffic accidents. However, existing models rarely combine driver emotion and fatigue detection, and there is room to improve recognition accuracy. In this paper, we propose a non-invasive and efficient detection method for driver fatigue and emotional state, combining the two in driver state detection for the first time. Firstly, the captured video image sequences are preprocessed, and Dlib (an open-source image processing library) is used to locate face regions and mark key points; secondly, facial features are extracted, and fatigue indicators, such as the driver's eye closure percentage (PERCLOS) and yawn frequency, are calculated using the dual-threshold method and fused mathematically; thirdly, an improved lightweight RM-Xception convolutional neural network is introduced to identify the driver's emotional state; finally, the two indicators are fused based on time series to obtain a comprehensive score for evaluating the driver's state. The results show that the fatigue detection algorithm proposed in this paper has high accuracy, and the emotion recognition network reaches an accuracy of 73.32% on the Fer2013 dataset. The composite score calculated from the time series fusion can comprehensively and accurately reflect the driver's state in different environments, contributing to future research in the field of assisted safe driving. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
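
With Dlib's 68-point landmarks, eye closure is commonly scored through the eye aspect ratio (EAR), and PERCLOS is then the fraction of recent frames in which the eyes were closed. A small sketch of both computations; the 0.2 closure threshold is a common but assumed setting, and a full dual-threshold scheme would also gate on closure duration:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks of one eye from Dlib's 68-point model."""
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)           # drops toward 0 as the eye closes

def perclos(ear_series, closed_thresh=0.2):
    """Fraction of frames in a window with the eyes judged closed."""
    ears = np.asarray(ear_series)
    return float((ears < closed_thresh).mean())
```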

15 pages, 8478 KiB  
Article
Construction of a Character Dataset for Historical Uchen Tibetan Documents under Low-Resource Conditions
by Ce Zhang, Weilan Wang and Guowei Zhang
Electronics 2022, 11(23), 3919; https://doi.org/10.3390/electronics11233919 - 27 Nov 2022
Viewed by 1077
Abstract
The construction of a character dataset is an important part of the research on document analysis and recognition of historical Tibetan documents. The results of character segmentation research in the previous stage are presented by coloring the characters with different color values. On this basis, the characters are annotated, and the character images corresponding to the annotation are extracted to construct a character dataset. The construction of a character dataset is carried out as follows: (1) text annotation of segmented characters is performed; (2) the character image is extracted from the character block based on the real position information; (3) according to the class of annotated text, the extracted character images are classified to construct a preliminary character dataset; (4) data augmentation is used to solve the imbalance of classes and samples in the preliminary dataset; (5) research on character recognition based on the constructed dataset is performed. The experimental results show that under low-resource conditions, this paper solves the challenges in the construction of a historical Uchen Tibetan document character dataset and constructs a 610-class character dataset. This dataset lays the foundation for the character recognition of historical Tibetan documents and provides a reference for the construction of relevant document datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

12 pages, 5695 KiB  
Article
Real-Time Detection of Mango Based on Improved YOLOv4
by Zhipeng Cao and Ruibo Yuan
Electronics 2022, 11(23), 3853; https://doi.org/10.3390/electronics11233853 - 23 Nov 2022
Cited by 6 | Viewed by 1705
Abstract
Agricultural mechanization occupies a key position in modern agriculture. Aiming at the fruit recognition and target detection stage of a picking robot, a mango recognition method based on an improved YOLOv4 network structure is proposed, which can quickly and accurately identify and locate mangoes. The method first adjusts the network width to improve recognition accuracy, then reduces the ResNet (Residual Network) modules to adjust the neck network and improve prediction speed, and finally adds a CBAM (Convolutional Block Attention Module) to improve the prediction accuracy of the network. The newly improved network model is YOLOv4-LightC-CBAM. The training results show that the mAP (mean average precision) obtained by YOLOv4-LightC-CBAM is 95.12%, which is 3.93% higher than that of YOLOv4. Regarding detection speed, YOLOv4-LightC-CBAM reaches 45.4 frames per second, 85.3% faster than YOLOv4. The results show that the modified network can recognize mangoes better, faster, and more accurately. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

18 pages, 8109 KiB  
Article
Separate Syntax and Semantics: Part-of-Speech-Guided Transformer for Image Captioning
by Dong Wang, Bing Liu, Yong Zhou, Mingming Liu, Peng Liu and Rui Yao
Appl. Sci. 2022, 12(23), 11875; https://doi.org/10.3390/app122311875 - 22 Nov 2022
Cited by 1 | Viewed by 1215
Abstract
Transformer-based image captioning models have recently achieved remarkable performance by using new fully attentive paradigms. However, existing models generally follow the conventional language-model setup of predicting the next word conditioned on the visual features and partially generated words. They treat the predictions of visual and nonvisual words equally and tend to produce generic captions. To address these issues, we propose a novel part-of-speech-guided transformer (PoS-Transformer) framework for image captioning. Specifically, a self-attention part-of-speech prediction network is first presented to model the part-of-speech tag sequences for the corresponding image captions. Then, different attention mechanisms are constructed for the decoder to guide caption generation using the part-of-speech information. Benefiting from the part-of-speech guiding mechanisms, the proposed framework not only adaptively adjusts the weights between visual features and language signals for word prediction, but also facilitates the generation of more fine-grained and grounded captions. Finally, multitask learning is introduced to train the whole PoS-Transformer network in an end-to-end manner. Our model was trained and tested on the MSCOCO and Flickr30k datasets, achieving CIDEr scores of 1.299 and 0.612, respectively. The qualitative results indicated that the captions generated by our method conformed better to grammatical rules. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

33 pages, 2318 KiB  
Review
A Review of Synthetic Image Data and Its Use in Computer Vision
by Keith Man and Javaan Chahl
J. Imaging 2022, 8(11), 310; https://doi.org/10.3390/jimaging8110310 - 21 Nov 2022
Cited by 14 | Viewed by 4407
Abstract
The development of computer vision algorithms using convolutional neural networks and deep learning has necessitated ever-greater amounts of annotated and labelled data to produce high-performance models. Large, public datasets have been instrumental in pushing computer vision forward by providing the data necessary for training. However, many computer vision applications cannot rely on the general image data provided in available public datasets to train models, instead requiring labelled image data that are not readily available in the public domain on a large scale. At the same time, acquiring such data from the real world can be difficult, costly, and labour-intensive to label in large quantities. Because of this, synthetic image data have been pushed to the forefront as a potentially faster and cheaper alternative to collecting and annotating real data. This review provides a general overview of the types of synthetic image data, as categorised by synthesised output; common methods of synthesising different types of image data; existing applications and logical extensions; the performance of synthetic image data in different applications and the associated difficulties in assessing data performance; and areas for further research. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

23 pages, 10254 KiB  
Article
Computer Vision-Based Approach for Automatic Detection of Dairy Cow Breed
by Himanshu Gupta, Parul Jindal, Om Prakash Verma, Raj Kumar Arya, Abdelhamied A. Ateya, Naglaa. F. Soliman and Vijay Mohan
Electronics 2022, 11(22), 3791; https://doi.org/10.3390/electronics11223791 - 18 Nov 2022
Cited by 5 | Viewed by 2336
Abstract
Purpose: The identification of individual cow breeds may offer various farming opportunities for disease detection, disease prevention and treatment, fertility and feeding, and welfare monitoring. However, due to the large population of cows, with hundreds of breeds and almost identical visible appearances, their exact identification and detection become a tedious task. Therefore, the automatic detection of cow breeds would benefit the dairy industry. This study presents a computer-vision-based approach for identifying the breed of individual cattle. Methods: In this study, eight breeds of cows are considered to verify the classification process: Afrikaner, Brown Swiss, Gyr, Holstein Friesian, Limousin, Marchigiana, White Park, and Simmental cattle. A custom dataset is developed using web-mining techniques, comprising 1835 images grouped into 238, 223, 220, 212, 253, 185, 257, and 247 images for the individual breeds. YOLOv4, a deep learning approach, is employed for breed classification and localization. The performance of the YOLOv4 algorithm is evaluated by training the model on different sets of training parameters. Results: Comprehensive analysis of the experimental results reveals that the proposed approach achieves an accuracy of 81.07%, with a maximum kappa of 0.78 obtained at an image size of 608 × 608 and an intersection over union (IoU) threshold of 0.75 on the test dataset. Conclusions: The YOLOv4-based model performed better than the other compared models, placing it among the top-ranked cow breed detection models. For future work, it would be beneficial to incorporate simple tracking techniques between video frames to check the efficiency of this approach. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

20 pages, 3987 KiB  
Article
Research on the Correlation Filter Tracking Model Based on the Deep-Pruned Feature Network
by Honglin Chen, Chunting Li and Chaomurilige
Appl. Sci. 2022, 12(22), 11490; https://doi.org/10.3390/app122211490 - 12 Nov 2022
Viewed by 1057
Abstract
Visual tracking is one of the key research fields in computer vision. Built on the combination of the correlation filter tracking (CFT) model and deep convolutional neural networks (DCNNs), deep correlation filter tracking (DCFT) has recently become a prominent topic in visual tracking because of CFT's speed and DCNNs' richer feature representation. However, DCNNs are often complex in structure, which typically creates a conflict between the speed and accuracy of DCFT. To reduce this conflict, this paper proposes a model that mainly includes the following: (1) based on a pre-pruned network obtained via feature channel importance, an optimal global tracking pruning rate (GTPR) is determined in terms of the contribution of filter channels to the tracking response; (2) based on the GTPR, an alternative convolutional kernel is defined to replace the kernels of non-important channels, which leads to further pruning of the feature network; (3) an online updating scheme for the pruned feature network, based on a structural similarity index, is employed to adapt the model to tracking scene changes; (4) the proposed model was evaluated on OTB2013, where experimental results demonstrate that it can effectively increase speed by 45% while guaranteeing tracking accuracy, and improve tracking accuracy by 4% when tracking scene changes occur. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
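
Channel-importance pruning of the kind described in step (1) is often approximated by ranking filters by the L1 norm of their weights and rebuilding the layer with only the top fraction kept. A PyTorch sketch of that generic procedure; a fixed keep ratio stands in for the searched global rate, and the L1 proxy stands in for the paper's response-based importance:

```python
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d):
    """L1 norm of each output filter as a cheap importance proxy."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_conv(conv: nn.Conv2d, keep_ratio=0.55):
    """Rebuild the layer keeping only the most important output channels."""
    imp = channel_importance(conv)
    k = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.argsort(imp, descending=True)[:k]
    new = nn.Conv2d(conv.in_channels, k, conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    new.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        new.bias.data = conv.bias.data[keep].clone()
    return new, keep   # 'keep' tells the next layer which input channels remain
```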

21 pages, 5282 KiB  
Article
A Novel Separable Scheme for Encryption and Reversible Data Hiding
by Pei Chen, Yang Lei, Ke Niu and Xiaoyuan Yang
Electronics 2022, 11(21), 3505; https://doi.org/10.3390/electronics11213505 - 28 Oct 2022
Cited by 4 | Viewed by 1152
Abstract
With the increasing emphasis on security and privacy, video in the cloud sometimes needs to be stored and processed in encrypted form. To facilitate the indexing and tamper detection of encrypted videos, data hiding is performed in them. This paper proposes a novel separable scheme for encryption and reversible data hiding. Regarding the encryption method, the intra-prediction modes and motion vector differences are encrypted by XOR encryption, and the quantized discrete cosine transform blocks are permuted based on logistic chaotic mapping. Regarding the reversible data hiding algorithm, difference expansion is applied in encrypted video for the first time in this paper. The encryption method and the data hiding algorithm are separable, and the embedded information can be accurately extracted from both the encrypted and the decrypted video bitstream. The experimental results show that the proposed encryption method can resist sketch attacks and offers higher security than other schemes while keeping the bit rate unchanged. The embedding algorithm used in the proposed scheme provides higher capacity in videos with a lower quantization parameter and good visual quality of the labeled decrypted video, while maintaining low bit rate variation. The video encryption and the reversible data hiding are separable, and the scheme can be applied in a wider range of scenarios. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
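
Logistic-map permutation, used here to scramble the quantized DCT blocks, derives a pseudo-random ordering from the chaotic recurrence x ← μx(1−x), with the initial value and μ acting as the key. A minimal sketch; the burn-in length and key values are illustrative:

```python
import numpy as np

def logistic_permutation(n, x0=0.3741, mu=3.99, burn_in=100):
    """Index permutation driven by the logistic map x <- mu*x*(1-x)."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = mu * x * (1 - x)
    seq = np.empty(n)
    for i in range(n):
        x = mu * x * (1 - x)
        seq[i] = x
    return np.argsort(seq)            # chaotic sequence -> permutation

def scramble(blocks, key):            # blocks: array of quantized DCT blocks
    return blocks[logistic_permutation(len(blocks), *key)]

def unscramble(blocks, key):
    perm = logistic_permutation(len(blocks), *key)
    return blocks[np.argsort(perm)]   # inverse permutation restores order
```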

17 pages, 2858 KiB  
Article
Intracranial Hemorrhages Segmentation and Features Selection Applying Cuckoo Search Algorithm with Gated Recurrent Unit
by Jewel Sengupta and Robertas Alzbutas
Appl. Sci. 2022, 12(21), 10851; https://doi.org/10.3390/app122110851 - 26 Oct 2022
Cited by 9 | Viewed by 1491
Abstract
Generally, traumatic and aneurysmal brain injuries cause intracranial hemorrhages, a severe condition that can result in death if it is not diagnosed and treated properly at an early stage. Compared to other imaging techniques, Computed Tomography (CT) images are extensively utilized by clinicians for locating and identifying intracranial hemorrhage regions. However, this is a time-consuming and complex task that depends heavily on professional clinicians. To address this problem, a novel model is developed for the automatic detection of intracranial hemorrhages. After collecting the 3D CT scans from the Radiological Society of North America (RSNA) 2019 brain CT hemorrhage database, image segmentation is carried out using the Fuzzy C-Means (FCM) clustering algorithm. Then, hybrid feature extraction is performed on the segmented regions utilizing the Histogram of Oriented Gradients (HoG), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP) to extract discriminative features. Furthermore, the Cuckoo Search Optimization (CSO) algorithm and the Optimized Gated Recurrent Unit (OGRU) classifier are integrated for feature selection and sub-type classification of intracranial hemorrhages. In the evaluation, the proposed OGRU-CSO model obtained a classification accuracy of 99.36%, higher than that of the other considered classifiers. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

18 pages, 7441 KiB  
Article
4-Band Multispectral Images Demosaicking Combining LMMSE and Adaptive Kernel Regression Methods
by Norbert Hounsou, Amadou T. Sanda Mahama and Pierre Gouton
J. Imaging 2022, 8(11), 295; https://doi.org/10.3390/jimaging8110295 - 25 Oct 2022
Viewed by 1557
Abstract
In recent years, multispectral imaging systems have been expanding considerably, along with a variety of multispectral demosaicking algorithms. The most crucial task is setting up an optimal multispectral demosaicking algorithm that reconstructs the image from the raw single-sensor image with as little error as possible. In this paper, we present a four-band multispectral filter array (MSFA) with a dominant blue band and a multispectral demosaicking algorithm that combines the linear minimum mean square error (LMMSE) and adaptive kernel regression methods. To estimate the missing blue bands, we use the LMMSE algorithm; for the other spectral bands, we use the directional gradient method, which relies on the estimated blue bands. Adaptive kernel regression is then applied to each spectral band to update it without persistent artifacts. The experimental results demonstrate that our proposed method outperforms other existing approaches both visually and quantitatively in terms of the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root mean square error (RMSE). Full article
(This article belongs to the Topic Computer Vision and Image Processing)

17 pages, 14840 KiB  
Article
The Effect of Data Augmentation Methods on Pedestrian Object Detection
by Bokun Liu, Shaojing Su and Junyu Wei
Electronics 2022, 11(19), 3185; https://doi.org/10.3390/electronics11193185 - 04 Oct 2022
Cited by 3 | Viewed by 1928
Abstract
Night scenes are a key area for monitoring and security, as the information in pictures caught on camera at night is not comprehensive. Data augmentation extracts the most value from such limited datasets. Considering night driving and dangerous events, achieving better detection of people at night is important. This paper studies the impact of different data augmentation methods on target detection. For image data collected at night under limited conditions, three different types of enhancement methods are used to verify whether they can promote pedestrian detection. This paper mainly explores supervised and unsupervised data augmentation methods with certain improvements, including multi-sample augmentation, unsupervised Generative Adversarial Network (GAN) augmentation, and single-sample augmentation. It is concluded that the dataset obtained by the heterogeneous multi-sample augmentation method can optimize the target detection model, raising the mean average precision (mAP) on night images to 0.76; that the improved Residual Convolutional GAN, an unsupervised training model, can generate new samples with the same style and thus greatly expand the dataset, raising the mAP to 0.854; and that single-sample de-illumination enhancement can greatly improve image clarity, improving the precision value by 0.116. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

16 pages, 1651 KiB  
Article
An Interference-Resistant and Low-Consumption Lip Recognition Method
by Junwei Jia, Zhilu Wang, Lianghui Xu, Jiajia Dai, Mingyi Gu and Jing Huang
Electronics 2022, 11(19), 3066; https://doi.org/10.3390/electronics11193066 - 26 Sep 2022
Cited by 1 | Viewed by 1187
Abstract
Lip movements contain essential linguistic information and are an important medium for studying the content of a dialogue. At present, there are many studies on how to improve the accuracy of lip language recognition models, but few on the robustness and generalization performance of such models under various disturbances. Specific experiments show that the accuracy of current state-of-the-art lip recognition models drops significantly when disturbed, and that they are particularly sensitive to adversarial examples. This paper substantially alleviates this problem by using Mixup training. Taking the model subjected to adversarial attacks generated by FGSM as an example, the model in this paper achieves 85.0% and 40.2% accuracy on the English dataset LRW and the Mandarin dataset LRW-1000, respectively, improving the correct recognition rates by 9.8% and 8.3% compared with current advanced lip recognition models. The positive impact of Mixup training on the robustness and generalization of lip recognition models is thus demonstrated. In addition, the performance of a lip recognition classification model depends heavily on the training parameters, which increases the computational cost. The InvNet-18 network in this paper reduces GPU consumption and training time while improving model accuracy: compared with the standard ResNet-18 network used in mainstream lip recognition models, it uses less than one-third of the GPU resources and has 32% fewer parameters. Detailed analysis and comparison demonstrate that the model in this paper can effectively improve anti-interference ability and reduce training resource consumption, while its accuracy remains comparable with current state-of-the-art results. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
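
Mixup training, the hardening technique used here, replaces each batch with convex combinations of sample pairs and mixes the loss accordingly. A compact PyTorch sketch of one such training step; alpha = 0.4 is a typical but assumed value:

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(model, x, y, alpha=0.4):
    """One forward pass with mixup: train on convex combinations of sample
    pairs, which is what improves robustness to perturbations."""
    lam = np.random.beta(alpha, alpha)                  # mixing coefficient
    idx = torch.randperm(x.size(0), device=x.device)    # random pairing
    x_mix = lam * x + (1 - lam) * x[idx]
    logits = model(x_mix)
    return (lam * F.cross_entropy(logits, y)
            + (1 - lam) * F.cross_entropy(logits, y[idx]))
```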

13 pages, 1421 KiB  
Article
Facial Action Unit Recognition by Prior and Adaptive Attention
by Zhiwen Shao, Yong Zhou, Hancheng Zhu, Wen-Liang Du, Rui Yao and Hao Chen
Electronics 2022, 11(19), 3047; https://doi.org/10.3390/electronics11193047 - 24 Sep 2022
Cited by 2 | Viewed by 1421
Abstract
Facial action unit (AU) recognition remains a challenging task due to the subtlety and non-rigidity of AUs. A typical solution is to localize the correlated regions of each AU. Current works often predefine the region of interest (ROI) of each AU via prior knowledge, or try to capture the ROI only through the supervision of AU recognition during training. However, predefinition often neglects important regions, while the supervision is insufficient to precisely localize ROIs. In this paper, we propose a novel AU recognition method using prior and adaptive attention. Specifically, we predefine a mask for each AU, in which locations farther away from the AU centers specified by prior knowledge have lower weights. A learnable parameter is adopted to control the importance of different locations. Then, we element-wise multiply the mask by a learnable attention map and use the new attention map to extract the AU-related features, so that AU recognition can supervise the adaptive learning of the new attention map. Experimental results show that our method (i) outperforms state-of-the-art AU recognition approaches on challenging benchmark datasets, and (ii) can accurately infer the regional attention distribution of each AU by combining the advantages of both predefinition and supervision. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
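
The prior-times-adaptive design can be captured in a short module: a distance-based prior down-weights locations far from the AU centre, a learnable scalar controls how sharply, and the result multiplies a learned attention map. A PyTorch sketch of this idea, as an illustration rather than the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class PriorAdaptiveAttention(nn.Module):
    """Multiply a distance-based prior mask by a learned attention map."""
    def __init__(self, ch):
        super().__init__()
        self.att = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.tau = nn.Parameter(torch.tensor(1.0))  # learnable importance of distance

    def forward(self, feat, prior_dist):
        # prior_dist: (B, 1, H, W) distance of each location to the AU centre
        prior = torch.exp(-self.tau.clamp(min=0) * prior_dist)  # farther -> lower weight
        return feat * (self.att(feat) * prior)      # adaptive map gated by the prior
```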

22 pages, 14878 KiB  
Article
DS6, Deformation-Aware Semi-Supervised Learning: Application to Small Vessel Segmentation with Noisy Training Data
by Soumick Chatterjee, Kartik Prabhu, Mahantesh Pattadkal, Gerda Bortsova, Chompunuch Sarasaen, Florian Dubost, Hendrik Mattern, Marleen de Bruijne, Oliver Speck and Andreas Nürnberger
J. Imaging 2022, 8(10), 259; https://doi.org/10.3390/jimaging8100259 - 22 Sep 2022
Cited by 5 | Viewed by 3609
Abstract
Blood vessels of the brain provide the human brain with the required nutrients and oxygen. As a vulnerable part of the cerebral blood supply, pathology of small vessels can cause serious problems such as Cerebral Small Vessel Diseases (CSVD). It has also been shown that CSVD is related to neurodegeneration, such as in Alzheimer's disease. With the advancement of 7 Tesla MRI systems, higher spatial image resolution can be achieved, enabling the depiction of very small vessels in the brain. Non-deep-learning-based approaches for vessel segmentation, e.g., Frangi's vessel enhancement with subsequent thresholding, are capable of segmenting medium to large vessels but often fail to segment small vessels. The sensitivity of these methods to small vessels can be increased by extensive parameter tuning or by manual corrections, albeit making them time-consuming, laborious, and infeasible for larger datasets. This paper proposes a deep learning architecture to automatically segment small vessels in 7 Tesla 3D Time-of-Flight (ToF) Magnetic Resonance Angiography (MRA) data. The algorithm was trained and evaluated on a small, imperfect, semi-automatically segmented dataset of only 11 subjects, using six for training, two for validation, and three for testing. The deep learning model, based on U-Net with multi-scale supervision, was trained on the training subset and made equivariant to elastic deformations in a self-supervised manner using deformation-aware learning to improve generalisation performance. The proposed technique was evaluated quantitatively and qualitatively on the test set and achieved a Dice score of 80.44 ± 0.83. Furthermore, the result of the proposed method was compared against a selected manually segmented region (Dice score of 62.07), showing a considerable improvement (18.98%) with deformation-aware learning. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
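
The classical baseline the paper contrasts against, Frangi vesselness plus thresholding, is available off the shelf and makes the tuning problem easy to see. A short sketch with scikit-image; the sigma range and threshold are exactly the kind of parameters the authors note require heavy tuning:

```python
import numpy as np
from skimage.filters import frangi

def vesselness(volume):
    # vessels appear bright in ToF-MRA, hence black_ridges=False;
    # a small sigma range targets small-vessel scales (illustrative values)
    return frangi(volume, sigmas=np.arange(0.5, 2.5, 0.5), black_ridges=False)

toy = np.random.rand(32, 64, 64)      # stand-in for a 7T ToF-MRA volume
mask = vesselness(toy) > 1e-5         # the thresholding step needing per-case tuning
```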

15 pages, 3188 KiB  
Article
Attentive SOLO for Sonar Target Segmentation
by Honghe Huang, Zhen Zuo, Bei Sun, Peng Wu and Jiaju Zhang
Electronics 2022, 11(18), 2904; https://doi.org/10.3390/electronics11182904 - 13 Sep 2022
Viewed by 1409
Abstract
Imaging sonar systems play an important role in underwater target detection and location. Due to the influence of reverberation noise on imaging sonar systems, the task of sonar target segmentation is a challenging problem. In order to segment different types of targets in sonar images accurately, we proposed the gated fusion-pyramid segmentation attention (GF-PSA) module. Specifically, inspired by gated full fusion, we improved the pyramid segmentation attention (PSA) module by using gated fusion to reduce the noise interference during feature fusion and improve segmentation accuracy. Then, we improved the SOLOv2 (Segmenting Objects by Locations v2) algorithm with the proposed GF-PSA and named the improved algorithm Attentive SOLO. In addition, we constructed a sonar target segmentation dataset, named STSD, which contains 4000 real sonar images, covering eight object categories with a total of 7077 target annotations. The experimental results show that the segmentation accuracy of Attentive SOLO on STSD is as high as 74.1%, which is 3.7% higher than that of SOLOv2. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
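
Gated fusion of the kind GF-PSA uses to suppress noisy sonar responses can be reduced to a small module: a learned spatial gate decides, per location, how much of each input feature map to keep. A PyTorch sketch of that generic mechanism, not the full GF-PSA module:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse two feature maps with a learned spatial gate instead of a plain
    sum, so noisy responses can be down-weighted per location."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, a, b):
        g = self.gate(torch.cat([a, b], dim=1))  # g in (0, 1), shape (B, 1, H, W)
        return g * a + (1 - g) * b
```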

14 pages, 4927 KiB  
Article
Privacy-Preserving Semantic Segmentation Using Vision Transformer
by Hitoshi Kiya, Teru Nagamori, Shoko Imaizumi and Sayaka Shiota
J. Imaging 2022, 8(9), 233; https://doi.org/10.3390/jimaging8090233 - 30 Aug 2022
Cited by 9 | Viewed by 2498
Abstract
In this paper, we propose a privacy-preserving semantic segmentation method that uses encrypted images and models based on the vision transformer (ViT), called the segmentation transformer (SETR). The combined use of encrypted images and SETR allows us not only to apply images without sensitive visual information to SETR as query images but also to maintain the same accuracy as that of using plain images. Previously, privacy-preserving methods with encrypted images for deep neural networks have focused on image classification tasks. In addition, the conventional methods result in lower accuracy than models trained with plain images, due to the influence of image encryption. To overcome these issues, a novel method for privacy-preserving semantic segmentation is proposed by exploiting, for the first time, the embedding structure that the ViT possesses. In experiments, the proposed privacy-preserving semantic segmentation was demonstrated to achieve the same accuracy with encrypted images as with plain images. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

16 pages, 3047 KiB  
Article
Temporal Context Modeling Network with Local-Global Complementary Architecture for Temporal Proposal Generation
by Yunfeng Yuan, Wenzhu Yang, Zifei Luo and Ruru Gou
Electronics 2022, 11(17), 2674; https://doi.org/10.3390/electronics11172674 - 26 Aug 2022
Cited by 1 | Viewed by 1241
Abstract
Temporal Action Proposal Generation (TAPG) is a promising but challenging task with a wide range of practical applications. Although state-of-the-art methods have made significant progress in TAPG, most ignore the impact of the temporal scales of action and lack the exploitation of effective boundary contexts. In this paper, we propose a simple but effective unified framework named Temporal Context Modeling Network (TCMNet) that generates temporal action proposals. TCMNet innovatively uses convolutional filters with different dilation rates to address the temporal scale issue. Specifically, TCMNet contains a BaseNet with dilated convolutions (DBNet), an Action Completeness Module (ACM), and a Temporal Boundary Generator (TBG). The DBNet aims to model temporal information. It handles input video features through different dilated convolutional layers and outputs a feature sequence as the input of ACM and TBG. The ACM aims to evaluate the confidence scores of densely distributed proposals. The TBG is designed to enrich the boundary context of an action instance. The TBG can generate action boundaries with high precision and high recall through a local–global complementary structure. We conduct comprehensive evaluations on two challenging video benchmarks: ActivityNet-1.3 and THUMOS14. Extensive experiments demonstrate the effectiveness of the proposed TCMNet on tasks of temporal action proposal generation and temporal action detection. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

13 pages, 4092 KiB  
Article
Color Point Defect Detection Method Based on Color Salient Features
by Zhixi Wang, Wenqiang Xie, Huaixin Chen, Biyuan Liu and Lingyu Shuai
Electronics 2022, 11(17), 2665; https://doi.org/10.3390/electronics11172665 - 25 Aug 2022
Cited by 1 | Viewed by 1710
Abstract
Display color point defect detection is an important step in the display quality inspection process. To improve the detection accuracy of color point defects, a color point defect detection method based on color salient features is proposed. Color point defects that conform to human visual perception are used as the key detection targets. First, a human visual perception constraint coefficient is used to correct the RGB three-channel image to obtain the color-channel-transformed image. Then, a local contrast method is used to extract the point features of each color channel, which enhances point defects while suppressing noise and background. Finally, the means and standard deviations of the defect feature maps of the R, G, and B channels are calculated, and the maximum mean and standard deviation are selected as thresholds using the maximum fusion criterion to binarize the defect feature maps of the three channels. An OR operation is performed on the segmented images to combine the point defect segmentation results. The experimental results show that the average detection accuracy and recall of the algorithm are higher than 94%, a significant improvement over mainstream detection methods that meets the needs of industrial production. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
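
The per-channel local-contrast, threshold, and OR pipeline is simple enough to sketch directly. Below is a rough OpenCV version that thresholds each channel at mean + 3σ of its contrast map and ORs the results; the window size, the k = 3 rule, and the per-channel (rather than cross-channel maximum) thresholds are simplifying assumptions relative to the paper:

```python
import cv2
import numpy as np

def point_defect_mask(bgr, ksize=15):
    """Per-channel local contrast, a mean + 3*sigma threshold, then OR fusion."""
    mask = np.zeros(bgr.shape[:2], np.uint8)
    for c in cv2.split(bgr.astype(np.float32)):
        local_mean = cv2.blur(c, (ksize, ksize))
        contrast = np.abs(c - local_mean)          # point features pop out
        t = contrast.mean() + 3 * contrast.std()   # k = 3 is an assumed setting
        mask |= (contrast > t).astype(np.uint8)    # OR across channels
    return mask * 255
```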

14 pages, 5255 KiB  
Article
Multiple Mechanisms to Strengthen the Ability of YOLOv5s for Real-Time Identification of Vehicle Type
by Qiang Luo, Junfan Wang, Mingyu Gao, Zhiwei He, Yuxiang Yang and Hongtao Zhou
Electronics 2022, 11(16), 2586; https://doi.org/10.3390/electronics11162586 - 18 Aug 2022
Cited by 7 | Viewed by 1688
Abstract
Identifying the type of vehicle on the road is a challenging task, especially in the natural environment with all its complexities, such that the traditional architecture for object detection requires an excessively large amount of computation. Such lightweight networks as MobileNet are fast but cannot satisfy the performance-related requirements of this task. Improving the detection-related performance of small networks is, thus, an outstanding challenge. In this paper, we use YOLOv5s as the backbone network to propose a large-scale convolutional fusion module called the ghost cross-stage partial network (G_CSP), which can integrate large-scale information from different feature maps to identify vehicles on the road. We use the convolutional triplet attention network (C_TA) module to extract attention-based information from different dimensions. We also optimize the original spatial pyramid pooling fast (SPPF) module and use the dilated convolution to increase the capability of the network to extract information. The optimized module is called the DSPPF. The results of extensive experiments on the bdd100K, VOC2012 + 2007, and VOC2019 datasets showed that the improved YOLOv5s network performs well and can be used on mobile devices in real time. Full article
21 pages, 8000 KiB  
Article
A Multi-Domain Embedding Framework for Robust Reversible Data Hiding Scheme in Encrypted Videos
by Pei Chen, Zhuo Zhang, Yang Lei, Ke Niu and Xiaoyuan Yang
Electronics 2022, 11(16), 2552; https://doi.org/10.3390/electronics11162552 - 15 Aug 2022
Cited by 1 | Viewed by 1349
Abstract
For easier cloud management, reversible data hiding is performed in the encrypted domain to embed label information. However, existing schemes are not robust and may lose label information during transmission; enhancing robustness while maintaining reversibility in data hiding is a challenge. In this paper, a multi-domain embedding framework for encrypted videos is proposed to achieve both robustness and reversibility. The framework makes full use of the multi-domain characteristics of encrypted video. The element for robust embedding, marked as element-I, is encrypted through Logistic chaotic scrambling. To further improve robustness, the label information is encoded with a Bose–Chaudhuri–Hocquenghem code and then robustly embedded into element-I by modulating its amplitude, with auxiliary information generated for lossless recovery of element-I. The element for reversible embedding, marked as element-II, has its sign encrypted by a stream cipher, and the auxiliary information is reversibly embedded into element-II through traditional histogram shifting. To verify the feasibility of the framework, an anti-recompression RDH-EV scheme based on it is proposed. The experimental results show that the proposed scheme outperforms current representative ones in terms of robustness while achieving reversibility. In the proposed scheme, video encryption and data hiding are commutative, and the original video bitstream can be fully recovered, demonstrating the feasibility of the multi-domain embedding framework for encrypted videos.
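Histogram shifting, the reversible embedding primitive the framework relies on, can be illustrated with the textbook greyscale version below. This is a simplified sketch, not the paper's encrypted-video variant; full reversibility additionally requires transmitting the peak/zero pair and handling occupied bins.

```python
import numpy as np

def hs_embed(img, bits):
    """Embed bits at the histogram peak: shift the bins between the peak and
    a (nearly) empty bin up by one to open a gap, then move a peak pixel
    into the gap to encode a 1. Assumes peak < 255 and enough carriers."""
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(hist.argmax())
    zero = peak + 1 + int(hist[peak + 1:].argmin())  # least-occupied bin above peak
    out = img.astype(np.int32).copy()
    out[(out > peak) & (out < zero)] += 1            # open the gap next to the peak
    flat = out.ravel()
    carriers = np.flatnonzero(img.ravel() == peak)[:len(bits)]
    flat[carriers] += np.asarray(bits, dtype=np.int32)  # 0 stays at peak, 1 enters gap
    return out.astype(np.uint8), peak, zero

img = np.random.randint(0, 200, (64, 64), dtype=np.uint8)
marked, peak, zero = hs_embed(img, [1, 0, 1, 1])
```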
19 pages, 4722 KiB  
Article
Small Sample Hyperspectral Image Classification Method Based on Dual-Channel Spectral Enhancement Network
by Songwei Pei, Hong Song and Yinning Lu
Electronics 2022, 11(16), 2540; https://doi.org/10.3390/electronics11162540 - 13 Aug 2022
Cited by 6 | Viewed by 1944
Abstract
Deep learning has achieved significant success in hyperspectral image (HSI) classification, but challenges remain when the number of training samples is small. Feature fusion approaches based on multi-channel and multi-scale feature extraction are attractive for HSI classification when few samples are available. In this paper, based on feature fusion, we propose a simple yet effective CNN-based Dual-channel Spectral Enhancement Network (DSEN) to fully exploit the features of the small number of labeled HSI samples. Our work builds on the observation that, in many HSI classification models, most incorrectly classified pixels lie at the borders between classes, which is caused by feature obfuscation. Hence, in DSEN, we specially designed a spectral feature extraction channel to enhance the spectral feature representation of each pixel. Moreover, a spatial–spectral channel using small convolution kernels extracts the spatial–spectral features of the HSI. By adjusting the fusion proportion of the features extracted from the two channels, the expression of spectral features is enhanced in the fused representation for better HSI classification. The experimental results demonstrate that the overall accuracy (OA) of HSI classification using the proposed DSEN reached 69.47%, 80.54%, and 93.24% when only five training samples per class were selected from the Indian Pines (IP), University of Pavia (UP), and Salinas Scene (SA) datasets, respectively, and performance improved as the number of training samples increased. Compared with several related methods, DSEN demonstrated superior performance in HSI classification.
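The adjustable fusion proportion amounts to a convex combination of the two channels' features, as in this small sketch; alpha is an illustrative free parameter, not the paper's tuned value.

```python
import torch

def fuse_features(spectral, spatial, alpha=0.7):
    """Convex combination of the two channels' features; a larger alpha
    strengthens the spectral representation in the fused feature."""
    return alpha * spectral + (1.0 - alpha) * spatial

spectral = torch.randn(8, 128)  # spectral-channel features for 8 pixels
spatial = torch.randn(8, 128)   # spatial-spectral-channel features
fused = fuse_features(spectral, spatial)
```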
19 pages, 3941 KiB  
Article
Geo-Location Method for Images of Damaged Roads
by Wenbo Zhang, Jue Qu, Wei Wang, Jun Hu and Jie Li
Electronics 2022, 11(16), 2530; https://doi.org/10.3390/electronics11162530 - 12 Aug 2022
Viewed by 1331
Abstract
Because damaged road images differ greatly from images taken under normal conditions, geo-location in damaged areas often fails due to occlusion of, or damage to, buildings and iconic signage in the image. To study the influence of post-war damage to buildings and landmarks on the geo-location results of localization algorithms, and to improve their geo-location performance under damaged conditions, this paper uses informative reference images and key-point selection. To counter the negative effects of occlusion and landmark damage during retrieval, a retrieval method based on reliable and repeatable deep learning feature points is proposed. To verify the effectiveness of the algorithm, this paper constructed a training set of road segments from urban, rural, and technology-park areas to generate a database of 11,896 reference images. Considering the cost of obtaining genuinely damaged landmarks, artificially generated images of damaged landmarks with different damage ratios were used as a test set. Experiments show that the database optimization method can effectively compress the storage of the feature index and speed up positioning without affecting accuracy. The proposed image retrieval method optimizes feature points and feature indices to make them reliable for damaged terrain and images. The improved algorithm increases the accuracy of geo-location for damaged roads, and the deep-learning-based method outperforms traditional algorithms on this task. Furthermore, we demonstrated the effectiveness of the proposed method by constructing a multi-segment road image dataset.
19 pages, 5249 KiB  
Article
Dual-Anchor Metric Learning for Blind Image Quality Assessment of Screen Content Images
by Weiyi Jing, Yongqiang Bai, Zhongjie Zhu, Rong Zhang and Yiwen Jin
Electronics 2022, 11(16), 2510; https://doi.org/10.3390/electronics11162510 - 11 Aug 2022
Cited by 1 | Viewed by 1383
Abstract
Natural scene statistics are destroyed by the artificial portions of screen content images (SCIs), and the variable composition of artificial and natural parts makes it impractical to obtain an accurate statistical model. To resolve this problem, this paper presents a dual-anchor metric learning (DAML) method, inspired by metric learning, to obtain discriminative statistical features, identify complex distortions, and predict SCI quality. First, two Gaussian mixture models with prior data are constructed from natural and artificial image databases as the target anchors of the statistical model, which effectively enhances the metric discrimination of the mapping between feature representation and quality degradation through conditional probability analysis. Then, the distances of the high-order statistics are softly aggregated to conduct metric learning between the local features and the clusters of each target statistical model. Through empirical analysis and experimental verification, only variance differences are used as quality-aware features, balancing complexity and effectiveness. Finally, the mapping model between the target distances and subjective quality is obtained by support vector regression. To validate the performance of DAML, multiple experiments were carried out on three public databases: SIQAD, SCD, and SCID. PLCC, SRCC, and RMSE were employed to compute the correlation between subjective and objective ratings, estimating prediction accuracy, monotonicity, and consistency, respectively. The method achieved a PLCC of 0.9136 and an RMSE of 0.7993, confirming its good performance.
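A minimal sketch of the dual-anchor idea, assuming sklearn Gaussian mixtures fitted on hypothetical natural and artificial feature corpora; it uses average log-likelihoods as the anchor distances, a simplification of the paper's softly aggregated high-order statistic distances.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical local features drawn from a natural-image corpus and a
# screen-content (artificial) corpus; in practice these come from databases.
natural_feats = np.random.randn(5000, 16)
artificial_feats = np.random.randn(5000, 16) + 1.5

gmm_natural = GaussianMixture(n_components=8, covariance_type="diag",
                              random_state=0).fit(natural_feats)
gmm_artificial = GaussianMixture(n_components=8, covariance_type="diag",
                                 random_state=0).fit(artificial_feats)

def anchor_scores(img_feats):
    """Average log-likelihood of an image's local features under each anchor
    model; the pair (and their gap) serves as a quality-aware feature."""
    return gmm_natural.score(img_feats), gmm_artificial.score(img_feats)

print(anchor_scores(np.random.randn(200, 16)))
```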
13 pages, 2414 KiB  
Article
Dim and Small Target Tracking Using an Improved Particle Filter Based on Adaptive Feature Fusion
by Youhui Huo, Yaohong Chen, Hongbo Zhang, Haifeng Zhang and Hao Wang
Electronics 2022, 11(15), 2457; https://doi.org/10.3390/electronics11152457 - 07 Aug 2022
Cited by 4 | Viewed by 1566
Abstract
Particle filters have been widely used in dim and small target tracking, which plays a significant role in navigation applications. However, their characteristics, such as the difficulty of expressing features of dim and small targets and the loss of particle diversity caused by resampling, have a considerable negative impact on tracking performance. In this paper, we propose an improved resampling particle filter algorithm based on adaptive multi-feature fusion to address these drawbacks and improve tracking performance. We first establish an observation model based on the adaptive fusion of weighted grayscale intensity, edge information, and wavelet transform features. We then generate new particles based on residual resampling by combining the target position in the previous frame with the higher-weight particles in the current frame, improving tracking accuracy and particle diversity simultaneously. The experimental results demonstrate that our proposed method achieves high tracking performance, with a distance accuracy of 77.2% and a running speed of 106 fps, indicating promising prospects for dim and small target tracking applications.
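Residual resampling itself is a standard particle filter component; a compact NumPy version is sketched below.

```python
import numpy as np

def residual_resample(weights, rng=None):
    """Residual resampling: keep floor(N * w_i) copies of particle i
    deterministically, then draw the remaining slots multinomially from
    the residual weights. Returns the indices of surviving particles."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    counts = np.floor(n * weights).astype(int)   # deterministic part
    n_rest = n - counts.sum()
    if n_rest > 0:
        residual = n * weights - counts
        counts += rng.multinomial(n_rest, residual / residual.sum())
    return np.repeat(np.arange(n), counts)

w = np.array([0.5, 0.3, 0.15, 0.05])
print(residual_resample(w))   # e.g. [0 0 1 2] for N = 4
```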
18 pages, 7476 KiB  
Article
Cyclic Learning-Based Lightweight Network for Inverse Tone Mapping
by Jiyun Park and Byung Cheol Song
Electronics 2022, 11(15), 2436; https://doi.org/10.3390/electronics11152436 - 04 Aug 2022
Cited by 1 | Viewed by 1400
Abstract
Recent studies on inverse tone mapping (iTM) have moved toward indirect mapping, which generates a stack of low dynamic range (LDR) images with multiple exposure values (a multi-EV stack) and then merges them. To generate multi-EV stacks, several large-scale networks with more than 20 M parameters have been proposed, but their high dynamic range (HDR) reconstruction and multi-EV stack generation performance were not acceptable. Also, some previous methods using cycle consistency had to train additional networks that are not used for multi-EV stack generation, which demands a large amount of memory during training. Thus, this paper proposes novel cyclic learning based on cycle consistency to reduce the memory burden in training. In detail, we eliminated the networks used only for training, so the proposed method enables efficient learning in terms of training-purpose memory. In addition, this paper presents a lightweight iTM network that dramatically reduces the size of existing networks: it requires only about 1/100 of the parameters of the state-of-the-art (SOTA) method, which contributes to the practical use of iTM. The proposed method based on this lightweight network reliably generates a multi-EV stack. Experimental results show that the proposed method achieves SOTA performance quantitatively and is qualitatively comparable to conventional indirect iTM methods.
18 pages, 20643 KiB  
Article
A ConvNext-Based and Feature Enhancement Anchor-Free Siamese Network for Visual Tracking
by Qiguo Xu, Honggui Deng, Zeyu Zhang, Yang Liu, Xusheng Ruan and Gang Liu
Electronics 2022, 11(15), 2381; https://doi.org/10.3390/electronics11152381 - 29 Jul 2022
Cited by 2 | Viewed by 2207
Abstract
Existing anchor-based Siamese trackers rely on the anchor's design to predict the scale and aspect ratio of the target. However, these methods introduce many hyperparameters, leading to computational redundancy. In this paper, to achieve outstanding network efficiency, we propose a ConvNext-based anchor-free Siamese tracking network (CAFSN), which employs an anchor-free design to increase network flexibility and versatility. In CAFSN, to obtain a suitable backbone, the state-of-the-art ConvNext network is applied to visual tracking for the first time by improving its network stride and receptive field. Moreover, a central confidence branch based on Euclidean distance is added to the classification prediction network of CAFSN to suppress low-quality prediction boxes for robust visual tracking. In particular, we observe that the Siamese network cannot establish a complete discrimination model between the tracking target and similar objects, which negatively impacts performance. We therefore build a fusion network consisting of cropping and 3D max-pooling to better distinguish targets from similar objects: 3D max-pooling selects the highest activation values to enlarge the difference between the target and similar objects, while cropping unifies the dimensions of different features and reduces computation. Ablation experiments demonstrate that this module increased success rates by 1.7% and precision by 0.5%. We evaluate CAFSN on challenging benchmarks such as OTB100, UAV123, and GOT-10K, validating its advanced performance in noise immunity and similar-target discrimination while running in real time at 58.44 FPS.
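As an illustration of a Euclidean-distance-based central confidence map, here is a small PyTorch sketch; the Gaussian decay form is our assumption, since the abstract specifies only that the branch is based on Euclidean distance.

```python
import torch

def center_confidence(h, w, cx, cy, sigma=0.25):
    """Confidence map that decays with the Euclidean distance of each
    location from the target centre (cx, cy), used to down-weight
    off-centre, low-quality predictions."""
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2   # squared Euclidean distance
    scale = (sigma * max(h, w)) ** 2
    return torch.exp(-d2 / (2 * scale))

conf = center_confidence(25, 25, cx=12.0, cy=12.0)  # peaks at 1.0 in the centre
```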
16 pages, 4055 KiB  
Article
Automatic Classification of Pollen Grain Microscope Images Using a Multi-Scale Classifier with SRGAN Deblurring
by Xingyu Chen and Fujiao Ju
Appl. Sci. 2022, 12(14), 7126; https://doi.org/10.3390/app12147126 - 14 Jul 2022
Cited by 2 | Viewed by 2420
Abstract
Pollen allergies are seasonal epidemic diseases with high incidence rates, especially in Beijing, China. With the development of deep learning, key progress has been made in automatic pollen grain classification, which could replace the time-consuming and laborious manual identification process under a microscope. In China, only a few pioneering works have addressed automatic pollen grain classification. Therefore, we first constructed a multi-class, large-scale pollen grain dataset for the Beijing area in preparation for the classification task. Then, a deblurring pipeline was designed to selectively enhance the quality of the pollen grain images. Moreover, as pollen grains vary greatly in size and shape, we proposed an easy-to-implement and efficient multi-scale deep learning architecture. Our experimental results showed that the architecture achieved 97.7% accuracy with a ResNet-50 backbone, demonstrating that the proposed method can be applied successfully to the automatic identification of pollen grains in Beijing.
23 pages, 16823 KiB  
Article
A Codec-Unified Deblurring Approach Based on U-Shaped Invertible Network with Sparse Salient Representation in Latent Space
by Meng Wang, Tao Wen and Haipeng Liu
Electronics 2022, 11(14), 2177; https://doi.org/10.3390/electronics11142177 - 12 Jul 2022
Viewed by 1498
Abstract
Existing deep learning architectures usually use separate encoders and decoders to generate the desired simulated images, which is inefficient for feature analysis and synthesis. To address the failure of existing methods to fully utilize the encoder–decoder correlation, this paper focuses on codec-unified invertible networks that accurately guide the image deblurring process by controlling latent variables. Inspired by U-Net, a U-shaped multi-level invertible network (UML-IN) is proposed that integrates wavelet invertible networks into a supervised U-shaped architecture to establish a multi-resolution correlation between blurry and sharp image features under the guidance of a hybrid loss. Further, this paper proposes using L1 regularization constraints to obtain sparse latent variables, thereby alleviating the information dispersion problem caused by high-dimensional inference in invertible networks. Finally, we fine-tune the weights of the invertible modules by calculating a similarity loss between blur–sharp variable pairs. Extensive experiments on real and synthetic blurry datasets show that the proposed approach is efficient and competitive with state-of-the-art methods.
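The L1 constraint on latent variables amounts to adding a sparsity penalty to the training objective, as in the generic sketch below (not the exact UML-IN hybrid loss).

```python
import torch
import torch.nn.functional as F

def sparse_latent_loss(recon, target, z, lam=1e-3):
    """Reconstruction term plus an L1 penalty on the latent variables z;
    the penalty drives most latent components toward zero, concentrating
    the salient information in a sparse subset."""
    return F.l1_loss(recon, target) + lam * z.abs().mean()

z = torch.randn(4, 256, requires_grad=True)
loss = sparse_latent_loss(torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64), z)
loss.backward()
```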
9 pages, 2800 KiB  
Article
An Edge Detection Method Based on Local Gradient Estimation: Application to High-Temperature Metallic Droplet Images
by Ranya Al Darwich, Laurent Babout and Krzysztof Strzecha
Appl. Sci. 2022, 12(14), 6976; https://doi.org/10.3390/app12146976 - 09 Jul 2022
Cited by 3 | Viewed by 1923
Abstract
Edge detection is a fundamental step in many computer vision systems, particularly in image segmentation and feature detection, and many algorithms exist for detecting the edges of objects in images. This paper proposes a method based on local gradient estimation to detect the edges in metallic droplet images and compares the results with a contour line obtained from the active contour model of the same images, as well as with crowdsourced identifications of droplet edges at specific points. The studied images were taken at high temperatures, which makes the segmentation process particularly difficult. The comparison between the three methods shows that the proposed method is more accurate than the active contour method, especially at the point of contact between the droplet and the base. It is also shown that the crowdsourced data are as reliable as the edge points obtained from the local gradient estimation method.
16 pages, 6734 KiB  
Article
Computer Vision System: Measuring Displacement and Bending Angle of Ionic Polymer-Metal Composites
by Eyman Manaf, Karol Fitzgerald, Clement L. Higginbotham and John G. Lyons
Appl. Sci. 2022, 12(13), 6744; https://doi.org/10.3390/app12136744 - 03 Jul 2022
Cited by 2 | Viewed by 2570
Abstract
A computer vision system for measuring the displacement and bending angle of ionic polymer–metal composites (IPMCs) is proposed in this study, and the logical progression of measuring IPMC displacement and bending angle is laid out. The system was developed in Python (version 3.10) with OpenCV (version 4.5.5.64), and the coding functions and mathematical formulas used are elaborated on. IPMC contour detection is discussed in detail, along with appropriate camera and lighting setups. Measurements generated by the vision system were compared to values approximated via a manual calculation method, and good agreement was found between the two. The mean absolute error (MAE) and root mean squared error (RMSE) are 0.068080668 and 0.088160652, respectively, for the displacement values, and 0.081544205 and 0.103880163, respectively, for the bending angle values. The proposed vision system can thus accurately approximate the displacement and bending angle of IPMCs.
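A minimal OpenCV sketch of the kind of contour-based measurement such a system performs is given below; the thresholding choice, clamped-end convention, and angle definition are illustrative assumptions, not the authors' exact code.

```python
import cv2
import numpy as np

def tip_displacement_and_angle(gray):
    """Locate the IPMC strip as the largest contour in a binarised frame,
    then measure the free tip's lateral displacement (in pixels) and the
    bending angle relative to the clamped end."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    strip = max(contours, key=cv2.contourArea).reshape(-1, 2)    # (x, y) points
    base = strip[strip[:, 1] == strip[:, 1].min()].mean(axis=0)  # clamped top end
    tip = strip[strip[:, 1].argmax()].astype(np.float64)         # free bottom end
    dx, dy = tip[0] - base[0], tip[1] - base[1]
    displacement = abs(dx)                   # pixels; calibrate to mm with a scale factor
    angle = np.degrees(np.arctan2(dx, dy))   # 0 degrees = hanging straight down
    return displacement, angle
```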
19 pages, 10053 KiB  
Article
An Improved Method for Evaluating Image Sharpness Based on Edge Information
by Zhaoyang Liu, Huajie Hong, Zihao Gan, Jianhua Wang and Yaping Chen
Appl. Sci. 2022, 12(13), 6712; https://doi.org/10.3390/app12136712 - 02 Jul 2022
Cited by 4 | Viewed by 2594
Abstract
To improve the subjective and objective consistency of image sharpness evaluation while meeting the requirement of image content independence, this paper proposes an improved sharpness evaluation method that needs no reference image. First, the positions of the edge points are obtained by a Canny edge detection algorithm based on an activation mechanism. Then, an edge direction detection algorithm based on the grayscale information of the eight neighboring pixels is used to acquire the edge direction of each edge point. Next, the edge width is computed to build a histogram of edge widths. Finally, based on the performance of three distance factors derived from the histogram, the type-3 distance factor is introduced into the weighted-average edge-width model to obtain the sharpness evaluation index. The proposed method was tested on the LIVE database with the following results: the Pearson linear correlation coefficient (CC) was 0.9346, the root mean square error (RMSE) was 5.78, the mean absolute error (MAE) was 4.9383, the Spearman rank-order correlation coefficient (ROCC) was 0.9373, and the outlier rate (OR) was 0. In addition, a comparative analysis with two other methods and a real shooting experiment verified the superiority and effectiveness of the proposed method.
27 pages, 129857 KiB  
Article
Three-Dimensional Object Segmentation and Labeling Algorithm Using Contour and Distance Information
by Wen-Chien Lo, Chung-Cheng Chiu and Jia-Horng Yang
Appl. Sci. 2022, 12(13), 6602; https://doi.org/10.3390/app12136602 - 29 Jun 2022
Cited by 1 | Viewed by 1472
Abstract
Object segmentation and object labeling are important techniques in image processing. Because object segmentation techniques developed on two-dimensional images can mis-segment overlapping objects, this paper proposes a three-dimensional object segmentation and labeling algorithm that combines segmentation and labeling using contour and distance information for static images. The proposed algorithm can segment and label objects without relying on dynamic information from consecutive images and without knowing the characteristics of the segmented objects in advance. It can also effectively segment and label complex overlapping objects and estimate an object's distance and size from the labeled contour information. A custom-built image capture system was developed to capture test images, and the actual distances and sizes of the objects were measured with measuring tools; the measured data serve as a reference for the algorithm's estimates. The experimental results show that the proposed algorithm can effectively segment and label complex overlapping objects, estimate the distance and size of each object, and satisfy the detection requirements for objects at long range in outdoor scenes.
17 pages, 4782 KiB  
Article
Detection of Dense Citrus Fruits by Combining Coordinated Attention and Cross-Scale Connection with Weighted Feature Fusion
by Xiaoyu Liu, Guo Li, Wenkang Chen, Binghao Liu, Ming Chen and Shenglian Lu
Appl. Sci. 2022, 12(13), 6600; https://doi.org/10.3390/app12136600 - 29 Jun 2022
Cited by 12 | Viewed by 2173
Abstract
The accurate detection of individual citrus fruits in orchard environments is one of the key steps in realizing precision agriculture applications such as yield estimation, fruit thinning, and mechanical harvesting. This study proposes an improved YOLOv5 object detection model to achieve accurate identification and counting of citrus fruits in an orchard environment. First, the coordinated attention (CA) module, a recent visual attention mechanism, was inserted into the improved backbone network to focus on fruit-dense regions and recognize small target fruits. Second, an efficient bidirectional cross-scale connection and weighted feature fusion network (BiFPN) replaced the PANet multiscale feature fusion network in the neck, assigning effective weights to fully fuse high-level and low-level features. Finally, the varifocal loss function was used to compute the model loss for better training results. Experiments on four varieties of citrus trees showed that the improved model can effectively identify small, densely packed citrus fruits: the average precision (AP) reached 98.4%, and the average recognition time was 0.019 s per image. Compared with the original YOLOv5 (including its n, s, m, l, and x variants), the improvement in average precision ranged from 0.8% to 7.5% while maintaining a similar average inference time. Four different citrus varieties were also tested to evaluate the generalization performance of the improved model. The method can be used as part of a vision system to support the real-time, accurate detection of multiple fruit targets during mechanical picking in citrus orchards.
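For reference, an attention block of the kind the abstract describes can be written compactly in PyTorch; this follows the published coordinate attention design (pooling along height and width separately so the attention map keeps positional information), with the reduction ratio and channel sizes as illustrative choices.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pool along H and W separately so the attention
    map keeps positional information, helping small, dense targets stand out."""
    def __init__(self, ch, reduction=32):
        super().__init__()
        mid = max(8, ch // reduction)
        self.conv1 = nn.Conv2d(ch, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([ph, pw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                   # height attention
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # width attention
        return x * ah * aw

x = torch.randn(1, 64, 40, 40)
print(CoordinateAttention(64)(x).shape)   # torch.Size([1, 64, 40, 40])
```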
20 pages, 5254 KiB  
Article
An Endoscope Image Enhancement Algorithm Based on Image Decomposition
by Wei Tan, Chao Xu, Fang Lei, Qianqian Fang, Ziheng An, Dou Wang, Jubao Han, Kai Qian and Bo Feng
Electronics 2022, 11(12), 1909; https://doi.org/10.3390/electronics11121909 - 19 Jun 2022
Cited by 3 | Viewed by 2587
Abstract
The visual quality of endoscopic images is a significant factor in early lesion inspection and surgical procedures. However, due to interference from light sources, hardware, and other configurations, endoscopic images collected clinically suffer from uneven illumination, blurred details, and low contrast. This paper proposes a new endoscopic image enhancement algorithm: the image is decomposed into a detail layer and a base layer with noise suppression, blood vessel information is stretched channel by channel in the detail layer, adaptive brightness correction is performed in the base layer, and the two layers are finally fused into a new endoscopic image. The algorithm was compared with six other algorithms on a laboratory dataset and leads on all five objective evaluation metrics, indicating that it is ahead of the other algorithms in contrast, structural similarity, and peak signal-to-noise ratio. It can effectively highlight blood vessel information in endoscopic images while avoiding the influence of noise and highlights, and thus addresses the existing problems of endoscopic images well.
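The base/detail decomposition strategy can be illustrated with a generic Gaussian low-pass split; the filter, gains, and gamma below are placeholders, not the paper's tuned processing chain.

```python
import cv2
import numpy as np

def enhance(bgr, sigma=5, detail_gain=2.0, gamma=0.8):
    """Split each channel into base + detail with a Gaussian low-pass,
    amplify the detail layer (vessel-like structures), brighten the base
    layer with gamma correction, and fuse the two layers back together."""
    img = bgr.astype(np.float32) / 255.0
    base = cv2.GaussianBlur(img, (0, 0), sigma)   # low-frequency base layer
    detail = img - base                           # high-frequency detail layer
    out = np.power(np.clip(base, 0.0, 1.0), gamma) + detail_gain * detail
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

frame = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
enhanced = enhance(frame)
```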
16 pages, 3068 KiB  
Article
CDTNet: Improved Image Classification Method Using Standard, Dilated and Transposed Convolutions
by Yuepeng Zhou, Huiyou Chang, Yonghe Lu and Xili Lu
Appl. Sci. 2022, 12(12), 5984; https://doi.org/10.3390/app12125984 - 12 Jun 2022
Cited by 7 | Viewed by 2088
Abstract
Convolutional neural networks (CNNs) have achieved great success in image classification tasks. In a convolutional operation, a larger input area captures more context information; stacking several convolutional layers enlarges the receptive field but increases the number of parameters. Most CNN models use pooling layers to extract important features, but pooling causes information loss, and transposed convolution can increase the spatial size of the feature maps to recover the lost low-resolution information. In this study, we used two branches with different dilation rates to obtain features of different sizes: dilated convolution captures richer information, and the outputs from the two channels are concatenated as input for the next block. The small feature maps of the top blocks are enlarged by transposed convolution to recover low-resolution prediction maps. We evaluated the model on three image classification benchmarks (CIFAR-10, SVHN, and FMNIST) against four state-of-the-art models: VGG16, VGG19, ResNeXt, and DenseNet. The experimental results show that CDTNet achieved lower loss, higher accuracy, and faster convergence in both training and testing. The average test accuracy of CDTNet was higher by as much as 54.81% (vs. VGG19 on SVHN) and by at least 1.28% (vs. VGG16 on FMNIST), which shows that CDTNet has better performance and strong generalization ability with fewer parameters.
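The resolution-recovery role of transposed convolution is easy to see in isolation: a stride-2 ConvTranspose2d doubles the spatial size of a feature map (toy shapes below, not CDTNet's actual layer configuration).

```python
import torch
import torch.nn as nn

# A stride-2 transposed convolution doubles the spatial size of a feature
# map, recovering resolution lost to pooling.
up = nn.ConvTranspose2d(in_channels=128, out_channels=64,
                        kernel_size=2, stride=2)
x = torch.randn(1, 128, 8, 8)
print(up(x).shape)   # torch.Size([1, 64, 16, 16])
```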
14 pages, 3690 KiB  
Article
Greengage Grading Method Based on Dynamic Feature and Ensemble Networks
by Keqiong Chen, Weitao Li, Jiaxi An and Tianrui Bu
Electronics 2022, 11(12), 1832; https://doi.org/10.3390/electronics11121832 - 09 Jun 2022
Viewed by 1371
Abstract
To overcome the deficiencies of the traditional open-loop cognition method, which lacks evaluation of the cognitive results, a novel cognitive method for greengage grading based on dynamic features and ensemble networks is explored in this paper. First, a greengage grading architecture with an adaptive feedback mechanism based on error adjustment is constructed to imitate the human cognitive mechanism. Second, a dynamic representation model for constructing the convolutional feature space of a greengage image is established based on entropy constraint indicators, and a bagging classification network for greengage grading is built on stochastic configuration networks (SCNs) to realize a hierarchical representation of greengage features and enhance the generalization of the classifier. Third, an entropy-based error model of the cognitive results is constructed to describe the optimal cognition problem from an information perspective, and the criteria and mechanism for regulating feature level and feature efficiency are specified under the constraint of cognitive error entropy. Finally, numerous experiments were performed on the collected greengage images. The experimental results demonstrate the effectiveness and superiority of our method compared with existing open-loop algorithms, especially for the classification of similar samples.
23 pages, 10263 KiB  
Article
Three-Dimensional Reconstruction Method for Bionic Compound-Eye System Based on MVSNet Network
by Xinpeng Deng, Su Qiu, Weiqi Jin and Jiaan Xue
Electronics 2022, 11(11), 1790; https://doi.org/10.3390/electronics11111790 - 05 Jun 2022
Cited by 4 | Viewed by 1806
Abstract
In practical scenarios where shooting conditions are limited, high image-capture efficiency and a high success rate of 3D reconstruction are required. To enable the application of bionic compound eyes in small portable devices for 3D reconstruction, auto-navigation, and obstacle avoidance, a deep learning method for 3D reconstruction using a bionic compound-eye system with partially overlapping fields of view was studied. We used the system to capture images of the target scene and then recovered the camera parameter matrices by solving the perspective-n-point (PnP) problem. Considering the unique characteristics of the system, we designed a neural network based on the MVSNet structure, named CES-MVSNet. Feeding the captured images and camera parameters to the trained network yields 3D reconstruction results with good integrity and precision. We compared the traditional multi-view geometry method and the neural network for 3D reconstruction and analyzed the differences between their results. The efficiency and reliability of using the bionic compound-eye system for 3D reconstruction are thus demonstrated.
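Recovering camera parameters by solving the PnP problem is a standard OpenCV call; the following self-contained sketch projects synthetic 3D points with a known pose and then recovers that pose (all numeric values are made up for illustration).

```python
import cv2
import numpy as np

# Synthetic setup: project known 3D points with a known pose, then solve
# the PnP problem to recover that pose. K is a hypothetical intrinsic matrix.
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
obj = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                [0.5, 0.5, 1], [0.2, 0.8, 0.5]], dtype=np.float64)
rvec_true = np.array([[0.1], [-0.2], [0.05]])
tvec_true = np.array([[0.3], [-0.1], [5.0]])
img_pts, _ = cv2.projectPoints(obj, rvec_true, tvec_true, K, None)

ok, rvec, tvec = cv2.solvePnP(obj, img_pts, K, None)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix of the recovered camera pose
print(ok, np.allclose(tvec, tvec_true, atol=1e-4))
```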
14 pages, 2486 KiB  
Article
Effective Attention-Based Mechanism for Masked Face Recognition
by Vandet Pann and Hyo Jong Lee
Appl. Sci. 2022, 12(11), 5590; https://doi.org/10.3390/app12115590 - 31 May 2022
Cited by 5 | Viewed by 2576
Abstract
Research on facial recognition has been flourishing recently, which has led to the introduction of many robust methods. However, since the worldwide outbreak of COVID-19, people have had to wear facial masks regularly, making existing face recognition methods less reliable. Although conventional face recognition is nearly mature, masked face recognition (MFR), which refers to recognizing the identity of an individual wearing a facial mask, remains the most challenging topic in this area. To overcome the difficulties of MFR, a novel deep learning method based on the convolutional block attention module (CBAM) and the angular margin ArcFace loss is proposed. In the method, CBAM is integrated with convolutional neural networks (CNNs) to extract feature maps from the input image, particularly from the region around the eyes. Meanwhile, ArcFace is used as the training loss function to optimize the feature embedding and enhance discriminative features for MFR. Because of the insufficient availability of masked face images for model training, this study used data augmentation to generate masked face images from a common face recognition dataset. The proposed method was evaluated on the well-known masked versions of the LFW, AgeDB-30, and CFP-FP verification datasets and on the real-mask MFR2 dataset. A variety of experiments confirmed that the proposed method improves on the current state-of-the-art methods for MFR.
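A compact PyTorch version of the standard ArcFace head, independent of the paper's CBAM backbone and training pipeline, is shown below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin loss: add margin m to the angle between an
    embedding and its class weight before the scaled softmax (standard
    ArcFace formulation)."""
    def __init__(self, dim, n_classes, s=64.0, m=0.5):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_classes, dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        cos = F.linear(F.normalize(emb), F.normalize(self.w))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = torch.cos(theta + self.m)            # margin on the true class only
        onehot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (onehot * target + (1 - onehot) * cos)
        return F.cross_entropy(logits, labels)

head = ArcFaceHead(dim=512, n_classes=1000)
loss = head(torch.randn(16, 512), torch.randint(0, 1000, (16,)))
```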
15 pages, 4115 KiB  
Essay
YOLOv5-Ytiny: A Miniature Aggregate Detection and Classification Model
by Sheng Yuan, Yuying Du, Mingtang Liu, Shuang Yue, Bin Li and Hao Zhang
Electronics 2022, 11(11), 1743; https://doi.org/10.3390/electronics11111743 - 30 May 2022
Cited by 9 | Viewed by 2971
Abstract
Aggregate classification is a prerequisite for making concrete, and traditional aggregate identification methods suffer from low accuracy and slow speed. To solve these problems, a miniature aggregate detection and classification model based on the improved You Only Look Once (YOLO) algorithm, named YOLOv5-ytiny, is proposed in this study. First, the C3 structure in YOLOv5 is replaced with our proposed CI structure. Then, the redundant part of the neck structure is pruned. Finally, the bounding box regression loss function is changed from GIoU to CIoU. The proposed YOLOv5-ytiny model was compared with other object detection algorithms such as YOLOv4, YOLOv4-tiny, and SSD. The experimental results demonstrate that the YOLOv5-ytiny model reaches 9.17 FPS, 60% higher than the original YOLOv5 algorithm, and reaches 99.6% mAP (mean average precision). Moreover, the YOLOv5-ytiny model has significant speed advantages on CPU-only computing devices. The method can not only accurately identify aggregate but also obtain its relative position, making it effective for aggregate detection.
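The GIoU-to-CIoU change concerns only the regression loss; a standard CIoU implementation (following the commonly used formulation, not code from the paper) looks like this.

```python
import math
import torch

def ciou_loss(b1, b2, eps=1e-7):
    """Complete-IoU loss for (x1, y1, x2, y2) boxes: IoU penalised by the
    normalised centre distance and an aspect-ratio mismatch term."""
    x1 = torch.max(b1[..., 0], b2[..., 0]); y1 = torch.max(b1[..., 1], b2[..., 1])
    x2 = torch.min(b1[..., 2], b2[..., 2]); y2 = torch.min(b1[..., 3], b2[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w1, h1 = b1[..., 2] - b1[..., 0], b1[..., 3] - b1[..., 1]
    w2, h2 = b2[..., 2] - b2[..., 0], b2[..., 3] - b2[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    cw = torch.max(b1[..., 2], b2[..., 2]) - torch.min(b1[..., 0], b2[..., 0])
    ch = torch.max(b1[..., 3], b2[..., 3]) - torch.min(b1[..., 1], b2[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps                 # enclosing-box diagonal squared
    rho2 = ((b1[..., 0] + b1[..., 2] - b2[..., 0] - b2[..., 2]) ** 2 +
            (b1[..., 1] + b1[..., 3] - b2[..., 1] - b2[..., 3]) ** 2) / 4
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) -
                              torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

b1 = torch.tensor([[0., 0., 2., 2.]])
b2 = torch.tensor([[1., 1., 3., 3.]])
print(ciou_loss(b1, b2))   # exceeds 1 - IoU because of the extra penalties
```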
16 pages, 3724 KiB  
Article
Image-Based Automatic Individual Identification of Fish without Obvious Patterns on the Body (Scale Pattern)
by Dinara Bekkozhayeva and Petr Cisar
Appl. Sci. 2022, 12(11), 5401; https://doi.org/10.3390/app12115401 - 26 May 2022
Cited by 4 | Viewed by 4577
Abstract
The precision fish farming concept has been widely investigated in research and is highly desirable in aquaculture, as it creates opportunities for precisely controlling and monitoring fish cultivation processes and increasing fish welfare. The automatic identification of individual fish could be one of the keys to enabling individual fish treatment. In a previous study, we demonstrated that the visible patterns on a fish's body can be used for the non-invasive individual identification of fish of the same species (with obvious skin patterns, such as salmonids) over long-term periods. The aim of this study was to verify the possibility of fully automatic, non-invasive photo-identification of individual fish based on natural marks on the body without any obvious skin patterns. This approach is an alternative to stressful invasive tagging and marking techniques. Scale patterns on the body and operculum, as well as lateral line shapes, were used as discriminative features for identifying individuals in a closed group of fish. We used two fish species: the European seabass Dicentrarchus labrax and the common carp Cyprinus carpio. The identification method was tested on four experimental datasets for each species: two separate short-term datasets (pattern variability test) and two long-term datasets (pattern stability test) for European seabass (300 individual fish) and common carp (32 individual fish). The classification accuracy was 100% for both species in both the short-term and long-term experiments. According to these results, the methods used for automatic non-invasive image-based individual-fish identification can also be applied to fish species without obvious skin patterns.