Topic Editors

Department of Engineering (DING), University of Sannio, Benevento, Italy

Computer Vision and Image Processing

Abstract submission deadline: closed (31 March 2023)
Manuscript submission deadline: closed (30 June 2023)
Viewed by 113350

Topic Information

Dear Colleagues,

Computer vision is a scientific discipline that aims to develop models for understanding our 3D environment using cameras. Image processing, in turn, comprises the body of techniques that extract useful information directly from images or process them for optimal subsequent analysis. Computer vision and image processing are thus two closely related fields, relevant to almost any research that uses cameras or other image sensors to acquire information from scenes or working environments. The main aim of this Topic is therefore to cover the relevant areas where computer vision and image processing are applied, including but not limited to:

  • Three-dimensional image acquisition, processing, and visualization
  • Scene understanding
  • Greyscale, color, and multispectral image processing
  • Multimodal sensor fusion
  • Industrial inspection
  • Robotics
  • Surveillance
  • Airborne and satellite on-board image acquisition platforms
  • Computational models of vision
  • Imaging psychophysics

Prof. Dr. Silvia Liberata Ullo
Topic Editor

Keywords

  • 3D acquisition, processing, and visualization
  • scene understanding
  • multimodal sensor processing and fusion
  • multispectral, color, and greyscale image processing
  • industrial quality inspection
  • computer vision for robotics
  • computer vision for surveillance
  • airborne and satellite on-board image acquisition platforms
  • computational models of vision
  • imaging psychophysics

Participating Journals

Journal Name        Abbreviation  Impact Factor  CiteScore  Launched Year  First Decision (median)  APC
Applied Sciences    applsci       2.7            4.5        2011           15.8 Days                CHF 2300
Electronics         electronics   2.9            4.7        2012           15.8 Days                CHF 2200
Modelling           modelling     -              -          2020           17.9 Days                CHF 1000
Journal of Imaging  jimaging      3.2            4.4        2015           21.9 Days                CHF 1600

Preprints is a platform dedicated to making early versions of research outputs permanently available and citable. MDPI journals allow posting on preprint servers such as Preprints.org prior to publication. For more details about Preprints, please visit https://www.preprints.org.

Published Papers (99 papers)

Article
Hypergraph Learning-Based Semi-Supervised Multi-View Spectral Clustering
Electronics 2023, 12(19), 4083; https://doi.org/10.3390/electronics12194083 - 29 Sep 2023
Viewed by 130
Abstract
Graph-based semi-supervised multi-view clustering has demonstrated promising performance and gained significant attention due to its capability to handle sample spaces with arbitrary shapes. Nevertheless, the ordinary graph employed by most existing semi-supervised multi-view clustering methods only captures the pairwise relationships between samples, and cannot fully explore the higher-order information and complex structure among multiple sample points. Additionally, most existing methods do not make full use of the complementary information and spatial structure contained in multi-view data, which is crucial to clustering results. We propose a novel hypergraph learning-based semi-supervised multi-view spectral clustering approach to overcome these limitations. Specifically, the proposed method fully considers the relationship between multiple sample points and utilizes hypergraph-induced hyper-Laplacian matrices to preserve the high-order geometrical structure in data. Based on the principle of complementarity and consistency between views, this method simultaneously learns indicator matrices of all views and harnesses the tensor Schatten p-norm to extract both complementary information and low-rank spatial structure within these views. Furthermore, we introduce an auto-weighted strategy to address the discrepancy between singular values, enhancing the robustness and stability of the algorithm. Detailed experimental results on various datasets demonstrate that our approach surpasses existing state-of-the-art semi-supervised multi-view clustering methods. Full article
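
Below is a rough, hedged sketch of the hypergraph machinery this abstract builds on: a kNN hypergraph and the standard normalized hyper-Laplacian of Zhou et al., L = I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2). The paper's actual hyperedge construction, weighting, and tensor Schatten p-norm optimization are not reproduced; the neighbourhood size and unit weights are assumptions.

```python
import numpy as np

def hyper_laplacian(X, k=5):
    """Build a kNN hypergraph over the rows of X and return L = I - Theta."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    H = np.zeros((n, n))                                  # incidence: one hyperedge per sample
    for i in range(n):
        H[np.argsort(d2[i])[:k + 1], i] = 1.0             # each sample plus its k neighbours
    w = np.ones(n)                                        # unit hyperedge weights (assumption)
    Dv, De = H @ w, H.sum(axis=0)                         # vertex and hyperedge degrees
    Dv_is = np.diag(1.0 / np.sqrt(Dv))
    Theta = Dv_is @ H @ np.diag(w / De) @ H.T @ Dv_is
    return np.eye(n) - Theta                              # hyper-Laplacian

L = hyper_laplacian(np.random.rand(40, 8))
print(L.shape)  # (40, 40); used as a smoothness operator in spectral clustering
```
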
Article
Creating Digital Watermarks in Bitmap Images Using Lagrange Interpolation and Bezier Curves
J. Imaging 2023, 9(10), 206; https://doi.org/10.3390/jimaging9100206 - 29 Sep 2023
Viewed by 151
Abstract
The article is devoted to the introduction of digital watermarks, which form the basis for copyright protection systems. Methods in this area are aimed at embedding hidden markers that are resistant to various container transformations. This paper proposes a method for embedding a digital watermark into bitmap images using Lagrange interpolation and the Bezier curve formula for five points, called Lagrange interpolation along the Bezier curve 5 (LIBC5). As a means of steganalysis, the RS method was used, which uses a sensitive method of double statistics obtained on the basis of spatial correlations in images. The output value of the RS analysis is the estimated length of the message in the image under study. The stability of the developed LIBC5 method to the detection of message transmission by the RS method has been experimentally determined. The developed method proved to be resistant to RS analysis. A study of the LIBC5 method showed an improvement in quilting resistance compared to that of the INMI image embedding method, which also uses Lagrange interpolation. Thus, the LIBC5 stegosystem can be successfully used to protect confidential data and copyrights. Full article
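
For readers unfamiliar with the two mathematical ingredients named in LIBC5, here is a minimal sketch of Lagrange interpolation and a five-point (quartic) Bezier evaluation. How the method actually maps these onto pixel values for embedding is not reproduced, and the sample points below are made up.

```python
import numpy as np
from math import comb

def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for xj in xs[:i] + xs[i + 1:]:
            term *= (x - xj) / (xi - xj)   # basis polynomial factor
        total += term
    return total

def bezier5(points, t):
    """Quartic Bezier curve defined by 5 control points, evaluated at t in [0, 1]."""
    n = len(points) - 1                    # degree 4 for five points
    return sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * np.asarray(p, float)
               for i, p in enumerate(points))

xs, ys = [0, 1, 2, 3, 4], [10, 12, 9, 14, 11]
print(lagrange(xs, ys, 2.5))                                    # value between samples
print(bezier5([(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)], 0.5))   # midpoint of the curve
```
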

Article
Noise-Robust Pulse Wave Estimation from Near-Infrared Face Video Images Using the Wiener Estimation Method
J. Imaging 2023, 9(10), 202; https://doi.org/10.3390/jimaging9100202 - 28 Sep 2023
Viewed by 249
Abstract
In this paper, we propose a noise-robust pulse wave estimation method from near-infrared face video images. Pulse wave estimation in a near-infrared environment is expected to be applied to non-contact monitoring in dark areas. The conventional method cannot consider noise when performing estimation. As a result, the accuracy of pulse wave estimation in noisy environments is not very high. This may adversely affect the accuracy of heart rate data and other data obtained from pulse wave signals. Therefore, the objective of this study is to perform pulse wave estimation robust to noise. The Wiener estimation method, which is a simple linear computation that can consider noise, was used in this study. Experimental results showed that the combination of the proposed method and signal processing (detrending and bandpass filtering) increased the SNR (signal-to-noise ratio) by more than 2.5 dB compared to the conventional method and signal processing. The correlation coefficient between the pulse wave signal measured using a pulse wave meter and the estimated pulse wave signal was 0.30 larger on average for the proposed method. Furthermore, the AER (absolute error rate) between the heart rate measured with the pulse wave meter and the estimated heart rate was 0.82% on average for the proposed method, which was lower than that of the conventional method (12.53% on average). These results show that the proposed method is more robust to noise than the conventional method for pulse wave estimation. Full article
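
A minimal sketch of the Wiener estimation idea the abstract leans on: a linear estimator W = R_sx R_xx^(-1) fitted from training pairs of observed vectors and reference signals, applied afterwards as a single matrix multiply. The feature layout and toy data here are assumptions, not the paper's setup.

```python
import numpy as np

def fit_wiener(X, S):
    """X: (n, d) observed vectors; S: (n, m) reference signals. Returns W (m, d)."""
    Rxx = X.T @ X / len(X)            # autocorrelation of the observations
    Rsx = S.T @ X / len(X)            # cross-correlation signal/observation
    return Rsx @ np.linalg.pinv(Rxx)  # pinv tolerates noisy, rank-deficient Rxx

def estimate(W, x):
    return W @ x                      # one linear multiply per frame

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))                         # stand-in NIR pixel traces
S_train = X_train[:, :1] + 0.1 * rng.normal(size=(200, 1))   # toy reference pulse wave
W = fit_wiener(X_train, S_train)
print(estimate(W, X_train[0]))
```
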

Article
Automatic Jordanian License Plate Detection and Recognition System Using Deep Learning Techniques
J. Imaging 2023, 9(10), 201; https://doi.org/10.3390/jimaging9100201 - 28 Sep 2023
Viewed by 283
Abstract
Recently, the number of vehicles on the road, especially in urban centres, has increased dramatically due to the increasing trend of individuals towards urbanisation. As a result, the manual detection and recognition of vehicles (i.e., license plates and vehicle manufacturers) has become an arduous task beyond human capabilities. In this paper, we have developed a system using transfer learning-based deep learning (DL) techniques to identify Jordanian vehicles automatically. The YOLOv3 (You Only Look Once) model was re-trained using transfer learning to accomplish license plate detection, character recognition, and vehicle logo detection, while the VGG16 (Visual Geometry Group) model was re-trained to accomplish vehicle logo recognition. To train and test these models, four datasets were collected. The first dataset consists of 7035 Jordanian vehicle images, the second dataset consists of 7176 Jordanian license plates, and the third dataset consists of 8271 Jordanian vehicle images. These datasets were used to train and test the YOLOv3 model for Jordanian license plate detection, character recognition, and vehicle logo detection. The fourth dataset consists of 158,230 vehicle logo images and was used to train and test the VGG16 model for vehicle logo recognition. Text measures were used to evaluate the performance of our developed system. Moreover, the mean average precision (mAP) measure was used to assess the YOLOv3 model on the detection tasks (i.e., license plate detection and vehicle logo detection). For license plate detection, the precision, recall, F-measure, and mAP were 99.6%, 100%, 99.8%, and 99.9%, respectively, while for character recognition, the precision, recall, and F-measure were 100%, 99.9%, and 99.95%, respectively. The performance of the license plate recognition stage was evaluated by evaluating these two sub-stages as a sequence, where the precision, recall, and F-measure were 99.8%, 99.8%, and 99.8%, respectively. Furthermore, for vehicle logo detection, the precision, recall, F-measure, and mAP were 99%, 99.6%, 99.3%, and 99.1%, respectively, while for vehicle logo recognition, the precision, recall, and F-measure were 98%, 98%, and 98%, respectively. The performance of the vehicle logo recognition stage was evaluated by evaluating these two sub-stages as a sequence, where the precision, recall, and F-measure were 95.3%, 99.5%, and 97.4%, respectively. Full article

Article
Few-Shot Object Detection with Local Feature Enhancement and Feature Interrelation
Electronics 2023, 12(19), 4036; https://doi.org/10.3390/electronics12194036 - 25 Sep 2023
Viewed by 244
Abstract
Few-shot object detection (FSOD) aims at designing models that can accurately detect targets of novel classes in a scarce data regime. Existing research has improved detection performance with meta-learning-based models. However, existing methods continue to exhibit certain imperfections: (1) Interacting only the global features of query and support images ignores critical local features, leading to imprecise localization of objects from new categories. (2) Convolutional neural networks (CNNs) encounter difficulty in learning diverse pose features from exceedingly limited labeled samples of unseen classes. (3) Local context information is not fully utilized in a global attention mechanism, which means the attention modules need to be improved. As a result, the detection performance of novel-class objects is compromised. To overcome these challenges, a few-shot object detection network is proposed with a local feature enhancement module and an intrinsic feature transformation module. In this paper, a local feature enhancement module (LFEM) is designed to raise the importance of intrinsic features of the novel-class samples. In addition, an Intrinsic Feature Transform Module (IFTM) is explored to enhance the feature representation of novel-class samples, which enriches the feature space of novel classes. Finally, a more effective cross-attention module, called Global Cross-Attention Network (GCAN), which fully aggregates local and global context information between query and support images, is proposed in this paper. The crucial features of novel-class objects are extracted effectively by our model before the feature fusion between query images and support images. Our proposed method increases, on average, the detection performance by 0.93 (nAP) in comparison with previous models on the PASCAL VOC FSOD benchmark dataset. Extensive experiments demonstrate the effectiveness of our modules under various experimental settings. Full article

Article
Attention-Mechanism-Based Models for Unconstrained Face Recognition with Mask Occlusion
Electronics 2023, 12(18), 3916; https://doi.org/10.3390/electronics12183916 - 17 Sep 2023
Viewed by 276
Abstract
Masks cover most areas of the face, resulting in a serious loss of facial identity information; thus, how to alleviate or eliminate the negative impact of occlusion is a significant problem in the field of unconstrained face recognition. Inspired by the successful application of attention mechanisms and capsule networks in computer vision, we propose ECA-Inception-Resnet-Caps, which is a novel framework based on Inception-Resnet-v1 for learning discriminative face features in unconstrained mask-wearing conditions. Firstly, Squeeze-and-Excitation (SE) modules and Efficient Channel Attention (ECA) modules are applied to Inception-Resnet-v1 to increase the attention on unoccluded face areas, which is used to eliminate the negative impact of occlusion during feature extraction. Secondly, the effects of the two attention mechanisms on the different modules in Inception-Resnet-v1 are compared and analyzed, which is the foundation for further constructing the ECA-Inception-Resnet-Caps framework. Finally, ECA-Inception-Resnet-Caps is obtained by improving Inception-Resnet-v1 with capsule modules, which is explored to increase the interpretability and generalization of the model after reducing the negative impact of occlusion. The experimental results demonstrate that both attention mechanisms and the capsule network can effectively enhance the performance of Inception-Resnet-v1 for face recognition in occlusion tasks, with the ECA-Inception-Resnet-Caps model being the most effective, achieving an accuracy of 94.32%, which is 1.42% better than the baseline model. Full article
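
Of the two attention mechanisms named here, Squeeze-and-Excitation is the more standard; below is a minimal PyTorch sketch of a generic SE block. Its exact placement inside Inception-Resnet-v1 and the reduction ratio are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel gate in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # excitation: reweight channels

y = SEBlock(64)(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```
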

Article
Efficient Dehazing with Recursive Gated Convolution in U-Net: A Novel Approach for Image Dehazing
J. Imaging 2023, 9(9), 183; https://doi.org/10.3390/jimaging9090183 - 11 Sep 2023
Viewed by 365
Abstract
Image dehazing, a fundamental problem in computer vision, involves the recovery of clear visual cues from images marred by haze. Over recent years, deploying deep learning paradigms has spurred significant strides in image dehazing tasks. However, many dehazing networks aim to enhance performance by adopting intricate network architectures, complicating training, inference, and deployment procedures. This study proposes an end-to-end U-Net dehazing network model with recursive gated convolution and attention mechanisms to improve performance while maintaining a lean network structure. In our approach, we leverage an improved recursive gated convolution mechanism to substitute the original U-Net’s convolution blocks with residual blocks and apply the SK fusion module to revamp the skip connection method. We designate this novel U-Net variant as the Dehaze Recursive Gated U-Net (DRGNet). Comprehensive testing across public datasets demonstrates the DRGNet’s superior performance in dehazing quality, detail retrieval, and objective evaluation metrics. Ablation studies further confirm the effectiveness of the key design elements. Full article

Article
Feature Point Identification in Fillet Weld Joints Using an Improved CPDA Method
Appl. Sci. 2023, 13(18), 10108; https://doi.org/10.3390/app131810108 - 07 Sep 2023
Viewed by 202
Abstract
An intelligent, vision-guided welding robot is highly desired in machinery manufacturing, the ship industry, and vehicle engineering. The performance of the system greatly depends on the effective identification of weld seam features and the three-dimensional (3D) reconstruction of the weld seam position in a complex industrial environment. In this paper, a 3D visual sensing system with a structured laser projector and CCD camera is developed to obtain the geometry information of fillet weld seams in robot welding. By accounting for the inclination characteristics of the laser stripe in fillet welding, a Gaussian-weighted PCA-based laser centerline extraction method is proposed. Smoother laser centerlines can be obtained at large inclined angles. Furthermore, an improved chord-to-point distance accumulation (CPDA) method with polygon approximation is proposed to identify the feature corner location in centerline images. The proposed method is validated numerically with simulated piecewise linear laser stripes and experimentally with automated robot welding. By comparing this method with the grayscale gravity method, Hessian-matrix-based method, and conventional CPDA method, the proposed improved CPDA method with PCA center extraction is shown to have high accuracy and robustness in noisy welding environments. The proposed method meets the need for vision-aided automated welding robots by achieving greater than 95% accuracy in corner feature point identification in fillet welding. Full article
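
To make the CPDA idea concrete, here is a hedged sketch of plain chord-to-point distance accumulation on an ordered contour: slide a chord of fixed span along the curve and accumulate, for each point, its distance to every chord spanning it; corners show up as local maxima. The paper's polygon-approximation refinement is not reproduced, and the span value is an assumption.

```python
import numpy as np

def point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    ab, ap = b - a, p - a
    return abs(ab[0] * ap[1] - ab[1] * ap[0]) / (np.linalg.norm(ab) + 1e-12)

def cpda(curve, span=12):
    """curve: (n, 2) ordered contour points; returns accumulated distances."""
    n = len(curve)
    acc = np.zeros(n)
    for i in range(n):
        for j in range(max(0, i - span), min(i, n - 1 - span) + 1):
            acc[i] += point_line_dist(curve[i], curve[j], curve[j + span])
    return acc  # local maxima flag corner candidates

# Toy L-shaped stripe: two straight segments meeting at (1, 0).
t = np.linspace(0, 1, 50)
curve = np.vstack([np.c_[t, np.zeros_like(t)], np.c_[np.ones_like(t), t]])
print(int(np.argmax(cpda(curve))))  # index near 49-50, i.e., the corner
```
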

Review
Research Progress on the Aesthetic Quality Assessment of Complex Layout Images Based on Deep Learning
Appl. Sci. 2023, 13(17), 9763; https://doi.org/10.3390/app13179763 - 29 Aug 2023
Viewed by 416
Abstract
With the development of the information age, the layout image is no longer a simple combination of text and graphics; complex layout images are now obtained from text, graphics, images, and other layout elements through the processes of artistic design, pre-press processing, typesetting, and so on. At present, the field of aesthetic-quality assessment mainly focuses on photographic images, and the aesthetic-quality assessment of complex layout images is rarely reported. However, the design of complex layout images such as posters, packaging labels, and advertisements cannot be separated from the evaluation of aesthetic quality. In this paper, layout analysis is performed on complex layout images. Traditional and deep-learning-based methods for image layout analysis and aesthetic-quality assessment are reviewed and analyzed. Finally, the features, advantages, and applications of common image aesthetic-quality assessment datasets and layout analysis datasets are compared and analyzed. Limitations and future perspectives of the aesthetic assessment of complex layout images are discussed in relation to layout analysis and aesthetic characteristics. Full article

Article
PP-JPEG: A Privacy-Preserving JPEG Image-Tampering Localization
J. Imaging 2023, 9(9), 172; https://doi.org/10.3390/jimaging9090172 - 27 Aug 2023
Viewed by 439
Abstract
The widespread availability of digital image-processing software has given rise to various forms of image manipulation and forgery, which can pose a significant challenge in different fields, such as law enforcement, journalism, etc. It can also lead to privacy concerns. A privacy-preserving framework that encrypts images before processing them is vital for maintaining the privacy and confidentiality of sensitive images, especially those used for the purpose of investigation. To address these challenges, we propose a novel solution that detects image forgeries while preserving the privacy of the images: a privacy-preserving framework that encrypts the images before processing them, making it difficult for unauthorized individuals to access them. The proposed method utilizes a compression quality analysis in the encrypted domain to detect the presence of forgeries in images by determining if the forged portion (dummy image) has a compression quality different from that of the original image (featured image) in the encrypted domain. This approach effectively localizes the tampered portions of the image, even for small pixel blocks of size 10×10 in the encrypted domain. Furthermore, the method identifies the featured image’s JPEG quality using the first minima in the energy graph. Full article

Article
Self-Supervised Sound Promotion Method of Sound Localization from Video
Electronics 2023, 12(17), 3558; https://doi.org/10.3390/electronics12173558 - 23 Aug 2023
Viewed by 331
Abstract
Compared to traditional unimodal methods, multimodal audio-visual correspondence learning has many advantages in the field of video understanding, but it also faces significant challenges. In order to fully utilize the feature information from both modalities, we need to ensure accurate alignment of the semantic information from each modality, rather than simply concatenating them together; this requires consideration of how to design fusion networks that can better perform this task. Current algorithms heavily rely on the network’s output results for sound-object localization while neglecting the potential issue of suppressed feature information due to the internal structure of the network. Thus, we propose a sound promotion method (SPM), a self-supervised framework that aims to increase the contribution of voices to produce better performance of audiovisual learning. We first cluster the audio separately to generate pseudo-labels and then use the clusters to train the audio backbone. Finally, we explore the impact of our method on several existing approaches on the MUSIC dataset, and the results prove that our proposed method is able to produce better performance. Full article
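
A minimal sketch of the clustering-based pseudo-labelling step described above, assuming pooled audio embeddings are already available; k-means cluster indices stand in for labels when pre-training the audio backbone. The embedding dimension and cluster count are made up.

```python
import numpy as np
from sklearn.cluster import KMeans

def make_pseudo_labels(audio_embeddings, n_clusters=10, seed=0):
    """audio_embeddings: (n, d) array. Returns (n,) integer pseudo-labels."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(audio_embeddings)

feats = np.random.rand(500, 128)   # stand-in for pooled audio features
labels = make_pseudo_labels(feats)
print(np.bincount(labels))         # cluster sizes; train the backbone on (feats, labels)
```
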

Article
Thangka Image Captioning Based on Semantic Concept Prompt and Multimodal Feature Optimization
J. Imaging 2023, 9(8), 162; https://doi.org/10.3390/jimaging9080162 - 16 Aug 2023
Viewed by 502
Abstract
Thangka images exhibit a high level of diversity and richness, and existing deep learning-based image captioning methods generate Chinese captions of poor accuracy and richness for Thangka images. To address this issue, this paper proposes a Semantic Concept Prompt and Multimodal Feature Optimization network (SCAMF-Net). The Semantic Concept Prompt (SCP) module is introduced in the text encoding stage to obtain more semantic information about the Thangka by introducing contextual prompts, thus enhancing the richness of the description content. The Multimodal Feature Optimization (MFO) module is proposed to optimize the correlation between Thangka images and text. This module enhances the correlation between the image features and text features of the Thangka through the Captioner and Filter to more accurately describe the visual concept features of the Thangka. The experimental results demonstrate that our proposed method outperforms baseline models on the Thangka dataset in terms of BLEU-4, METEOR, ROUGE, CIDEr, and SPICE by 8.7%, 7.9%, 8.2%, 76.6%, and 5.7%, respectively. Furthermore, this method also exhibits superior performance compared to the state-of-the-art methods on the public MSCOCO dataset. Full article

Article
Eff-PCNet: An Efficient Pure CNN Network for Medical Image Classification
Appl. Sci. 2023, 13(16), 9226; https://doi.org/10.3390/app13169226 - 14 Aug 2023
Viewed by 424
Abstract
With the development of deep learning, convolutional neural networks (CNNs) and Transformer-based methods have become key techniques for medical image classification tasks. However, many current neural network models have problems such as high complexity, a large number of parameters, and large model sizes; such models obtain higher classification accuracy at the expense of lightweight networks. Moreover, such larger-scale models pose a great challenge for practical clinical applications. Meanwhile, Transformer and multi-layer perceptron (MLP) methods have some shortcomings in terms of local modeling capability and high model complexity, and need to be used on larger datasets to show good performance. This makes it difficult to utilize these networks in clinical medicine. Based on this, we propose a lightweight and efficient pure CNN network for medical image classification (Eff-PCNet). On the one hand, we propose a multi-branch multi-scale CNN (M2C) module, which divides the feature map into four parallel branches along the channel dimension by a certain scale factor and carries out a deep convolution operation using convolution kernels of different scales. This multi-branch multi-scale operation effectively replaces the large kernel convolution: it reduces the computational cost of the module while fusing the feature information between different channels and thus obtains richer feature information. Finally, the four feature maps are spliced along the channel dimension to fuse the multi-scale and multi-dimensional feature information. On the other hand, we introduce the structural reparameterization technique and propose the structural reparameterized CNN (Rep-C) module. Specifically, it utilizes multiple linear operators to generate different feature maps during the training process and fuses all the branches into one through parameter fusion to achieve fast inference while providing a more effective solution for feature reuse. A number of experimental results show that our Eff-PCNet performs better than current methods based on CNNs, Transformers, and MLPs in the classification of three publicly available medical image datasets. Among them, we achieve 87.4% Acc on the HAM10000 dataset, 91.06% Acc on the SkinCancer dataset, and 97.03% Acc on the Chest-Xray dataset. Meanwhile, our approach achieves a better trade-off between the number of parameters, computation, and other performance metrics. Full article
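
A hedged sketch of the M2C idea exactly as the abstract states it: split the feature map into four channel branches, apply depthwise convolutions with different kernel sizes, and splice the results back along the channel dimension. The kernel sizes below are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class M2C(nn.Module):
    def __init__(self, channels, kernels=(3, 5, 7, 9)):
        super().__init__()
        assert channels % 4 == 0
        c = channels // 4
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)  # depthwise conv per branch
            for k in kernels
        )

    def forward(self, x):
        chunks = torch.chunk(x, 4, dim=1)                 # split along channels
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)

print(M2C(64)(torch.randn(1, 64, 28, 28)).shape)  # torch.Size([1, 64, 28, 28])
```
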

Article
Improved Traffic Small Object Detection via Cross-Layer Feature Fusion and Channel Attention
Electronics 2023, 12(16), 3421; https://doi.org/10.3390/electronics12163421 - 12 Aug 2023
Viewed by 428
Abstract
Small object detection has long been one of the most formidable challenges in computer vision due to the poor visual features of small objects and the high noise of their surroundings. Small targets in traffic scenes are surrounded by a multitude of complex visual interfering factors yet bear crucial information, such as traffic signs, traffic lights, and pedestrians. Given the inherent difficulties faced by generic models in addressing these issues, we conduct a comprehensive investigation of small target detection in this application scenario. In this work, we present a Cross-Layer Feature Fusion and Channel Attention algorithm based on a lightweight YOLOv5s design for traffic small target detection, named CFA-YOLO. To enhance the sensitivity of the model toward vital features, we embed the channel-guided Squeeze-and-Excitation (SE) block in the deep layer of the backbone. Moreover, the key innovation of our work is the effective cross-layer feature fusion method, which maintains robust feature fusion and information interaction capabilities while eliminating redundant parameters compared with the baseline model. To align with the output features of the neck network, we adjusted the detection heads from three to two. Furthermore, we also applied decoupled detection heads for the classification and bounding box regression tasks, respectively. This approach not only achieves real-time detection standards, but also improves the overall training results in a parameter-friendly manner. The CFA-YOLO model pays close attention to the detail features of small targets, which gives it a great advantage in addressing the poor performance commonly seen in traffic small target detection. Extensive experiments have validated the efficiency and effectiveness of our proposed method in traffic small object detection. Compared with the latest lightweight detectors, such as YOLOv7-Tiny and YOLOv8s, our method consistently achieves superior performance in terms of both accuracy and model complexity. Full article

Article
Vision-Based System for Black Rubber Roller Surface Inspection
Appl. Sci. 2023, 13(15), 8999; https://doi.org/10.3390/app13158999 - 06 Aug 2023
Viewed by 500
Abstract
This paper proposes a machine vision system for the surface inspection of black rubber rollers in manufacturing processes. The system aims to enhance the surface quality of the rollers by detecting and classifying defects. A lighting system is installed to highlight surface defects. Two algorithms are proposed for defect detection: a traditional-based method and a deep learning-based method. The former is fast but limited to surface defect detection, while the latter is slower but capable of detecting and classifying defects. The accuracy of the algorithms is verified through experiments, with the traditional-based method achieving near-perfect accuracy of approximately 98% for defect detection, and the deep learning-based method achieving an accuracy of approximately 95.2% for defect detection and 96% for defect classification. The proposed machine vision system can significantly improve the surface inspection of black rubber rollers, thereby ensuring high-quality production. Full article

Article
Image Sampling Based on Dominant Color Component for Computer Vision
Electronics 2023, 12(15), 3360; https://doi.org/10.3390/electronics12153360 - 06 Aug 2023
Viewed by 451
Abstract
Image sampling is a fundamental technique for image compression, which greatly improves the efficiency of image storage, transmission, and applications. However, existing sampling algorithms primarily consider human visual perception and discard irrelevant information based on subjective preferences. Unfortunately, these methods may not adequately meet the demands of computer vision tasks and can even lead to redundancy because of the different preferences between humans and computers. To tackle this issue, this paper investigates the key features of computer vision. Based on our findings, we propose an image sampling method based on the dominant color component (ISDCC). In this method, we utilize a grayscale image to preserve the essential structural information for computer vision. Then, we construct a concise color feature map based on the dominant channel of pixels. This approach provides relevant color information for computer vision tasks. We conducted experimental evaluations using well-known benchmark datasets. The results demonstrate that ISDCC adapts effectively to computer vision requirements, significantly reducing the amount of data needed. Furthermore, our method has a minimal impact on the performance of mainstream computer vision algorithms across various tasks. Compared to other sampling approaches, our proposed method exhibits clear advantages by achieving superior results with less data usage. Full article
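
A minimal sketch of the two-plane representation the abstract describes, assuming the dominant color component is simply the per-pixel argmax over R, G, and B; the paper's quantization and encoding details are not reproduced.

```python
import numpy as np

def isdcc_sample(rgb):
    """rgb: (H, W, 3) uint8. Returns (grayscale structure, dominant-channel map)."""
    rgb = rgb.astype(np.float32)
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    dominant = rgb.argmax(axis=-1).astype(np.uint8)  # 0=R, 1=G, 2=B per pixel
    return gray.astype(np.uint8), dominant           # 2 planes instead of 3

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
g, d = isdcc_sample(img)
print(g.shape, d.shape)
```
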

Review
Pseudo Labels for Unsupervised Domain Adaptation: A Review
Electronics 2023, 12(15), 3325; https://doi.org/10.3390/electronics12153325 - 03 Aug 2023
Viewed by 548
Abstract
Conventional machine learning relies on two presumptions: (1) the training and testing datasets follow the same independent distribution, and (2) an adequate quantity of samples is essential for achieving optimal model performance during training. Nevertheless, meeting these two assumptions can be challenging in real-world scenarios. Domain adaptation (DA) is a subfield of transfer learning that focuses on reducing the distribution difference between the source domain (Ds) and target domain (Dt) and subsequently applying the knowledge gained from the Ds task to the Dt task. The majority of current DA methods aim to achieve domain invariance by aligning the marginal probability distributions of the Ds and Dt. Recent studies have pointed out that aligning marginal probability distributions alone is not sufficient and that alignment of conditional probability distributions is equally important for knowledge migration. Nonetheless, unsupervised DA presents a more significant difficulty in aligning the conditional probability distributions because of the unavailability of labels for the Dt. In response to this issue, researchers have proposed several methods, including pseudo-labeling, which offer novel solutions to tackle the problem. In this paper, we systematically analyze various pseudo-labeling algorithms and their applications in unsupervised DA. First, we summarize the pseudo-label generation methods based on single and multiple classifiers, together with the actions taken to deal with the problem of imbalanced samples. Second, we investigate the application of pseudo-labeling in category feature alignment and in improving feature discrimination. Finally, we point out the challenges and trends of pseudo-labeling algorithms. As far as we know, this article is the first review of pseudo-labeling techniques for unsupervised DA. Full article
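
As a concrete reference point for the single-classifier family the review covers, here is a hedged sketch of one pseudo-labelling round: predict on the unlabelled target domain, keep predictions above a confidence threshold as labels, and retrain. The classifier, threshold, and toy domains are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_round(clf, Xs, ys, Xt, threshold=0.9):
    probs = clf.predict_proba(Xt)
    keep = probs.max(axis=1) >= threshold          # confidence filtering
    X = np.vstack([Xs, Xt[keep]])
    y = np.concatenate([ys, probs[keep].argmax(axis=1)])
    return clf.fit(X, y), int(keep.sum())          # retrain on source + pseudo-labels

rng = np.random.default_rng(1)
Xs = rng.normal(size=(200, 5))
ys = (Xs[:, 0] > 0).astype(int)                    # toy source labels
Xt = rng.normal(0.3, 1.0, size=(300, 5))           # shifted target domain
clf = LogisticRegression().fit(Xs, ys)
clf, n_kept = pseudo_label_round(clf, Xs, ys, Xt)
print(n_kept, "target samples pseudo-labelled")
```
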

Article
A Comparison of Monocular Visual SLAM and Visual Odometry Methods Applied to 3D Reconstruction
Appl. Sci. 2023, 13(15), 8837; https://doi.org/10.3390/app13158837 - 31 Jul 2023
Viewed by 836
Abstract
Pure monocular 3D reconstruction is a complex problem that has attracted the research community’s interest due to the affordability and availability of RGB sensors. SLAM, VO, and SFM are disciplines formulated to solve the 3D reconstruction problem and estimate the camera’s ego-motion; so, many methods have been proposed. However, most of these methods have not been evaluated on large datasets and under various motion patterns, have not been tested under the same metrics, and most of them have not been evaluated following a taxonomy, making their comparison and selection difficult. In this research, we performed a comparison of ten publicly available SLAM and VO methods following a taxonomy, including one method for each category of the primary taxonomy, three machine-learning-based methods, and two updates of the best methods to identify the advantages and limitations of each category of the taxonomy and test whether the addition of machine learning or updates on those methods improved them significantly. Thus, we evaluated each algorithm using the TUM-Mono dataset and benchmark, and we performed an inferential statistical analysis to identify the significant differences through its metrics. The results determined that the sparse-direct methods significantly outperformed the rest of the taxonomy, and fusing them with machine learning techniques significantly enhanced the geometric-based methods’ performance from different perspectives. Full article

Article
Risevi: A Disease Risk Prediction Model Based on Vision Transformer Applied to Nursing Homes
Electronics 2023, 12(15), 3206; https://doi.org/10.3390/electronics12153206 - 25 Jul 2023
Viewed by 407
Abstract
The intensification of population aging has brought pressure on public medical care. In order to reduce this pressure, we combined the image classification method with computer vision and used audio data that are easy to collect in nursing homes. Based on MelGAN, transfer learning, and Vision Transformer, we propose Risevi, a disease risk prediction model based on Vision Transformer for nursing homes. We first design a sample generation method based on MelGAN, then refer to the Mel frequency cepstral coefficient and the Wav2vec2 model to design the sample feature extraction method, perform floating-point operations on the tensor of the extracted features, and then convert it into a waveform. We then design a sample feature classification method based on transfer learning and Vision Transformer. Finally, we obtain the Risevi model. In this paper, we use public datasets and subject data as sample data. The experimental results show that the Risevi model has achieved an accuracy rate of 98.5%, a precision rate of 96.38%, a recall rate of 98.17%, and an F1 score of 97.15%. The experimental results show that the Risevi model can provide practical support for reducing public medical pressure. Full article

Article
A Decoupled Semantic–Detail Learning Network for Remote Sensing Object Detection in Complex Backgrounds
Electronics 2023, 12(14), 3201; https://doi.org/10.3390/electronics12143201 - 24 Jul 2023
Viewed by 475
Abstract
Detecting multi-scale objects in complex backgrounds is a crucial challenge in remote sensing. The main challenge is that the localization and identification of objects in complex backgrounds can be inaccurate. To address this issue, a decoupled semantic–detail learning network (DSDL-Net) was proposed. Our proposed approach comprises two components. Firstly, we introduce a multi-receptive field feature fusion and detail mining (MRF-DM) module, which learns higher semantic-level representations by fusing multi-scale receptive fields. Subsequently, it uses multi-scale pooling to preserve detail texture information at different scales. Secondly, we present an adaptive cross-level semantic–detail fusion (CSDF) network that leverages a feature pyramid with fusion between detailed features extracted from the backbone network and high-level semantic features obtained from the topmost layer of the pyramid. The fusion is accomplished through two rounds of parallel global–local contextual feature extraction, with shared learning for global context information between the two rounds. Furthermore, to effectively enhance fine-grained texture features conducive to object localization and features conducive to object semantic recognition, we adopt and improve two enhancement modules with attention mechanisms, making them simpler and more lightweight. Our experimental results demonstrate that our approach outperforms 12 benchmark models on three publicly available remote sensing datasets (DIOR, HRRSD, and RSOD) regarding average precision (AP) at small, medium, and large scales. On the DIOR dataset, our model achieved a 2.19% improvement in mAP@0.5 compared to the baseline model, with a parameter reduction of 14.07%. Full article

Article
FURSformer: Semantic Segmentation Network for Remote Sensing Images with Fused Heterogeneous Features
Electronics 2023, 12(14), 3113; https://doi.org/10.3390/electronics12143113 - 18 Jul 2023
Cited by 1 | Viewed by 355
Abstract
Semantic segmentation of remote sensing images poses a formidable challenge within this domain. Our investigation commences with a pilot study aimed at scrutinizing the advantages and disadvantages of employing a Transformer architecture and a CNN architecture in remote sensing imagery (RSI). Our objective is to substantiate the indispensability of both local and global information for RSI analysis. In this research article, we harness the potential of the Transformer model to establish global contextual understanding while incorporating an additional convolution module for localized perception. Nonetheless, a direct fusion of these heterogeneous information sources often yields subpar outcomes. To address this limitation, we propose an innovative hierarchical feature fusion module that fuses Transformer and CNN features using an ensemble-to-set approach, thereby enhancing information compatibility. Our proposed model, named FURSformer, amalgamates the strengths of the Transformer architecture and CNN. The experimental results clearly demonstrate the effectiveness of this approach. Notably, our model achieved an outstanding 90.78% mAccuracy on the DLRSD dataset. Full article

Article
D-UAP: Initially Diversified Universal Adversarial Patch Generation Method
Electronics 2023, 12(14), 3080; https://doi.org/10.3390/electronics12143080 - 14 Jul 2023
Viewed by 413
Abstract
With the rapid development of adversarial example technologies, the concept of adversarial patches has been proposed; such patches can successfully transfer adversarial attacks to the real world and fool intelligent object detection systems. However, the real-world environment is complex and changeable, and adversarial patch attack technology is susceptible to real-world factors, resulting in a decrease in the success rate of the attack. Existing adversarial-patch-generation algorithms initialize patches in a single direction and do not fully consider the impact of initial diversification on the upper limit of the adversarial patch attack. Therefore, this paper proposes an initially diversified adversarial patch generation technique to improve the effect of adversarial patch attacks on the underlying algorithms in the real world. The method uses YOLOv4 as the attack model, and the experimental results show that the attack effect of the adversarial-patch-attack method proposed in this paper is 8.46% higher than the baseline, with a stronger attack effect and fewer training rounds. Full article

Article
A Super-Resolution Reconstruction Network of Space Target Images Based on Dual Regression and Deformable Convolutional Attention Mechanism
Electronics 2023, 12(13), 2995; https://doi.org/10.3390/electronics12132995 - 07 Jul 2023
Viewed by 329
Abstract
High-quality space target images are important for space surveillance and space attack-defense confrontation. To obtain space target images with higher resolution and sharpness, this paper proposes an image super-resolution reconstruction network based on dual regression and a deformable convolutional attention mechanism (DCAM). Firstly, the mapping space is constrained by dual regression; secondly, deformable convolution is used to expand the perceptual field and extract the high-frequency features of the image; finally, the convolutional attention mechanism is used to calculate the saliency of the channel domain and the spatial domain of the image to enhance the useful features and suppress the useless feature responses. The experimental results show that the method outperforms current mainstream image super-resolution algorithms in both objective quality evaluation metrics and localization accuracy on the space target image dataset. Full article
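
A minimal sketch of the dual-regression constraint invoked above: a primal mapping produces the super-resolved image while a dual mapping projects it back to low resolution, and the closed loop is penalized alongside the reconstruction loss. The stand-in networks and the weight lam are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

primal = nn.Conv2d(3, 3, 3, padding=1)   # stand-in LR->HR model
dual = nn.Conv2d(3, 3, 3, padding=1)     # stand-in HR->LR model

def dual_regression_loss(lr, hr, lam=0.1):
    sr = F.interpolate(primal(lr), scale_factor=2)       # toy primal upscaling
    lr_rec = F.interpolate(dual(sr), scale_factor=0.5)   # dual: project SR back down
    return F.l1_loss(sr, hr) + lam * F.l1_loss(lr_rec, lr)

loss = dual_regression_loss(torch.randn(1, 3, 16, 16), torch.randn(1, 3, 32, 32))
print(loss.item())
```
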

Article
Color Mura Defect Detection Method Based on Channel Contrast Sensitivity Function Filtering
Electronics 2023, 12(13), 2965; https://doi.org/10.3390/electronics12132965 - 05 Jul 2023
Viewed by 392
Abstract
To address the issue of low detection accuracy caused by low contrast in color Mura defects, this paper proposes a color Mura defect detection method based on channel contrast sensitivity function (CSF) filtering. The RGB image of the captured liquid crystal display (LCD) is converted to the Lab color space, and the Weber contrast feature maps of the Lab channel images are calculated. Frequency domain filtering is performed using the CSF to obtain visually sensitive Lab feature maps. Color Mura defect detection is achieved by employing adaptive segmentation thresholds based on the fused feature maps of the L channel and ab channels. The color Mura evaluation criterion is utilized to quantitatively assess the defect detection results. Experimental results demonstrate that the proposed method achieves an accuracy rate of 87% in color Mura defect detection, outperforming existing mainstream detection methods. Full article
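
A hedged sketch of the first two steps described above, per channel: a Weber contrast map followed by frequency-domain filtering. A crude radial band-pass stands in for the actual CSF curve, so the cutoff frequencies are assumptions.

```python
import numpy as np

def weber_contrast(channel, eps=1e-6):
    """(I - I_background) / I_background, with the channel mean as background."""
    mu = channel.mean()
    return (channel - mu) / (mu + eps)

def csf_filter(img, lo=0.02, hi=0.25):
    """Frequency-domain filtering; a radial band-pass stands in for the CSF."""
    F = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))
    r = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)   # radial frequency
    mask = (r >= lo) & (r <= hi)                       # keep mid frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

L_channel = 100.0 * np.random.rand(64, 64)             # toy Lab L channel
feature_map = csf_filter(weber_contrast(L_channel))
print(feature_map.shape)
```
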

Article
Ambiguity in Solving Imaging Inverse Problems with Deep-Learning-Based Operators
J. Imaging 2023, 9(7), 133; https://doi.org/10.3390/jimaging9070133 - 30 Jun 2023
Viewed by 700
Abstract
In recent years, large convolutional neural networks have been widely used as tools for image deblurring because of their ability to restore images very precisely. It is well known that image deblurring is mathematically modeled as an ill-posed inverse problem, and its solution is difficult to approximate when noise affects the data. Indeed, one limitation of neural networks for deblurring is their sensitivity to noise and other perturbations, which can lead to instability and produce poor reconstructions. In addition, networks do not necessarily take into account the numerical formulation of the underlying imaging problem when trained end-to-end. In this paper, we propose some strategies to improve stability without losing too much accuracy when deblurring images with deep-learning-based methods. First, we suggest a very small neural architecture, which reduces the execution time for training, satisfying a green AI need, and does not extremely amplify noise in the computed image. Second, we introduce a unified framework where a pre-processing step balances the lack of stability of the following neural-network-based step. Two different pre-processors are presented. The former implements a strong parameter-free denoiser, and the latter is a variational-model-based regularized formulation of the latent imaging problem. This framework is also formally characterized by mathematical analysis. Numerical experiments are performed to verify the accuracy and stability of the proposed approaches for image deblurring when unknown or not-quantified noise is present; the results confirm that they improve the network stability with respect to noise. In particular, the model-based framework represents the most reliable trade-off between visual precision and robustness. Full article

Article
A Joint De-Rain and De-Mist Network Based on the Atmospheric Scattering Model
J. Imaging 2023, 9(7), 129; https://doi.org/10.3390/jimaging9070129 - 26 Jun 2023
Viewed by 551
Abstract
Rain can have a detrimental effect on optical components, leading to the appearance of streaks and halos in images captured during rainy conditions. These visual distortions caused by rain and mist contribute significant noise information that can compromise image quality. In this paper, we propose a novel approach for simultaneously removing both streaks and halos from the image to produce clear results. First, based on the principle of atmospheric scattering, a rain and mist model is proposed to initially remove streaks and halos by reconstructing the image. The Deep Memory Block (DMB) selectively extracts the rain layer transfer spectrum and the mist layer transfer spectrum from the rainy image to separate these layers. Then, the Multi-scale Convolution Block (MCB) receives the reconstructed images and extracts both structural and detailed features to enhance the overall accuracy and robustness of the model. Ultimately, extensive results demonstrate that our proposed model JDDN (Joint De-rain and De-mist Network) outperforms current state-of-the-art deep learning methods on synthetic datasets as well as real-world datasets, with an average improvement of 0.29 dB on the heavy-rainy-image dataset. Full article
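
For reference, the atmospheric scattering model such reconstructions start from is I(x) = J(x)t(x) + A(1 - t(x)); given estimates of the transmission t and airlight A, the scene radiance J follows by inversion. The sketch below uses toy values for t and A, which in the paper are produced by the network.

```python
import numpy as np

def recover_scene(I, t, A, t_min=0.1):
    """Invert I = J*t + A*(1 - t); clamp t to avoid amplifying noise."""
    t = np.clip(t, t_min, 1.0)[..., None]
    return (I - A) / t + A

I = np.random.rand(8, 8, 3)        # hazy/rainy observation
t = np.full((8, 8), 0.6)           # toy transmission map
A = np.array([0.9, 0.9, 0.9])      # toy global airlight
J = recover_scene(I, t, A)
print(J.shape)
```
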
(This article belongs to the Topic Computer Vision and Image Processing)
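
The atmospheric scattering model that underlies this kind of reconstruction is I = J*t + A*(1 - t), where t is the per-pixel transmission and A the atmospheric light. A minimal inversion sketch (illustrative shapes and clamp value; not the JDDN code):

```python
import numpy as np

def invert_scattering(I, t, A, t_min=0.1):
    """Recover scene radiance J from I = J * t + A * (1 - t)."""
    t = np.clip(t, t_min, 1.0)[..., None]   # (H, W, 1); floor avoids amplifying noise
    return (I - A) / t + A                  # I: (H, W, 3) in [0, 1], A: (3,)
```

Clamping the transmission from below prevents division by near-zero values in dense mist, which would otherwise blow up noise in the recovered radiance.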

Article
Quantifying the Displacement of Data Matrix Code Modules: A Comparative Study of Different Approximation Approaches for Predictive Maintenance of Drop-on-Demand Printing Systems
J. Imaging 2023, 9(7), 125; https://doi.org/10.3390/jimaging9070125 - 21 Jun 2023
Viewed by 692
Abstract
Drop-on-demand printing using colloidal or pigmented inks is prone to the clogging of printing nozzles, which can lead to positional deviations and inconsistently printed patterns (e.g., data matrix codes, DMCs). However, if such deviations are detected early, they can be used to assess the state of the print head and to plan maintenance operations before the printed DMCs become unreadable. To realize this predictive-maintenance approach, it is necessary to accurately quantify the positional deviation of individually printed dots from their target positions. Here, we present a comparison of different methods based on affine transformations and clustering algorithms for calculating the target positions from the printed positions and, subsequently, the deviation between the two for complete DMCs. Hence, our method focuses on evaluating print quality, not on decoding DMCs. We compare our results to a state-of-the-art decoding algorithm, adapted to return the target grid positions, and find that we can determine the occurring deviations with significantly higher accuracy, especially when the printed DMCs are of low quality. The results enable the development of decision systems for predictive maintenance and, subsequently, the optimization of printing systems. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
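
The least-squares core of such a target-grid estimate can be sketched as follows (an illustration, not the paper's implementation): fit an affine map from the ideal module grid to the detected dot centers and read off per-module displacements as the residuals.

```python
import numpy as np

def module_displacements(ideal, printed):
    """Least-squares affine map ideal -> printed; returns per-dot deviations."""
    X = np.hstack([ideal, np.ones((len(ideal), 1))])   # (N, 3) homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, printed, rcond=None)    # (3, 2) affine parameters
    target = X @ M                                     # best-fit target grid positions
    return printed - target                            # displacement of each module
```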

Article
A Novel Image Encryption Scheme Using Chaotic Maps and Fuzzy Numbers for Secure Transmission of Information
Appl. Sci. 2023, 13(12), 7113; https://doi.org/10.3390/app13127113 - 14 Jun 2023
Cited by 2 | Viewed by 1555
Abstract
The complexity of chaotic systems, when used in information encryption, largely determines the attainable level of security. This paper proposes a novel image encryption scheme that uses chaotic maps and fuzzy numbers for the secure transmission of information. The encryption method combines the logistic and sine maps to form the logistic-sine map, and combines the fuzzy concept with the Hénon map to form the fuzzy Hénon map; these maps are used to generate secure secret keys. Additionally, a fuzzy triangular membership function is used to modify the initial conditions of the maps during the diffusion process. The encryption process involves scrambling the image pixels, summing adjacent row values, and XORing the result with randomly generated numbers from the chaotic maps. The proposed method is tested against various attacks, including statistical attack analysis, local entropy analysis, differential attack analysis, signal-to-noise ratio, signal-to-noise distortion ratio, mean square error, brute-force attack analysis, and information entropy analysis, while randomness has been evaluated using the NIST test. The scheme also has high key sensitivity, meaning that a small change in the secret keys results in a significant change in the encrypted image. The results demonstrate the effectiveness of the proposed scheme in ensuring the secure transmission of information. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
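
As a toy illustration of chaotic-keystream diffusion (a plain logistic map standing in for the proposed logistic-sine and fuzzy Hénon constructions):

```python
import numpy as np

def logistic_keystream(n, x0=0.7, r=3.99):
    """Keystream of n bytes from the logistic map x -> r*x*(1-x)."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)            # chaotic iteration; x0 and r act as the key
        xs[i] = x
    return (xs * 255).astype(np.uint8)

def xor_diffuse(img, x0=0.7):
    flat = img.ravel()                              # img: uint8 array
    return (flat ^ logistic_keystream(flat.size, x0)).reshape(img.shape)
```

Because XOR is an involution, applying `xor_diffuse` again with the same key recovers the original image.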

Article
SimoSet: A 3D Object Detection Dataset Collected from Vehicle Hybrid Solid-State LiDAR
Electronics 2023, 12(11), 2424; https://doi.org/10.3390/electronics12112424 - 26 May 2023
Viewed by 797
Abstract
Three-dimensional (3D) object detection based on point cloud data plays a critical role in the perception systems of autonomous driving. However, this task presents a significant challenge in practice due to the absence of point cloud data from automotive-grade hybrid solid-state LiDAR, as well as the limited generalization ability of data-driven deep learning methods. In this paper, we introduce SimoSet, the first vehicle-view 3D object detection dataset composed of automotive-grade hybrid solid-state LiDAR data. The dataset was collected from a university campus, contains 52 scenes, each of which is 8 s long, and provides three types of labels for typical traffic participants. We analyze the impact of the installation height and angle of the LiDAR on the scanning effect and provide a reference process for the collection, annotation, and format conversion of LiDAR data. Finally, we provide baselines for LiDAR-only 3D object detection. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Combining CNNs and Markov-like Models for Facial Landmark Detection with Spatial Consistency Estimates
J. Imaging 2023, 9(5), 104; https://doi.org/10.3390/jimaging9050104 - 22 May 2023
Viewed by 960
Abstract
The accurate localization of facial landmarks is essential for several tasks, including face recognition, head pose estimation, facial region extraction, and emotion detection. Although the number of required landmarks is task-specific, models are typically trained on all available landmarks in the datasets, limiting efficiency. Furthermore, model performance is strongly influenced by scale-dependent local appearance information around the landmarks and the global shape information they generate. To account for this, we propose a lightweight hybrid model for facial landmark detection designed specifically for pupil region extraction. Our design combines a convolutional neural network (CNN) with a Markov random field (MRF)-like process trained on only 17 carefully selected landmarks. The advantage of our model is the ability to run different image scales on the same convolutional layers, resulting in a significant reduction in model size. In addition, we employ an approximation of the MRF that is run on a subset of landmarks to validate the spatial consistency of the generated shape. This validation is performed against a learned conditional distribution expressing the location of one landmark relative to its neighbor. Experimental results on popular facial landmark localization datasets such as 300W, WFLW, and HELEN demonstrate the accuracy of our proposed model. Furthermore, our model achieves state-of-the-art performance on a well-defined robustness metric. In conclusion, the results demonstrate the ability of our lightweight model to filter out spatially inconsistent predictions, even with significantly fewer training landmarks. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
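
The spatial-consistency validation can be pictured as a Mahalanobis gate on the offset between neighboring landmarks; `mu` and `cov` below stand in for the learned conditional distribution (hypothetical names, not the paper's API).

```python
import numpy as np

def pair_consistent(pred_a, pred_b, mu, cov, thresh=9.21):
    """Accept the landmark pair if its relative offset fits the learned Gaussian;
    9.21 is the 99% chi-square bound for 2 degrees of freedom."""
    d = np.asarray(pred_b) - np.asarray(pred_a) - mu
    return d @ np.linalg.inv(cov) @ d <= thresh   # squared Mahalanobis distance
```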

Article
A Hierarchical Clustering Obstacle Detection Method Applied to RGB-D Cameras
Electronics 2023, 12(10), 2316; https://doi.org/10.3390/electronics12102316 - 21 May 2023
Cited by 1 | Viewed by 657
Abstract
Environment perception is a key part of robot self-controlled motion. When vision is used for obstacle detection, deep learning methods struggle to detect all obstacles due to complex environments and vision limitations, while traditional methods struggle to meet real-time requirements on embedded platforms. In this paper, a fast obstacle-detection process for RGB-D cameras is proposed. The process has three main steps: feature point extraction, noise removal, and obstacle clustering. The Canny and Shi–Tomasi algorithms perform the pre-processing and feature point extraction. Noise is then filtered based on geometry. Obstacles at different depths are grouped based on the principle that feature points on the same object contour must be continuous or lie within the same depth in the view of the RGB-D camera, and a further segmentation along the horizontal direction completes the obstacle clustering. The method omits the iterative computation required by traditional methods and greatly reduces memory and time overhead. Experimental verification shows that the proposed method achieves a comprehensive recognition accuracy of 82.41%, which is 4.13% and 19.34% higher than that of the RSC and traditional methods, respectively, and a recognition accuracy of 91.72% under normal illumination, with a recognition speed of more than 20 FPS on the embedded platform. All detections are achieved within 1 m under normal illumination, and the detection error is no more than 2 cm within 3 m. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
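
A compressed sketch of the depth-continuity grouping idea follows (the Canny pre-processing and horizontal segmentation pass are omitted; thresholds are illustrative, not the paper's values):

```python
import cv2
import numpy as np

def depth_groups(gray, depth, max_gap=0.1):
    """Shi-Tomasi features grouped into obstacle candidates by depth continuity."""
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01, minDistance=7)
    if pts is None:
        return []
    pts = pts.reshape(-1, 2).astype(int)
    z = depth[pts[:, 1], pts[:, 0]]              # metric depth at each feature point
    order = np.argsort(z)
    groups, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if z[nxt] - z[prev] > max_gap:           # a depth jump starts a new obstacle
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    return [pts[g] for g in groups]
```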

Article
An Infrared and Visible Image Fusion Algorithm Method Based on a Dual Bilateral Least Squares Hybrid Filter
Electronics 2023, 12(10), 2292; https://doi.org/10.3390/electronics12102292 - 18 May 2023
Cited by 2 | Viewed by 622
Abstract
Infrared and visible images of the same scene are fused to produce a fused image with richer information. However, most current image-fusion algorithms suffer from insufficient edge-information retention, weak feature representation, poor contrast, halos, and artifacts, and can only be applied to a single scene. To address these issues, we propose a novel infrared and visible image fusion algorithm based on a dual bilateral–least-squares hybrid filter (DBLSF) built on the least-squares and bilateral filter hybrid model (BLF-LS). The proposed algorithm fuses the base and detail layers of the filter decomposition using the residual network ResNet50 and an adaptive fusion strategy based on the structure tensor, respectively. Experiments on 32 image pairs from the TNO image-fusion dataset show that, although our fusion algorithm sacrifices overall time efficiency, the Combination 1 approach better preserves image edge information and image integrity, reduces the loss of source-image features, and suppresses artifacts and halos, comparing favorably with other algorithms in terms of structural similarity, feature similarity, multiscale structural similarity, root mean square error, peak signal-to-noise ratio, and correlation coefficient by at least 2.71%, 1.86%, 0.09%, 0.46%, 0.24%, and 0.07%, respectively. The proposed Combination 2 effectively improves the contrast and edge features of the fused image and enriches image detail, with average improvements of 37.42%, 26.40%, and 26.60% in average gradient, edge intensity, and spatial frequency, respectively, compared with other algorithms. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
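
The skeleton of any such two-scale fusion scheme can be sketched as below, with a plain bilateral filter and a max-absolute detail pick standing in for the paper's BLF-LS decomposition and its ResNet50/structure-tensor fusion strategies:

```python
import cv2
import numpy as np

def fuse_two_scale(ir, vis):
    """Base layers via an edge-preserving filter, detail layers by max-abs pick.
    ir and vis are aligned uint8 grayscale images."""
    base_ir = cv2.bilateralFilter(ir, d=9, sigmaColor=75, sigmaSpace=75)
    base_vis = cv2.bilateralFilter(vis, d=9, sigmaColor=75, sigmaSpace=75)
    det_ir = ir.astype(np.float32) - base_ir
    det_vis = vis.astype(np.float32) - base_vis
    base = 0.5 * (base_ir.astype(np.float32) + base_vis)       # average base layers
    detail = np.where(np.abs(det_ir) > np.abs(det_vis), det_ir, det_vis)
    return np.clip(base + detail, 0, 255).astype(np.uint8)
```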

Article
An Efficient Strategy for Catastrophic Forgetting Reduction in Incremental Learning
Electronics 2023, 12(10), 2265; https://doi.org/10.3390/electronics12102265 - 17 May 2023
Viewed by 899
Abstract
Deep neural networks (DNNs) have made outstanding achievements in a wide variety of domains. For deep learning tasks, sufficiently large datasets are required to train efficient DNN models. However, big datasets are not always available, and they are costly to build. Therefore, balanced solutions for DNN model efficiency and training-data size have recently caught the attention of researchers. Transfer learning techniques are the most common approach: a DNN model is pre-trained on a large dataset and then applied to a new task with modest data. This fine-tuning process yields another challenge, known as catastrophic forgetting, which can be reduced using a reasonable data-augmentation strategy in incremental learning. In this paper, we propose an efficient solution for randomly selecting samples from the old task to be incrementally stored for learning a sequence of new tasks. In addition, a loss-combination strategy is proposed for optimizing incremental learning. The proposed solutions are evaluated on standard datasets with two scenarios of incremental fine-tuning: (1) a New Class (NC) dataset and (2) a New Class and new Instance (NCI) dataset. The experimental results show that our proposed solution achieves outstanding results compared with other SOTA rehearsal methods, as well as traditional fine-tuning solutions, with gains ranging from 1% to 16% in recognition accuracy. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
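
The random-selection storage at the heart of such rehearsal methods can be sketched as a reservoir-sampled buffer (a generic illustration, not the paper's exact selection rule):

```python
import random

class RehearsalBuffer:
    """Fixed-size store of uniformly random old-task samples (reservoir sampling)."""
    def __init__(self, capacity):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        elif random.random() < self.capacity / self.seen:
            self.data[random.randrange(self.capacity)] = sample   # evict at random

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```

Training on a new task then mixes buffer samples into each batch and minimizes a weighted sum such as loss_new + alpha * loss_old, in the spirit of the proposed loss-combination strategy.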

Article
Big-Volume SliceGAN for Improving a Synthetic 3D Microstructure Image of Additive-Manufactured TYPE 316L Steel
J. Imaging 2023, 9(5), 90; https://doi.org/10.3390/jimaging9050090 - 29 Apr 2023
Viewed by 1068
Abstract
A modified SliceGAN architecture was proposed to generate a high-quality synthetic three-dimensional (3D) microstructure image of TYPE 316L material manufactured through additive methods. The quality of the resulting 3D image was evaluated using an auto-correlation function, and it was found that maintaining a high resolution while doubling the training image size was crucial for creating a more realistic synthetic 3D image. To meet this requirement, a modified 3D image generator and critic architecture was developed within the SliceGAN framework. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Overcoming Adverse Conditions in Rescue Scenarios: A Deep Learning and Image Processing Approach
Appl. Sci. 2023, 13(9), 5499; https://doi.org/10.3390/app13095499 - 28 Apr 2023
Viewed by 874
Abstract
This paper presents a Deep Learning (DL) and Image-Processing (IP) pipeline that addresses exposure recovery in challenging lighting conditions for enhancing First Responders’ (FRs) Situational Awareness (SA) during rescue operations. The method aims to improve the quality of images captured by FRs, particularly in overexposed and underexposed environments while providing a response time suitable for rescue scenarios. The paper describes the technical details of the pipeline, including exposure correction, segmentation, and fusion techniques. Our results demonstrate that the pipeline effectively recovers details in challenging lighting conditions, improves object detection, and is efficient in high-stress, fast-paced rescue situations. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
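
A classical building block for this kind of exposure recovery, shown here only as a stand-in for the paper's DL pipeline, is Mertens exposure fusion as shipped with OpenCV; the file names are placeholders.

```python
import cv2
import numpy as np

# Hypothetical bracket: under-, normally, and over-exposed frames of one scene
frames = [cv2.imread(p) for p in ("under.jpg", "normal.jpg", "over.jpg")]
stack = [f.astype(np.float32) / 255.0 for f in frames]
fused = cv2.createMergeMertens().process(stack)   # weights contrast/saturation/exposedness
result = np.clip(fused * 255, 0, 255).astype(np.uint8)
```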

Article
Detection of Targets in Road Scene Images Enhanced Using Conditional GAN-Based Dehazing Model
Appl. Sci. 2023, 13(9), 5326; https://doi.org/10.3390/app13095326 - 24 Apr 2023
Viewed by 934
Abstract
Object detection is a classic image processing problem. For instance, in autonomous driving applications, targets such as cars and pedestrians are detected in road scene video. Many image-based object detection methods utilizing hand-crafted features have been proposed, and more recent research has adopted deep learning approaches. Object detectors rely on useful features, such as object boundaries, which are extracted by analyzing the image pixels. However, images captured in an outdoor environment may be degraded by bad weather such as haze and fog. One possible remedy is to recover the image radiance through a pre-processing step such as image dehazing. We propose a dehazing model for image enhancement based on the conditional generative adversarial network (cGAN), improved with two modifications. Various image dehazing datasets were employed for comparative analysis. Our proposed model outperformed other hand-crafted and deep learning-based image dehazing methods by 2 dB or more in PSNR. Moreover, we utilized the dehazed images for target detection using the YOLO object detector. In the experiments, images were degraded by two weather conditions—rain and fog. We demonstrate that object detection in images enhanced by our proposed dehazing model is significantly improved over detection in the degraded images. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Unsupervised Image Enhancement Method Based on Attention Map Network Guidance and Attention Mechanism
Electronics 2023, 12(8), 1887; https://doi.org/10.3390/electronics12081887 - 17 Apr 2023
Viewed by 870
Abstract
Low-light image enhancement is a crucial preprocessing task in complex vision tasks. It directly impacts the outcomes of object detection, image segmentation, and image recognition. In recent years, with the continuous development of deep learning techniques, an increasing number of image enhancement methods based on deep learning have emerged. However, due to the high cost of data collection and the limited content of supervised learning datasets, more and more scholars have shifted their focus to unsupervised image enhancement. Unsupervised image enhancement methods do not require paired images of the same scene during training, which greatly lowers the barrier to network training. Nevertheless, current unsupervised methods still suffer from issues such as unstable enhancement effects and limited generalization ability. To address these problems, we propose an improved low-light image enhancement method. The proposed method employs the LSGAN as the training architecture and utilizes an attention map network to dynamically generate the attention maps that best fit the enhancement task, which effectively improves the generalization ability and enhancement performance of the network. Additionally, we adopt an attention mechanism to enhance the subtle details of the image features. Regarding network training, since a traditional convolutional neural network discriminator may not provide effective guidance to the generator in the early stages of training, we propose an improved discriminator structure. The experimental results demonstrate that our method achieves good enhancement performance on different datasets and has practical value. Although our method has advantages in enhancing low-light images, it also has certain limitations, such as a network size that does not yet meet lightweight requirements and room for further improvement under extremely low-light conditions; we will address these issues in future research. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
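
For reference, the LSGAN objective mentioned above replaces the cross-entropy GAN loss with a least-squares loss; a minimal PyTorch sketch:

```python
import torch

def lsgan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Least-squares GAN discriminator loss: real targets 1, fake targets 0
    return 0.5 * ((d_real - 1).pow(2).mean() + d_fake.pow(2).mean())

def lsgan_g_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # Generator pushes fake scores toward the real label
    return 0.5 * (d_fake - 1).pow(2).mean()
```

Penalizing the squared distance to the labels gives non-saturating gradients even for samples the discriminator classifies confidently, which is one reason LSGAN training tends to be more stable.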

Article
Research on Identification and Location of Charging Ports of Multiple Electric Vehicles Based on SFLDLC-CBAM-YOLOV7-Tinp-CTMA
Electronics 2023, 12(8), 1855; https://doi.org/10.3390/electronics12081855 - 14 Apr 2023
Viewed by 935
Abstract
With the gradual maturity of autonomous driving and automatic parking technology, electric vehicle charging is moving toward automation. The charging port (CP) location is an important basis for realizing automatic charging. Existing CP identification algorithms are only suitable for a single vehicle model and generalize poorly. Therefore, this paper proposes a set of methods that can identify the CPs of various vehicle types. The recognition process is divided into a rough positioning stage (RPS) and a precise positioning stage (PPS). In this study, datasets for four types of vehicle CPs under different environments are established. In the RPS, the characteristic information of the CP is obtained by combining the convolutional block attention module (CBAM) with YOLOV7-tinp, and its position information is calculated using a similar projection relationship. For the PPS, this paper proposes a data enhancement method based on similar feature location to determine the label category (SFLDLC). CBAM-YOLOV7-tinp is used to identify the feature location information, the cluster template matching algorithm (CTMA) is used to obtain the accurate feature location and tag type, and the EPnP algorithm is used to calculate the location and pose (LP) information. The LP solution provides the position coordinates of the CP relative to the robot base. Finally, the AUBO-i10 robot is used to complete the experimental test. The results show that the average positioning errors (x, y, z, rx, ry, and rz) of the CP are 0.64 mm, 0.88 mm, 1.24 mm, 1.19 degrees, 1.00 degrees, and 0.57 degrees, respectively, and the integrated insertion success rate is 94.25%. Therefore, the proposed algorithm can efficiently and accurately identify and locate various types of CPs and meets practical plugging requirements. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
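
The EPnP pose step can be sketched with OpenCV's solver; the 3D feature coordinates, detected pixel positions, and camera intrinsics below are illustrative placeholders, not values from the paper.

```python
import cv2
import numpy as np

# Known 3D positions of CP features in the object frame (mm) and their detected pixels
object_pts = np.array([[0, 0, 0], [30, 0, 0], [30, 30, 0], [0, 30, 0]], dtype=np.float64)
image_pts = np.array([[412, 305], [508, 301], [511, 398], [409, 402]], dtype=np.float64)
K = np.array([[1200, 0, 640], [0, 1200, 360], [0, 0, 1]], dtype=np.float64)  # intrinsics

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None, flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)   # (R, tvec) is the CP pose in the camera frame
```

A hand–eye calibration would then map this camera-frame pose into robot-base coordinates for the insertion motion.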

Article
Thangka Sketch Colorization Based on Multi-Level Adaptive-Instance-Normalized Color Fusion and Skip Connection Attention
Electronics 2023, 12(7), 1745; https://doi.org/10.3390/electronics12071745 - 6 Apr 2023
Viewed by 936
Abstract
Thangka is an important intangible cultural heritage of Tibet. Due to the complexity and time-consuming nature of the Thangka painting technique, this technique is currently facing the risk of being lost. It is important to preserve the art of Thangka through digital painting methods. Machine learning-based auto-sketch colorization is one of the vital steps for digital Thangka painting. However, existing learning-based sketch colorization methods face two challenges when colorizing Thangka: (1) the extremely rich colors of the Thangka make it difficult to color accurately with existing algorithms, and (2) the line density of the Thangka makes it extremely challenging for algorithms to infer what semantic information the lines imply. To resolve these problems, we propose a Thangka sketch colorization method based on multi-level adaptive-instance-normalized color fusion (MACF) and skip connection attention (SCA). The proposed method consists of two parts: (1) a multi-level adaptive-instance-normalized color fusion (MACF) network to fuse sketch features and color features; and (2) a skip connection attention (SCA) mechanism to distinguish the semantic information implied by the sketch lines. Experiments on colorizing Thangka sketches show that our method works well on two small datasets, the Danbooru 2019 dataset and the Thangka dataset, and can generate exquisite Thangka images. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
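
The core adaptive-instance-normalization operation behind MACF-style color fusion re-normalizes content features to the channel statistics of the style (here, color) features; a minimal PyTorch sketch (the generic AdaIN operation, not the MACF module itself):

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Renormalize NCHW content features to the per-channel mean/std of style features."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_sd = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_sd = style.std(dim=(2, 3), keepdim=True)
    return s_sd * (content - c_mu) / c_sd + s_mu
```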

Article
Research on Improved Multi-Channel Image Stitching Technology Based on Fast Algorithms
Electronics 2023, 12(7), 1700; https://doi.org/10.3390/electronics12071700 - 3 Apr 2023
Cited by 1 | Viewed by 1353
Abstract
The image registration and fusion stages of image stitching algorithms entail significant computational costs, which limits the use of robust, high-quality stitching algorithms in real-time applications on PCs (personal computers) and embedded systems. Fast image registration and fusion algorithms, in turn, suffer from artifacts such as ghosting and dashed lines, resulting in suboptimal stitched results. Consequently, this study proposes a multi-channel image stitching approach based on fast image registration and fusion algorithms that enhances the stitching effect while remaining suitable for real-time deployment. First, in the image registration stage, the gridded Binary Robust Invariant Scalable Keypoints (BRISK) method is used to improve the matching efficiency of feature points, and the Grid-based Motion Statistics (GMS) algorithm with a bidirectional rough-matching scheme is used to improve matching accuracy. Then, in the image fusion stage, the optimal seam algorithm is used to obtain the seam line and construct the fusion area. The seam and transition areas are fused using a fade-in/fade-out weighting algorithm to obtain smooth, high-quality stitched images. The experimental results demonstrate the performance of our proposed method through improvements in image registration and fusion metrics. Compared with the original algorithm and other existing methods, our approach achieves significant improvements in eliminating stitching artifacts such as ghosting and discontinuities while maintaining the efficiency of fast algorithms. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
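
The registration stage maps onto standard OpenCV building blocks; a minimal sketch (GMS requires opencv-contrib-python; parameters are illustrative):

```python
import cv2

def match_brisk_gms(img1, img2):
    """BRISK features, cross-checked brute-force matching, then GMS filtering."""
    brisk = cv2.BRISK_create()
    k1, d1 = brisk.detectAndCompute(img1, None)
    k2, d2 = brisk.detectAndCompute(img2, None)
    raw = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    size1 = (img1.shape[1], img1.shape[0])        # (width, height) expected by matchGMS
    size2 = (img2.shape[1], img2.shape[0])
    good = cv2.xfeatures2d.matchGMS(size1, size2, k1, k2, raw, withRotation=True)
    return k1, k2, good
```

The cross-checked brute-force matcher plays the role of the bidirectional rough match in the abstract, and GMS then rejects matches that are inconsistent with the local motion statistics.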

Article
A Self-Supervised Tree-Structured Framework for Fine-Grained Classification
Appl. Sci. 2023, 13(7), 4453; https://doi.org/10.3390/app13074453 - 31 Mar 2023
Viewed by 732
Abstract
In computer vision, fine-grained classification has become an important issue in recognizing objects with slight visual differences. It is usually challenging to achieve good performance on fine-grained classification problems using traditional convolutional neural networks. To improve the accuracy and training time of convolutional neural networks on fine-grained classification problems, this paper proposes a tree-structured framework that eliminates the effect of differences between clusters. The contributions of the proposed method include the following three aspects: (1) a self-supervised method that automatically creates a classification tree, eliminating the need for manual labeling; (2) a machine-learning matcher which determines the cluster to which an item belongs, minimizing the impact of inter-cluster variations on classification; and (3) a pruning criterion which filters the tree-structured classifier, retaining only the models with superior classification performance. The experimental evaluation of the proposed tree-structured framework demonstrates its effectiveness in reducing training time and improving the accuracy of fine-grained classification across various datasets in comparison with conventional convolutional neural network models. Specifically, for the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, the proposed method reduces training time by 32.91%, 35.87%, and 14.48%, and improves fine-grained classification accuracy by 1.17%, 2.01%, and 0.59%, respectively. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Pixel-Coordinate-Induced Human Pose High-Precision Estimation Method
Electronics 2023, 12(7), 1648; https://doi.org/10.3390/electronics12071648 - 31 Mar 2023
Viewed by 974
Abstract
Accurately estimating human pose is crucial for providing feedback during exercises or musical performances, but the complex and flexible nature of human joints makes it challenging. Additionally, traditional methods often neglect pixel coordinates, which are naturally present in high-resolution images of the human body. To address this issue, we propose a novel human pose estimation method that directly incorporates pixel coordinates. Our method adds a coordinate channel to the convolution process and embeds pixel coordinates into the feature map, while also using coordinate attention to capture position- and structure-sensitive features. We further reduce the network parameters and computational cost by using small-scale convolution kernels and a smooth activation function in residual blocks. We evaluate our model on the MPII Human Pose and COCO Keypoint Detection datasets and demonstrate improved accuracy, highlighting the importance of directly incorporating coordinate location information in position-sensitive tasks. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
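
Embedding pixel coordinates into the convolution can be sketched as a CoordConv-style layer that appends two normalized coordinate channels before convolving (a generic illustration, not the authors' exact block):

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Convolution over the input plus two channels of normalized x/y coordinates."""
    def __init__(self, in_ch: int, out_ch: int, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

    def forward(self, x):
        n, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))   # conv now "sees" pixel location
```

Usage: `CoordConv2d(3, 16, kernel_size=3, padding=1)` behaves like a normal convolution but gives the kernel access to where each pixel lies in the image.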

Article
A Genetic Algorithm Based One Class Support Vector Machine Model for Arabic Skilled Forgery Signature Verification
J. Imaging 2023, 9(4), 79; https://doi.org/10.3390/jimaging9040079 - 29 Mar 2023
Viewed by 1728
Abstract
Recently, signature verification systems have been widely adopted for verifying individuals based on their handwritten signatures, especially in forensic and commercial transactions. Generally, feature extraction and classification tremendously impact the accuracy of system authentication. Feature extraction is challenging for signature verification systems due to the diverse forms of signatures and sampling circumstances. Current signature verification techniques demonstrate promising results in identifying genuine and forged signatures. However, the overall performance of skilled forgery detection still falls short of delivering high satisfaction. Furthermore, most current signature verification techniques demand a large number of learning samples to increase verification accuracy. This is the primary disadvantage of using deep learning, as the number of signature samples is mainly restricted by the practical application of the signature verification system. In addition, the system inputs are scanned signatures that comprise noisy pixels, complicated backgrounds, blurriness, and contrast decay. The main challenge has been attaining a balance between noise and data loss, since some essential information is lost during preprocessing, probably influencing the subsequent stages of the system. This paper tackles these issues through four main steps: preprocessing, multifeature fusion, discriminant feature selection using a genetic algorithm based on one-class support vector machines (OCSVM-GA), and a one-class learning strategy to address imbalanced signature data in the practical application of a signature verification system. The suggested method employs three signature databases: SID-Arabic handwritten signatures, CEDAR, and UTSIG. Experimental results show that the proposed approach outperforms current systems in terms of false acceptance rate (FAR), false rejection rate (FRR), and equal error rate (EER). Full article
(This article belongs to the Topic Computer Vision and Image Processing)
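
The one-class learning strategy trains only on genuine signatures; a minimal scikit-learn sketch in which the feature file is hypothetical and `nu`/`gamma` stand for the hyperparameters that the paper's genetic algorithm would search over, along with the feature subset:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical fused feature vectors of genuine signatures only (one-class setup)
genuine_feats = np.load("genuine_features.npy")       # shape (n_samples, n_features)

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1).fit(genuine_feats)

def verify(feat: np.ndarray) -> bool:
    return ocsvm.predict(feat.reshape(1, -1))[0] == 1  # +1 genuine, -1 forgery
```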

Article
DFEN: Dual Feature Enhancement Network for Remote Sensing Image Caption
Electronics 2023, 12(7), 1547; https://doi.org/10.3390/electronics12071547 - 25 Mar 2023
Viewed by 694
Abstract
Remote sensing image captioning aims to describe ground objects and the semantic relationships between them. Existing remote sensing image captioning algorithms do not acquire enough ground-object information from remote sensing images, resulting in inaccurate captions. This paper therefore proposes an encoder–decoder-based Dual Feature Enhancement Network ("DFEN") to enhance ground-object information at both the image and text levels. We build the Image-Enhancement module at the image level using the multiscale characteristics of remote sensing images, obtaining more discriminative image context features. A hierarchical attention mechanism aggregates multi-level features and supplements the ground-object information ignored due to large scale differences. At the text level, we use the image's latent visual features to guide the Text-Enhance module, producing text guidance features that correctly focus on ground-object information. Experiment results show that the DFEN model can enhance ground-object information from both images and text. Specifically, the BLEU-1 index increased by 8.6% on UCM-caption, 2.3% on Sydney-caption, and 5.1% on RSICD. The DFEN model has advanced the exploration of high-level semantics in remote sensing images and facilitated the development of remote sensing image captioning. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Enhancing Image Encryption with the Kronecker xor Product, the Hill Cipher, and the Sigmoid Logistic Map
Appl. Sci. 2023, 13(6), 4034; https://doi.org/10.3390/app13064034 - 22 Mar 2023
Cited by 3 | Viewed by 1388
Abstract
In today’s digital age, it is crucial to secure the flow of information to protect data from being hacked during transmission or storage. To address this need, we present a new image encryption technique that combines the Kronecker xor product, the Hill cipher, and the sigmoid logistic map. Our proposed algorithm begins by shifting the values in each row of the state matrix to the left by a predetermined number of positions and then encrypting the resulting image using the Hill cipher. The top value of each odd or even column is used to perform an xor operation with all values in the corresponding even or odd column, excluding the top value. The resulting image is then diffused using the sigmoid logistic map and subjected to the Kronecker xor product operation among the pixels to create a secure image. The image is then diffused again with other keys from the sigmoid logistic map for the final product. We compared our proposed method to recent work and found it to be safe and efficient in terms of performance after conducting statistical analysis, differential attack analysis, brute-force attack analysis, and information entropy analysis. The results demonstrate that our proposed method is robust, lightweight, and fast, meets the requirements for encryption and decryption, and is resistant to various attacks. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
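
The Hill-cipher stage operates on pixel blocks with a key matrix that is invertible modulo 256; a toy sketch (the key and the 2-pixel block size are illustrative, not the paper's parameters):

```python
import numpy as np

K = np.array([[3, 3], [2, 5]])   # toy key; det = 9 is odd, so K is invertible mod 256

def hill_encrypt(img: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Encrypt pixel pairs p -> K @ p (mod 256); assumes an even pixel count."""
    blocks = img.astype(np.int64).ravel().reshape(-1, 2).T   # 2 x N column blocks
    enc = (K @ blocks) % 256
    return enc.T.reshape(img.shape).astype(np.uint8)
```

Decryption multiplies by the inverse of K modulo 256, which exists exactly when det(K) is odd.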

Article
Fish Detection and Classification for Automatic Sorting System with an Optimized YOLO Algorithm
Appl. Sci. 2023, 13(6), 3812; https://doi.org/10.3390/app13063812 - 16 Mar 2023
Cited by 1 | Viewed by 2283
Abstract
Automatic fish recognition using deep learning and computer or machine vision is a key part of making the fish industry more productive through automation. An automatic sorting system will help to tackle the challenges of increasing food demand and the threat of food scarcity in the future due to the continuing growth of the world population and the impact of global warming and climate change. As far as the authors know, no work has yet been published that detects and classifies moving fish for the fish culture industry, especially for automatic sorting purposes based on fish species using deep learning and machine vision. This paper proposes an approach based on the recognition algorithm YOLOv4, optimized with a unique labeling technique. The proposed method was tested with videos of real fish running on a conveyor, placed randomly in position and order, at a speed of 505.08 m/h, and obtained an accuracy of 98.15%. This simple but effective method is expected to serve as a guide for automatically detecting, classifying, and sorting fish. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
Color Constancy Based on Local Reflectance Differences
Electronics 2023, 12(6), 1396; https://doi.org/10.3390/electronics12061396 - 15 Mar 2023
Cited by 1 | Viewed by 960
Abstract
Color constancy is used to determine the actual surface color of a scene affected by illumination so that the captured image is more in line with human perception. The well-known Gray-Edge hypothesis states that the average edge difference in a scene is achromatic. Inspired by the Gray-Edge hypothesis, we propose a new illumination estimation method. Specifically, after analyzing three public datasets containing rich illumination conditions and scenes, we found that the ratio of the global sum of reflectance differences to the global sum of locally normalized reflectance differences is achromatic. Based on this hypothesis, we propose an accurate color constancy method. The method was tested on four test datasets containing various illumination conditions (three datasets in a single-light environment and one dataset in a multi-light environment). The results show that the proposed method outperforms the state-of-the-art color constancy methods. Furthermore, we propose a new framework that can incorporate current mainstream statistics-based color constancy methods (Gray-World, Max-RGB, Gray-Edge, etc.). Full article
(This article belongs to the Topic Computer Vision and Image Processing)
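
For context, the classical Gray-Edge estimate that inspired the hypothesis can be written in a few lines (a textbook sketch of Gray-Edge, not the proposed method):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def gray_edge_illuminant(img: np.ndarray, sigma: float = 1.0, p: int = 1) -> np.ndarray:
    """Gray-Edge: the Minkowski mean of derivative magnitudes per channel
    estimates the illuminant color. img is an (H, W, 3) array."""
    est = []
    for c in range(3):
        ch = gaussian_filter(img[..., c].astype(np.float64), sigma)
        grad = np.hypot(sobel(ch, 0), sobel(ch, 1))    # edge magnitude
        est.append((grad ** p).mean() ** (1.0 / p))
    e = np.array(est)
    return e / np.linalg.norm(e)                       # unit-norm illuminant estimate
```

Dividing each channel by the corresponding component of the estimate then white-balances the image.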

Article
Visual Attention Adversarial Networks for Chinese Font Translation
Electronics 2023, 12(6), 1388; https://doi.org/10.3390/electronics12061388 - 14 Mar 2023
Viewed by 1091
Abstract
Currently, many Chinese font translation models adopt the method of dividing character components to improve the quality of generated font images. However, character components require a large amount of manual annotation to decompose characters and determine the composition of each character as input for training. In this paper, we establish a Chinese font translation model based on a generative adversarial network that requires no decomposition. First, we improve the image enhancement method for Chinese character images, which helps the model learn the structural information of Chinese character strokes and generate font images with complete and accurate strokes. Second, we propose a visual attention adversarial network. By using visual attention blocks, the network captures global and local features for constructing the details of characters. Experiments demonstrate that our method generates high-quality Chinese character images with great style diversity, including calligraphy characters. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

Article
A Real-Time Registration Algorithm of UAV Aerial Images Based on Feature Matching
J. Imaging 2023, 9(3), 67; https://doi.org/10.3390/jimaging9030067 - 11 Mar 2023
Viewed by 1476
Abstract
This study aimed to achieve accurate, real-time geographic positioning of UAV aerial image targets. We verified a method of registering UAV camera images on a map (with geographic location) through feature matching. The UAV is usually in rapid motion with a changing camera gimbal, and the map is high-resolution with sparse features. This makes it difficult for current feature-matching algorithms to accurately register the camera image and the map in real time, producing a large number of mismatches. To solve this problem, we used the SuperGlue algorithm, which offers better matching performance. A layer-and-block strategy, combined with prior UAV data, was introduced to improve the accuracy and speed of feature matching, and matching information obtained between frames was introduced to address uneven registration. We also propose updating map features with UAV image features to enhance the robustness and applicability of UAV aerial image-to-map registration. Numerous experiments proved that the proposed method is feasible and can adapt to changes in the camera gimbal, environment, etc. The UAV aerial image is stably and accurately registered on the map at a frame rate of 12 frames per second, which provides a basis for the geo-positioning of UAV aerial image targets. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
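
Downstream of the feature matcher, registration reduces to a robust homography fit; a minimal sketch given matched point pairs (e.g., produced by SuperGlue):

```python
import cv2
import numpy as np

def register(uav_pts, map_pts):
    """Estimate the UAV-image -> map homography with RANSAC; also report inlier ratio."""
    H, inliers = cv2.findHomography(np.float32(uav_pts), np.float32(map_pts),
                                    cv2.RANSAC, ransacReprojThreshold=3.0)
    ratio = float(inliers.mean()) if inliers is not None else 0.0
    return H, ratio     # a low ratio signals a likely mis-registration
```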

Article
Night Vision Anti-Halation Algorithm Based on Different-Source Image Fusion Combining Visual Saliency with YUV-FNSCT
Electronics 2023, 12(6), 1303; https://doi.org/10.3390/electronics12061303 - 9 Mar 2023
Viewed by 1104
Abstract
To address driver dazzle caused by the abuse of high beams when vehicles meet at night, a night vision anti-halation algorithm based on different-source image fusion combining visual saliency with YUV-FNSCT is proposed. An improved frequency-tuned (FT) visual saliency detection is proposed to quickly lock onto objects of interest, such as vehicles and pedestrians, so as to strengthen the salient features of fusion images. The high- and low-frequency sub-bands of infrared saliency images and visible luminance components can be obtained quickly using the fast non-subsampled contourlet transform (FNSCT), which has the characteristics of multi-direction, multi-scale, and shift-invariance. According to the degree of halation in the visible image, the nonlinear adaptive fusion strategy for the low-frequency weights reasonably eliminates halation while retaining as much useful information from the original image as possible. The statistical matching feature fusion strategy distinguishes the common and unique edge information in the high-frequency sub-bands by mutual matching, so as to recover more effective details of the original images, such as edges and contours. Only the luminance Y obtained by the YUV transform is involved in image fusion, which both avoids color shift in the fused image and reduces the amount of computation. Considering the night driving environment and the degree of halation, visible and infrared images were collected for anti-halation fusion in six typical halation scenes on three types of roads covering most night driving conditions. The fused images obtained by the proposed algorithm show complete halation elimination, rich color details, and obvious salient features, and achieve the best comprehensive index in each halation scene. The experimental results and analysis show that the proposed algorithm has advantages in halation elimination and visual saliency, generalizes well across different night vision halation scenes, helps drivers observe the road ahead, and improves the safety of night driving. It also has some applicability to rainy, foggy, smoggy, and other complex weather. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
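
Fusing only the luminance channel can be sketched with OpenCV's YCrCb conversion standing in for the YUV transform; `fused_y` denotes the luminance produced by the IR/visible fusion step (an illustrative name):

```python
import cv2

def recombine_luminance(vis_bgr, fused_y):
    """Replace only the luminance of the visible frame with the fused Y channel,
    keeping chrominance untouched to avoid color shift."""
    ycrcb = cv2.cvtColor(vis_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[..., 0] = fused_y                  # fused luminance, uint8, same H x W
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```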