Topic Editors

Department of Engineering (DING), University of Sannio, Benevento, Italy
Department of Computer Science, Royal Holloway, University of London, Surrey TW20 0EX, UK

Computer Vision and Image Processing, 2nd Edition

Abstract submission deadline
30 September 2024
Manuscript submission deadline
31 December 2024
Viewed by
11642

Topic Information

Dear Colleagues,

The field of computer vision and image processing has advanced significantly in recent years, with new techniques and applications emerging constantly. Building on the success of our first edition, we are pleased to announce a second edition on this exciting topic. We invite researchers, academics, and practitioners to submit original research articles, reviews, or case studies that address the latest developments in computer vision and image processing. Topics of interest include but are not limited to:

  • Deep learning for image classification and recognition
  • Object detection and tracking
  • Image segmentation and analysis
  • 3D reconstruction and modeling
  • Image and video compression
  • Image enhancement and restoration
  • Medical image processing and analysis
  • Augmented and virtual reality

Submissions should be original and should not have been published or submitted elsewhere. All papers will be peer-reviewed by at least two experts in the field, and accepted papers will be published together on the topic website. To submit your paper, please visit the journal's website and follow the submission guidelines. For any queries, please contact the guest editors of the topic.

We look forward to receiving your submissions and sharing the latest advancements in computer vision and image processing with our readers.

Prof. Silvia Liberata Ullo
Prof. Dr. Li Zhang
Topic Editors

Keywords

  • 3D acquisition, processing, and visualization
  • scene understanding
  • multimodal sensor processing and fusion
  • multispectral, color, and greyscale image processing
  • industrial quality inspection
  • computer vision for robotics
  • computer vision for surveillance
  • airborne and satellite on-board image acquisition platforms
  • computational models of vision
  • imaging psychophysics

Participating Journals

Journal Name          Impact Factor   CiteScore   Launched Year   First Decision (median)   APC
Applied Sciences      2.7             4.5         2011            16.9 Days                 CHF 2400
Electronics           2.9             4.7         2012            15.6 Days                 CHF 2400
Journal of Imaging    3.2             4.4         2015            21.7 Days                 CHF 1800
Mathematics           2.4             3.5         2013            16.9 Days                 CHF 2600
Remote Sensing        5.0             7.9         2009            23 Days                   CHF 2700

Preprints.org is a multidisciplinary platform that provides a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (16 papers)

12 pages, 5442 KiB  
Article
Image Enhancement of Steel Plate Defects Based on Generative Adversarial Networks
by Zhideng Jie, Hong Zhang, Kaixuan Li, Xiao Xie and Aopu Shi
Electronics 2024, 13(11), 2013; https://doi.org/10.3390/electronics13112013 - 22 May 2024
Viewed by 285
Abstract
In this study, the problem of a limited number of data samples, which affects the detection accuracy, arises for the image classification task of steel plate surface defects under conditions of small sample sizes. A data enhancement method based on generative adversarial networks is proposed. The method introduces a two-way attention mechanism, which is specifically designed to improve the model’s ability to identify weak defects and optimize the model structure of the network discriminator, which augments the model’s capacity to perceive the overall details of the image and effectively improves the intricacy and authenticity of the generated images. By enhancing the two original datasets, the experimental results show that the proposed method improves the average accuracy by 8.5% across the four convolutional classification models. The results demonstrate the superior detection accuracy of the proposed method, improving the classification of steel plate surface defects. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

18 pages, 2977 KiB  
Article
Feature Maps Need More Attention: A Spatial-Channel Mutual Attention-Guided Transformer Network for Face Super-Resolution
by Zhe Zhang and Chun Qi
Appl. Sci. 2024, 14(10), 4066; https://doi.org/10.3390/app14104066 - 10 May 2024
Viewed by 349
Abstract
Recently, transformer-based face super-resolution (FSR) approaches have achieved promising success in restoring degraded facial details due to their high capability for capturing both local and global dependencies. However, while existing methods focus on introducing sophisticated structures, they neglect the potential feature map information, limiting FSR performance. To circumvent this problem, we carefully design a pair of guiding blocks to dig for possible feature map information to enhance features before feeding them to transformer blocks. Relying on the guiding blocks, we propose a spatial-channel mutual attention-guided transformer network for FSR, for which the backbone architecture is a multi-scale connected encoder–decoder. Specifically, we devise a novel Spatial-Channel Mutual Attention-guided Transformer Module (SCATM), which is composed of a Spatial-Channel Mutual Attention Guiding Block (SCAGB) and a Channel-wise Multi-head Transformer Block (CMTB). SCATM on the top layer (SCATM-T) aims to promote both local facial details and global facial structures, while SCATM on the bottom layer (SCATM-B) seeks to optimize the encoded features. Considering that different scale features are complementary, we further develop a Multi-scale Feature Fusion Module (MFFM), which fuses features from different scales for better restoration performance. Quantitative and qualitative experimental results on various datasets indicate that the proposed method outperforms other state-of-the-art FSR methods. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

18 pages, 8838 KiB  
Article
Salient Object Detection via Fusion of Multi-Visual Perception
by Wenjun Zhou, Tianfei Wang, Xiaoqin Wu, Chenglin Zuo, Yifan Wang, Quan Zhang and Bo Peng
Appl. Sci. 2024, 14(8), 3433; https://doi.org/10.3390/app14083433 - 18 Apr 2024
Viewed by 475
Abstract
Salient object detection aims to distinguish the most visually conspicuous regions, playing an important role in computer vision tasks. However, complex natural scenarios can challenge salient object detection, hindering accurate extraction of objects with rich morphological diversity. This paper proposes a novel method for salient object detection leveraging multi-visual perception, mirroring the human visual system’s rapid identification, and focusing on impressive objects/regions within complex scenes. First, a feature map is derived from the original image. Then, salient object detection results are obtained for each perception feature and combined via a feature fusion strategy to produce a saliency map. Finally, superpixel segmentation is employed for precise salient object extraction, removing interference areas. This multi-feature approach for salient object detection harnesses complementary features to adapt to complex scenarios. Competitive experiments on the MSRA10K and ECSSD datasets place our method in the first tier, achieving 0.1302 MAE and 0.9382 F-measure for the MSRA10K dataset and 0.0783 MAE and 0.9635 F-measure for the ECSSD dataset, demonstrating superior salient object detection performance in complex natural scenarios. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
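The two figures of merit quoted in the abstract above, MAE and F-measure, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the fixed binarization threshold of 0.5, and the weighting `beta_sq = 0.3` (a convention often used in saliency benchmarks) are all assumptions made for illustration.

```python
def mae(saliency, truth):
    """Mean absolute error between a saliency map and a ground-truth mask.

    Both inputs are equally sized 2D lists with values in [0, 1].
    """
    total = 0.0
    count = 0
    for pred_row, true_row in zip(saliency, truth):
        for p, t in zip(pred_row, true_row):
            total += abs(p - t)
            count += 1
    return total / count


def f_measure(saliency, truth, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure after binarizing the saliency map.

    threshold and beta_sq are assumed defaults, not values from the paper.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(saliency, truth):
        for p, t in zip(pred_row, true_row):
            predicted = p >= threshold
            actual = t >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

Lower MAE and higher F-measure both indicate a saliency map closer to the ground truth, which is how the reported numbers should be read.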

13 pages, 3442 KiB  
Article
MDP-SLAM: A Visual SLAM towards a Dynamic Indoor Scene Based on Adaptive Mask Dilation and Dynamic Probability
by Xiaofeng Zhang and Zhengyang Shi
Electronics 2024, 13(8), 1497; https://doi.org/10.3390/electronics13081497 - 15 Apr 2024
Viewed by 523
Abstract
Visual simultaneous localization and mapping (SLAM) algorithms in dynamic scenes will apply the moving feature points to the camera pose’s calculation, which will cause the continuous accumulation of errors. As a target-detection tool, mask R-CNN, which is often used in combination with the former, due to the limited training datasets, easily results in the semantic mask being incomplete and deformed, which will increase the error. In order to solve the above problems, we propose in this paper a visual SLAM algorithm based on an adaptive mask dilation strategy and the dynamic probability of the feature points, named MDP-SLAM. Firstly, we use the mask R-CNN target-detection algorithm to obtain the initial mask of the dynamic target. On this basis, an adaptive mask-dilation algorithm is used to obtain a mask that can completely cover the dynamic target and part of the surrounding scene. Then, we use the K-means clustering algorithm to segment the depth image information in the mask coverage area into absolute dynamic regions and relative dynamic regions. Combined with the epipolar constraint and the semantic constraint, the dynamic probability of the feature points is calculated, and then, the highly dynamic possible feature points are removed to solve an accurate final pose of the camera. Finally, the method is tested on the TUM RGB-D dataset. The results show that the MDP-SLAM algorithm proposed in this paper can effectively improve the accuracy of attitude estimation and has high accuracy and robustness in dynamic indoor scenes. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
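The epipolar constraint mentioned in the abstract above can be illustrated with a short sketch: a static scene point observed in two frames should lie on the epipolar line induced by the fundamental matrix, so a large distance from that line hints at independent motion. The function names and the one-pixel threshold are hypothetical, and the hard threshold is a simplification; the paper combines this cue with a semantic constraint into a dynamic probability, which is not reproduced here.

```python
import math


def epipolar_distance(F, p1, p2):
    """Distance in pixels from p2 to the epipolar line F @ p1.

    F is a 3x3 fundamental matrix (nested lists); p1 and p2 are (x, y)
    pixel coordinates of the same feature in frame 1 and frame 2.
    """
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    # Epipolar line in frame 2: l = F * x1, with l = (a, b, c).
    line = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(sum(x2[i] * line[i] for i in range(3))) / norm


def is_probably_dynamic(F, p1, p2, max_pixels=1.0):
    """Flag a match whose epipolar distance exceeds an assumed threshold."""
    return epipolar_distance(F, p1, p2) > max_pixels
```

For a camera translating purely along the x axis with identity intrinsics, the epipolar lines are horizontal, so any vertical displacement of a matched feature shows up directly as epipolar distance.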

20 pages, 7943 KiB  
Article
Pushing the Boundaries of Solar Panel Inspection: Elevated Defect Detection with YOLOv7-GX Technology
by Yin Wang, Jingyong Zhao, Yihua Yan, Zhicheng Zhao and Xiao Hu
Electronics 2024, 13(8), 1467; https://doi.org/10.3390/electronics13081467 - 12 Apr 2024
Viewed by 453
Abstract
During the maintenance and management of solar photovoltaic (PV) panels, how to efficiently solve the maintenance difficulties becomes a key challenge that restricts their performance and service life. Aiming at the multi-defect-recognition challenge in PV-panel image analysis, this study innovatively proposes a new algorithm for the defect detection of PV panels incorporating YOLOv7-GX technology. The algorithm first constructs an innovative GhostSlimFPN network architecture by introducing GSConv and depth-wise separable convolution technologies, optimizing the traditional neck network structure. Then, a customized 1 × 1 convolutional module incorporating the GAM (Global Attention Mechanism) attention mechanism is designed in this paper to improve the ELAN structure, aiming to enhance the network’s perception and representation capabilities while controlling the network complexity. In addition, the XIOU loss function is introduced in the study to replace the traditional CIOU loss function, which effectively improves the robustness and convergence efficiency of the model. In the training stage, the sample imbalance problem is effectively solved by implementing differentiated weight allocations for different images and categories, which promotes the balance of the training process. The experimental data show that the optimized model achieves 94.8% in the highest mAP value, which is 6.4% higher than the original YOLOv7 network, significantly better than other existing models, and provides solid theoretical and technical support for further research and application in the field of PV-panel defect detection. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

28 pages, 14693 KiB  
Article
Wildlife Real-Time Detection in Complex Forest Scenes Based on YOLOv5s Deep Learning Network
by Zhibin Ma, Yanqi Dong, Yi Xia, Delong Xu, Fu Xu and Feixiang Chen
Remote Sens. 2024, 16(8), 1350; https://doi.org/10.3390/rs16081350 - 11 Apr 2024
Viewed by 786
Abstract
With the progressively deteriorating global ecological environment and the gradual escalation of human activities, the survival of wildlife has been severely impacted. Hence, a rapid, precise, and reliable method for detecting wildlife holds immense significance in safeguarding their existence and monitoring their status. However, due to the rare and concealed nature of wildlife activities, the existing wildlife detection methods face limitations in efficiently extracting features during real-time monitoring in complex forest environments. These models exhibit drawbacks such as slow speed and low accuracy. Therefore, we propose a novel real-time monitoring model called WL-YOLO, which is designed for lightweight wildlife detection in complex forest environments. This model is built upon the deep learning model YOLOv5s. In WL-YOLO, we introduce a novel and lightweight feature extraction module. This module is comprised of a deeply separable convolutional neural network integrated with compression and excitation modules in the backbone network. This design is aimed at reducing the number of model parameters and computational requirements, while simultaneously enhancing the feature representation of the network. Additionally, we introduced a CBAM attention mechanism to enhance the extraction of local key features, resulting in improved performance of WL-YOLO in the natural environment where wildlife has high concealment and complexity. This model achieved a mean accuracy (mAP) value of 97.25%, an F1-score value of 95.65%, and an accuracy value of 95.14%. These results demonstrated that this model outperforms the current mainstream deep learning models. Additionally, compared to the YOLOv5m base model, WL-YOLO reduces the number of parameters by 44.73% and shortens the detection time by 58%. This study offers technical support for detecting and protecting wildlife in intricate environments by introducing a highly efficient and advanced wildlife detection model. 
Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

18 pages, 687 KiB  
Article
MHDNet: A Multi-Scale Hybrid Deep Learning Model for Person Re-Identification
by Jinghui Wang and Jun Wang
Electronics 2024, 13(8), 1435; https://doi.org/10.3390/electronics13081435 - 10 Apr 2024
Viewed by 489
Abstract
The primary objective of person re-identification is to identify individuals from surveillance videos across various scenarios. Conventional pedestrian recognition models typically employ convolutional neural network (CNN) and vision transformer (ViT) networks to extract features, and while CNNs are adept at extracting local features through convolution operations, capturing global information can be challenging, especially when dealing with high-resolution images. In contrast, ViTs rely on cascaded self-attention modules to capture long-range feature dependencies, sacrificing local feature details. In light of these limitations, this paper presents the MHDNet, a hybrid network structure for pedestrian recognition that combines convolutional operations and self-attention mechanisms to enhance representation learning. The MHDNet is built around the Feature Fusion Module (FFM), which harmonizes global and local features at different resolutions. With a parallel structure, the MHDNet model maximizes the preservation of local features and global representations. Experiments on two person re-identification datasets demonstrate the superiority of the MHDNet over other state-of-the-art methods. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

11 pages, 1269 KiB  
Article
Hybrid-Margin Softmax for the Detection of Trademark Image Similarity
by Chenyang Wang, Guangyuan Zheng and Hongtao Shan
Appl. Sci. 2024, 14(7), 2865; https://doi.org/10.3390/app14072865 - 28 Mar 2024
Viewed by 444
Abstract
The detection of image similarity is critical to trademark (TM) legal registration and court judgment on infringement cases. Meanwhile, there are great challenges regarding the annotation of similar pairs and model generalization on rapidly growing data when deep learning is introduced into the task. The research idea of metric learning is naturally suited for the task where similarity of input is given instead of classification, but current methods are not targeted at the task and should be upgraded. To address these issues, loss-driven model training is introduced, and a hybrid-margin softmax (HMS) is proposed exactly based on the peculiarity of TM images. Two additive penalty margins are attached to the softmax to expand the decision boundary and develop greater tolerance for slight differences between similar TM images. With the HMS, a Siamese neural network (SNN) as the feature extractor is further penalized and the discrimination ability is improved. Experiments demonstrate that the detection model trained on HMS can make full use of small numbers of training data and has great discrimination ability on bigger quantities of test data. Meanwhile, the model can reach high performance with less depth of SNN. Extensive experiments indicate that the HMS-driven model trained completely on TM data generalized well on the face recognition (FR) task, which involves another type of image data. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

25 pages, 8266 KiB  
Article
Infrared Small Target Detection Based on Tensor Tree Decomposition and Self-Adaptive Local Prior
by Guiyu Zhang, Zhenyu Ding, Qunbo Lv, Baoyu Zhu, Wenjian Zhang, Jiaao Li and Zheng Tan
Remote Sens. 2024, 16(6), 1108; https://doi.org/10.3390/rs16061108 - 21 Mar 2024
Viewed by 691
Abstract
Infrared small target detection plays a crucial role in both military and civilian systems. However, current detection methods face significant challenges in complex scenes, such as inaccurate background estimation, inability to distinguish targets from similar non-target points, and poor robustness across various scenes. To address these issues, this study presents a novel spatial–temporal tensor model for infrared small target detection. In our method, we introduce the tensor tree rank to capture global structure in a more balanced strategy, which helps achieve more accurate background estimation. Meanwhile, we design a novel self-adaptive local prior weight by evaluating the level of clutter and noise content in the image. It mitigates the imbalance between target enhancement and background suppression. Then, the spatial–temporal total variation (STTV) is used as a joint regularization term to help better remove noise and obtain better detection performance. Finally, the proposed model is efficiently solved by the alternating direction multiplier method (ADMM). Extensive experiments demonstrate that our method achieves superior detection performance when compared with other state-of-the-art methods in terms of target enhancement, background suppression, and robustness across various complex scenes. Furthermore, we conduct an ablation study to validate the effectiveness of each module in the proposed model. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

18 pages, 5507 KiB  
Article
Research on Coaxiality Measurement Method for Automobile Brake Piston Components Based on Machine Vision
by Qinghua Li, Weinan Ge, Hu Shi, Wanting Zhao and Shihong Zhang
Appl. Sci. 2024, 14(6), 2371; https://doi.org/10.3390/app14062371 - 11 Mar 2024
Viewed by 428
Abstract
Aiming at addressing the problem of the online detection of automobile brake piston components, a non-contact measurement method based on the combination of machine vision and image processing technology is proposed. Firstly, an industrial camera is used to capture an image, and a series of image preprocessing algorithms is used to extract a clear contour of a test piece with a unit pixel width. Secondly, based on the structural characteristics of automobile brake piston components, the region of interest is extracted, and the test piece is segmented into spring region and cylinder region. Then, based on mathematical morphology techniques, the edges of the image are optimized. We extract geometric feature points by comparing the heights of adjacent pixel points on both sides of the pixel points, so as to calculate the variation of the spring axis relative to the reference axis (centerline of the cylinder). Then, we extract the maximum variation from all images, and calculate the coaxiality error value using this maximum variation. Finally, we validate the feasibility of the proposed method and the stability of extracting geometric feature points through experiments. The experiments demonstrate the feasibility of the method in engineering practice, with the stability in extracting geometric feature points reaching 99.25%. Additionally, this method offers a new approach and perspective for coaxiality measurement of stepped shaft parts. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

19 pages, 11826 KiB  
Article
A Convolution with Transformer Attention Module Integrating Local and Global Features for Object Detection in Remote Sensing Based on YOLOv8n
by Kaiqi Lang, Jie Cui, Mingyu Yang, Hanyu Wang, Zilong Wang and Honghai Shen
Remote Sens. 2024, 16(5), 906; https://doi.org/10.3390/rs16050906 - 4 Mar 2024
Cited by 2 | Viewed by 1303
Abstract
Object detection in remote sensing scenarios plays an indispensable and significant role in civilian, commercial, and military areas, leveraging the power of convolutional neural networks (CNNs). Remote sensing images, captured by crafts and satellites, exhibit unique characteristics including complicated backgrounds, limited features, distinct density, and varied scales. The contextual and comprehensive information in an image can make a detector precisely localize and classify targets, which is extremely valuable for object detection in remote sensing scenarios. However, CNNs, restricted by the essence of the convolution operation, possess local receptive fields and scarce contextual information, even in large models. To address this limitation and improve detection performance by extracting global contextual information, we propose a novel plug-and-play attention module, named Convolution with Transformer Attention Module (CTAM). CTAM is composed of a convolutional bottleneck block and a simplified Transformer layer, which can facilitate the integration of local features and position information with long-range dependency. YOLOv8n, a superior and faster variant of the YOLO series, is selected as the baseline. To demonstrate the effectiveness and efficiency of CTAM, we incorporated CTAM into YOLOv8n and conducted extensive experiments on the DIOR dataset. YOLOv8n-CTAM achieves an impressive 54.2 mAP@50-95, surpassing YOLOv8n (51.4) by a large margin. Notably, it outperforms the baseline by 2.7 mAP@70 and 4.4 mAP@90, showcasing its superiority with stricter IoU thresholds. Furthermore, the experiments conducted on the TGRS-HRRSD dataset validate the excellent generalization ability of CTAM. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
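The mAP@50-95, mAP@70, and mAP@90 figures in the abstract above all depend on an intersection-over-union (IoU) threshold that decides whether a predicted box counts as a match. The sketch below shows only that building block; the helper names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` corner tuples, and the full average-precision computation is not reproduced.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds 0.50, 0.55, ..., 0.95 over which mAP@50-95 is averaged.
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_passed(box_pred, box_true):
    """Return the IoU thresholds at which the prediction counts as a match."""
    score = iou(box_pred, box_true)
    return [t for t in IOU_THRESHOLDS if score >= t]
```

A prediction with IoU 0.8 is a match at the seven thresholds up to 0.80 but a miss at 0.85 and above, which is why gains at mAP@90 indicate tighter localization than gains at mAP@50.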

26 pages, 29677 KiB  
Article
Development of a Powder Analysis Procedure Based on Imaging Techniques for Examining Aggregation and Segregation Phenomena
by Giuseppe Bonifazi, Paolo Barontini, Riccardo Gasbarrone, Davide Gattabria and Silvia Serranti
J. Imaging 2024, 10(3), 53; https://doi.org/10.3390/jimaging10030053 - 21 Feb 2024
Viewed by 1122
Abstract
In this manuscript, a method that utilizes classical image techniques to assess particle aggregation and segregation, with the primary goal of validating particle size distribution determined by conventional methods, is presented. This approach can represent a supplementary tool in quality control systems for powder production processes in industries such as manufacturing and pharmaceuticals. The methodology involves the acquisition of high-resolution images, followed by their fractal and textural analysis. Fractal analysis plays a crucial role by quantitatively measuring the complexity and self-similarity of particle structures. This approach allows for the numerical evaluation of aggregation and segregation phenomena, providing valuable insights into the underlying mechanisms at play. Textural analysis contributes to the characterization of patterns and spatial correlations observed in particle images. The examination of textural features offers an additional understanding of particle arrangement and organization. Consequently, it aids in validating the accuracy of particle size distribution measurements. To this end, by incorporating fractal and structural analysis, a methodology that enhances the reliability and accuracy of particle size distribution validation is obtained. It enables the identification of irregularities, anomalies, and subtle variations in particle arrangements that might not be detected by traditional measurement techniques alone. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

16 pages, 3114 KiB  
Article
Underwater Degraded Image Restoration by Joint Evaluation and Polarization Partition Fusion
by Changye Cai, Yuanyi Fan, Ronghua Li, Haotian Cao, Shenghui Zhang and Mianze Wang
Appl. Sci. 2024, 14(5), 1769; https://doi.org/10.3390/app14051769 - 21 Feb 2024
Viewed by 591
Abstract
Images of underwater environments suffer from contrast degradation, reduced clarity, and information attenuation. The traditional method is the global estimate of polarization. However, targets in water often have complex polarization properties. For low polarization regions, since the polarization is similar to the polarization of background, it is difficult to distinguish between target and non-targeted regions when using traditional methods. Therefore, this paper proposes a joint evaluation and partition fusion method. First, we use histogram stretching methods for preprocessing two polarized orthogonal images, which increases the image contrast and enhances the image detail information. Then, the target is partitioned according to the values of each pixel point of the polarization image, and the low and high polarization target regions are extracted based on polarization values. To address the practical problem, the low polarization region is recovered using the polarization difference method, and the high polarization region is recovered using the joint estimation of multiple optimization metrics. Finally, the low polarization and the high polarization regions are fused. Subjectively, the experimental results as a whole have been fully restored, and the information has been retained completely. Our method can fully recover the low polarization region, effectively remove the scattering effect and increase an image’s contrast. Objectively, the results of the experimental evaluation indexes, EME, Entropy, and Contrast, show that our method performs significantly better than the other methods, which confirms the feasibility of this paper’s algorithm for application in specific underwater scenarios. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
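
The preprocessing and partition steps named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the degree-of-polarization estimate |I_par - I_perp| / (I_par + I_perp), and the threshold value 0.3 are all assumptions chosen for illustration, and images are plain nested lists of floats.

```python
def stretch(img, lo=0.0, hi=1.0):
    """Linear histogram (min-max) stretch of a 2D image to [lo, hi]."""
    flat = [v for row in img for v in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: nothing to stretch
        return [[lo for _ in row] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[lo + (v - mn) * scale for v in row] for row in img]


def degree_of_polarization(i_par, i_perp, eps=1e-12):
    """Per-pixel polarization estimate from two orthogonally polarized images.

    Assumed formula: |I_par - I_perp| / (I_par + I_perp).
    """
    return [
        [abs(a - b) / (a + b + eps) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(i_par, i_perp)
    ]


def partition(dop, threshold=0.3):
    """Boolean mask: True marks high polarization pixels (threshold is hypothetical)."""
    return [[v >= threshold for v in row] for row in dop]


# Tiny example: one strongly polarized pixel and one unpolarized pixel.
i_par = [[0.8, 0.5]]
i_perp = [[0.2, 0.5]]
mask = partition(degree_of_polarization(i_par, i_perp))
```

The resulting mask would let each region be handed to a different recovery routine before fusion; how the paper chooses the split and combines it with the stretched images is not detailed in the abstract.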

18 pages, 5785 KiB  
Article
Research on Rejoining Bone Stick Fragment Images: A Method Based on Multi-Scale Feature Fusion Siamese Network Guided by Edge Contour
by Jingjing He, Huiqin Wang, Rui Liu, Li Mao, Ke Wang, Zhan Wang and Ting Wang
Appl. Sci. 2024, 14(2), 717; https://doi.org/10.3390/app14020717 - 15 Jan 2024
Viewed by 714
Abstract
The rejoining of bone sticks is of significant importance for studying the history and culture of the Han Dynasty. Currently, the rejoining of bone inscriptions relies heavily on manual work by experts, which demands considerable time and energy. This paper introduces a multi-scale feature fusion Siamese network guided by edge contour (MFS-GC). Within a Siamese network framework, the model first uses a residual network to extract features of bone sticks and then computes the L2 distance between them as a similarity measure. During feature extraction with the residual network, the BN layer tends to lose contour detail, which makes the extracted features less distinctive, especially along fractured edges. To address this issue, the Spatially Adaptive DEnormalization (SPADE) model is employed, with contour images of the bone sticks guiding the normalization. This ensures that the network can learn multi-scale boundary contour features at each layer. Finally, the extracted multi-scale fused features undergo similarity measurement for local matching of bone stick fragment images. Additionally, a Conjugable Bone Stick Dataset (CBSD) is constructed. In the experimental validation, the MFS-GC algorithm is compared with classical similarity calculation methods in terms of precision, recall, and missed detection rate. The experiments demonstrate that the MFS-GC algorithm achieves an average accuracy of 95.5% in the Top-15 on the CBSD. These findings can contribute to solving the rejoining problem for bone sticks.
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
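
The similarity step described in the abstract above, an L2 distance between feature vectors followed by a Top-k ranking, can be sketched as follows. This is an illustrative sketch only: the feature vectors here are hand-written stand-ins for the network's embeddings, and the function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.dist(a, b)


def top_k_matches(query, gallery, k=15):
    """Return the k gallery entries closest to the query as (index, distance).

    Smaller distance means higher similarity; ties are broken by index.
    """
    scored = sorted((l2_distance(query, feat), i) for i, feat in enumerate(gallery))
    return [(i, d) for d, i in scored[:k]]


# Stand-in embeddings; a real system would use features from the trained network.
gallery = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
matches = top_k_matches([0.0, 0.0], gallery, k=2)
```

A Top-15 evaluation would count a query as correct when its true counterpart fragment appears among the 15 entries returned.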

16 pages, 4061 KiB  
Article
EDF-YOLOv5: An Improved Algorithm for Power Transmission Line Defect Detection Based on YOLOv5
by Hongxing Peng, Minjun Liang, Chang Yuan and Yongqiang Ma
Electronics 2024, 13(1), 148; https://doi.org/10.3390/electronics13010148 - 29 Dec 2023
Viewed by 870
Abstract
Detecting defects in power transmission lines from unmanned aerial inspection images is crucial for evaluating the operational status of outdoor transmission equipment. This paper presents a defect recognition method called EDF-YOLOv5, which is based on YOLOv5s, to enhance detection accuracy. Firstly, the EN-SPPFCSPC module is designed to improve the algorithm’s ability to extract information, thereby enhancing the detection performance for small target defects. Secondly, the algorithm incorporates a high-level semantic feature extraction network, DCNv3C3, which improves its ability to generalize to defects of different shapes. Lastly, a new bounding box loss function, Focal-CIoU, is introduced to enhance the contribution of high-quality samples during training. The experimental results demonstrate that the enhanced algorithm achieves a 2.3% increase in mean average precision (mAP@.5) for power transmission line defect detection and a 0.9% improvement in F1-score, and operates at a detection speed of 117 frames per second. These findings highlight the superior performance of EDF-YOLOv5 in detecting power transmission line defects.
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
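
The abstract above names a Focal-CIoU bounding box loss without giving its formula. The sketch below implements the standard CIoU loss and then applies a focal weight of the form IoU^gamma, an assumption borrowed from the Focal-EIoU formulation; the paper's exact definition may differ. Boxes are (x1, y1, x2, y2) tuples, and the function names and gamma=0.5 are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt, eps=1e-9):
    """Standard CIoU loss: 1 - IoU + centre-distance term + aspect-ratio term."""
    i = iou(pred, gt)
    # Squared distance between box centres.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - i + v + eps)
    return 1 - i + rho2 / c2 + alpha * v


def focal_ciou_loss(pred, gt, gamma=0.5):
    """Assumed focal form: weight the CIoU loss by IoU**gamma."""
    return iou(pred, gt) ** gamma * ciou_loss(pred, gt)


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Under this assumed weighting, boxes with higher IoU keep more of their loss, which is one way to emphasize high-quality samples; boxes with no overlap receive zero weight.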

14 pages, 7400 KiB  
Article
Non-Local Means Hole Repair Algorithm Based on Adaptive Block
by Bohu Zhao, Lebao Li and Haipeng Pan
Appl. Sci. 2024, 14(1), 159; https://doi.org/10.3390/app14010159 - 24 Dec 2023
Viewed by 585
Abstract
RGB-D cameras provide depth and color information and are widely used in 3D reconstruction and computer vision. In the majority of existing RGB-D cameras, a considerable portion of depth values is often lost due to severe occlusion or limited camera coverage, which adversely affects the precise localization and three-dimensional reconstruction of objects. In this paper, to address the poor quality of depth images captured by RGB-D cameras, a depth image hole repair algorithm based on non-local means is first proposed, leveraging the structural similarities between grayscale and depth images. Second, because the non-local means hole repair method requires cumbersome parameter tuning to determine the size of the structural blocks, an intelligent block factor is introduced that automatically determines the optimal search and repair block sizes for various hole sizes, resulting in an adaptive block-based non-local means algorithm for repairing depth image holes. Furthermore, the proposed algorithm’s performance is evaluated on both the Middlebury stereo matching dataset and a self-constructed RGB-D dataset by comparing it against other methods using five metrics: RMSE, SSIM, PSNR, DE, and ALME. Finally, the experimental results demonstrate that the algorithm resolves the parameter tuning complexity inherent in depth image hole repair while effectively filling the holes, suppressing noise within depth images, and enhancing image quality with improved precision and accuracy.
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
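
The core idea in the abstract above, filling depth holes with a non-local means average whose weights come from grayscale patch similarity, can be sketched as below. This is a simplified illustration, not the authors' algorithm: the block sizes are fixed parameters rather than chosen by the adaptive block factor, and the hole marker 0, the smoothing parameter h_param, and all function names are assumptions.

```python
import math


def patch(img, r, c, half):
    """Flattened square patch around (r, c), clamping coordinates at the borders."""
    rows, cols = len(img), len(img[0])
    return [
        img[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
        for dr in range(-half, half + 1)
        for dc in range(-half, half + 1)
    ]


def repair_holes(depth, gray, search_half=2, patch_half=1, h_param=10.0, hole=0):
    """Fill hole pixels in `depth` using grayscale-guided non-local means.

    Each hole becomes a weighted mean of valid depths inside the search block;
    a weight is large when the grayscale patches of the two pixels are similar.
    """
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] != hole:
                continue
            ref = patch(gray, r, c, patch_half)
            num = den = 0.0
            for rr in range(max(0, r - search_half), min(rows, r + search_half + 1)):
                for cc in range(max(0, c - search_half), min(cols, c + search_half + 1)):
                    if depth[rr][cc] == hole:
                        continue  # only valid depth values contribute
                    cand = patch(gray, rr, cc, patch_half)
                    d2 = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
                    w = math.exp(-d2 / h_param ** 2)
                    num += w * depth[rr][cc]
                    den += w
            if den > 0:
                out[r][c] = num / den
    return out


# The hole sits on the bright (far) side, so it should take the far depth.
gray = [[0, 0, 200, 200]] * 3
depth = [[10, 10, 50, 50], [10, 10, 0, 50], [10, 10, 50, 50]]
filled = repair_holes(depth, gray)
```

In this toy case the hole borders a grayscale edge, and the patch weights keep the near-side depths from bleeding across it. Choosing `search_half` and `patch_half` per hole size is the tuning burden that the paper's block factor is described as automating.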