Deep Learning-Based Computer Vision: Technologies and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 8895

Special Issue Editors

College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Interests: image/video restoration; image/video coding; machine learning; image segmentation

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Interests: computer vision; deep learning; image processing; RGB-D perception; 3D reconstruction
School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China
Interests: image/video restoration; depth map super-resolution; depth completion; semantic segmentation

Special Issue Information

Dear Colleagues,

Whether based on machine learning or signal-processing methods, computer vision enables computers to process, analyze, and understand visual data from the world around us, improving efficiency and accuracy in many applications. Computer vision has become increasingly important in a variety of fields, with uses ranging from autonomous vehicles and surveillance systems to medical imaging and augmented reality. In recent years, deep learning has further revolutionized the field, bringing about tremendous advances in many tasks and surpassing traditional computer vision techniques with state-of-the-art results. Studying deep learning-based computer vision technologies and their applications is therefore particularly important.

This Special Issue aims to present a comprehensive overview of the advances in deep learning for both low-level and high-level computer vision technologies and their applications. We believe that this Issue will be of interest to researchers who are working in the area of computer vision and deep learning.

We are pleased to invite you to present original high-quality research articles. In addition, review papers that overview the state-of-the-art in deep learning for computer vision and identify important research directions for the future are also welcome. Research areas may include (but are not limited to) the following: deep learning; computer vision; object detection; image classification; semantic segmentation; image generation; pose estimation; image reconstruction; image/video restoration; image/video enhancement; etc. We look forward to receiving your contributions.

Dr. Chao Ren
Dr. Yuanzhouhan Cao
Dr. Tao Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • computer vision
  • object detection
  • image classification
  • semantic segmentation
  • image generation
  • pose estimation
  • image reconstruction
  • image/video restoration
  • image/video enhancement

Published Papers (10 papers)


Research

Jump to: Review

17 pages, 1128 KiB  
Article
Few-Shot Learning for Misinformation Detection Based on Contrastive Models
by Peng Zheng, Hao Chen, Shu Hu, Bin Zhu, Jinrong Hu, Ching-Sheng Lin, Xi Wu, Siwei Lyu, Guo Huang and Xin Wang
Electronics 2024, 13(4), 799; https://doi.org/10.3390/electronics13040799 - 19 Feb 2024
Viewed by 664
Abstract
With the development of social media, the amount of fake news has risen significantly and has had a great impact on both individuals and society. The restrictions imposed by censors make the objective reporting of news difficult. Most studies use supervised methods, relying on a large amount of labeled data for fake news detection, which limits detection effectiveness. Meanwhile, these studies focus on detecting fake news in a single modality, either text or images, whereas actual fake news more often takes the form of text–image pairs. In this paper, we introduce a self-supervised model grounded in contrastive learning. This model facilitates simultaneous feature extraction for both text and images by employing dot-product graphic matching. Through contrastive learning, it augments the extraction capability of image features, leading to a robust visual feature extraction ability with reduced training data requirements. The model’s effectiveness was assessed against the baseline using the COSMOS fake news dataset. The experiments reveal that, when detecting fake news with mismatched text–image pairs, only approximately 3% of the data are used for training. The model achieves an accuracy of 80%, equivalent to 95% of the original model’s performance using full-size data for training. Notably, replacing the text encoding layer enhances experimental stability, providing a substantial advantage over the original model, specifically on the COSMOS dataset. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)
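The dot-product matching of image and text features described in this abstract can be illustrated with a minimal sketch; the embedding size, toy data, and function names below are hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np

def match_scores(img_emb, txt_emb):
    """Normalized dot-product (cosine) similarity between image and text
    embeddings; high scores indicate matched text-image pairs."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return img @ txt.T  # scores[i, j]: similarity of image i and caption j

# Toy embeddings: caption j is a slightly noised copy of image j's embedding,
# standing in for the aligned features a trained contrastive encoder produces.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 128))
txt = img + 0.01 * rng.normal(size=(4, 128))

scores = match_scores(img, txt)
pred = scores.argmax(axis=1)  # each image should select its own caption
```

Under this scheme, a mismatched text–image pair (a misinformation candidate) would show a similarity well below the values along the diagonal of `scores`.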

18 pages, 5623 KiB  
Article
Bidirectional Temporal Pose Matching for Tracking
by Yichuan Fang, Qingxuan Shi and Zhen Yang
Electronics 2024, 13(2), 442; https://doi.org/10.3390/electronics13020442 - 21 Jan 2024
Viewed by 583
Abstract
Multi-person pose tracking is a challenging task. It requires identifying the human poses in each frame and matching them across time. This task still faces two main challenges. Firstly, sudden camera zooming and drastic pose changes between adjacent frames may result in mismatched poses between them. Secondly, the time relationships modeled by most existing methods provide insufficient information in scenarios with long-term occlusion. In this paper, to address the first challenge, we propagate the bounding boxes of the current frame to the previous frame for pose estimation, and match the estimated results with the previous ones, which we call the Backward Temporal Pose-Matching (BTPM) module. To solve the second challenge, we design an Association Across Multiple Frames (AAMF) module that utilizes long-term temporal relationships to supplement tracking information lost in the previous frames as a Re-identification (Re-id) technique. Specifically, we select keyframes with a fixed step size in the videos and label other frames as general frames. In the keyframes, we use the BTPM module and the AAMF module to perform tracking. In the general frames, we propagate poses in the previous frame to the current frame for pose estimation and association, which we call the Forward Temporal Pose-Matching (FTPM) module. If the pose association fails, the current frame will be set as a keyframe, and tracking will be re-performed. In the PoseTrack 2018 benchmark tests, our method shows significant improvements over the baseline methods, with improvements of 2.1 and 1.1 in mean Average Precision (mAP) and Multi-Object Tracking Accuracy (MOTA), respectively. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)
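Matching poses or boxes between adjacent frames, as in the pose-matching modules this abstract describes, typically reduces to an assignment problem over box overlap. A generic greedy IoU matcher (a sketch of the standard technique, not the authors' code) looks like:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_match(prev_boxes, curr_boxes, thresh=0.3):
    """Assign each current box the unclaimed previous box with the highest
    IoU above `thresh`; unmatched current boxes would start new tracks."""
    pairs = sorted(((iou(p, c), i, j)
                    for i, p in enumerate(prev_boxes)
                    for j, c in enumerate(curr_boxes)), reverse=True)
    used_prev, matches = set(), {}
    for score, i, j in pairs:
        if score < thresh:
            break
        if i not in used_prev and j not in matches:
            matches[j] = i  # current box j continues the track of prev box i
            used_prev.add(i)
    return matches

prev = [(0, 0, 10, 10), (20, 20, 30, 30)]
curr = [(21, 21, 31, 31), (1, 1, 11, 11)]
links = greedy_match(prev, curr)  # {0: 1, 1: 0}: identities carried over
```

When no previous box clears the IoU threshold (e.g., after long occlusion), a tracker must fall back on appearance-based Re-identification, which is the gap the paper's AAMF module targets.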

16 pages, 6749 KiB  
Article
Cervical Intervertebral Disc Segmentation Based on Multi-Scale Information Fusion and Its Application
by Yi Yang, Ming Wang, Litai Ma, Xiang Zhang, Kerui Zhang, Xiaoyao Zhao, Qizhi Teng and Hao Liu
Electronics 2024, 13(2), 432; https://doi.org/10.3390/electronics13020432 - 20 Jan 2024
Viewed by 595
Abstract
The cervical intervertebral disc, a cushion-like element between the vertebrae, plays a critical role in spinal health. Investigating how to segment these discs is crucial for identifying abnormalities in cervical conditions. This paper introduces a novel approach for segmenting cervical intervertebral discs, utilizing a framework based on multi-scale information fusion. Central to this approach is the integration of multi-level features, both low and high, through an encoding–decoding process, combined with multi-scale semantic fusion, to progressively refine the extraction of segmentation characteristics. The multi-scale semantic fusion aspect of this framework is divided into two phases: one leveraging convolution for scale interaction and the other utilizing pooling. This dual-phase method markedly improves segmentation accuracy. Facing a shortage of datasets for cervical disc segmentation, we have developed a new dataset tailored for this purpose, which includes interpolation between layers to resolve disparities in pixel spacing along the longitudinal and transverse axes in CT image sequences. This dataset is well suited for advancing cervical disc segmentation studies. Our experimental findings demonstrate that our network model not only achieves good segmentation accuracy on human cervical intervertebral discs but is also highly effective for three-dimensional reconstruction and printing applications. The dataset will be publicly available soon. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)

16 pages, 3174 KiB  
Article
CNB Net: A Two-Stage Approach for Effective Image Deblurring
by Xiu Zhang, Fengbo Zheng, Lifen Jiang and Haoyu Guo
Electronics 2024, 13(2), 404; https://doi.org/10.3390/electronics13020404 - 18 Jan 2024
Cited by 1 | Viewed by 658
Abstract
Image blur, often caused by camera shake and object movement, poses a significant challenge in computer vision. Image deblurring strives to restore clarity to these images. Traditional single-stage methods, while effective in detail enhancement, often neglect global context in favor of local information. Yet, both aspects are crucial, especially in real-life scenarios where images are typically large and subject to various blurs. Addressing this, we introduce CNB Net, an innovative deblurring network adept at integrating global and local insights for enhanced image restoration. The network operates in two stages, utilizing our specially designed Convolution and Normalization-Based Block (CNB Block) and Convolution and Normalization-Based Plus Block (CNBP Block) for multi-scale information extraction. A progressive learning approach is adopted with a Feature Active Selection (FAS) module at the end of each stage that captures spatial detail information under the guidance of real images. The Two-Stage Feature Fusion (TSFF) module reduces information loss caused by downsampling operations while enriching features across stages for increased robustness. We conduct experiments on the GoPro dataset and the HIDE dataset. On the GoPro dataset, our Peak Signal-to-Noise Ratio (PSNR) result is 32.21 and the Structural Similarity (SSIM) result is 0.950; and on the HIDE dataset, our PSNR result is 30.38 and the SSIM result is 0.932. Our results exceed other similar algorithms. By comparing the generated feature maps, we find that our model takes into account both global and local information well. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)
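The PSNR figure used above to compare deblurring results is a simple function of mean squared error. A standard implementation of the metric (independent of the paper's code) is:

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means the restored
    image is closer to the sharp reference."""
    err = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8), dtype=np.uint8)
off = np.full((8, 8), 16, dtype=np.uint8)  # uniform error of 16 gray levels
score = psnr(ref, off)                     # 10 * log10(255^2 / 256), about 24 dB
```

On this scale, the roughly 2 dB gap between the GoPro (32.21) and HIDE (30.38) results corresponds to a substantially larger mean squared error on the harder dataset.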

14 pages, 3296 KiB  
Article
A Low-Cost Detection Method for Nitrite Content in a Mariculture Water Environment Based on an Improved Residual Network
by Zhiqiang Pei, Zonghai Cai, Jingfei Meng, Yang Bai, Weiming Cai and Shengli Fan
Electronics 2024, 13(1), 85; https://doi.org/10.3390/electronics13010085 - 24 Dec 2023
Viewed by 509
Abstract
Nitrite content is one of the key indicators for measuring the quality of mariculture water and has a crucial impact on the benefits of aquaculture. Most of China’s fisheries are small-scale domestic aquaculture. For economic reasons, farmers generally use chemical colorimetry or rely on life experience (such as whether the water bodies have become turbid or whether aquatic organisms have become abnormal or died) to determine the nitrite content in water; however, both methods can easily lead to misjudgment and cause losses. Another more accurate method is spectrophotometry, but the spectrophotometer used is more expensive. This article aims to propose a low-cost and high-precision nitrite detection method. The new method we propose is to first perform a color development reaction using chemical detection reagents, and then use an improved residual network instead of human eyes to determine the nitrite concentration in the water sample. The advantages of this method are the fast response of the chemical reagents and the high accuracy of the machine vision recognition. Our network can achieve an accuracy of 98.3% on the test set. The experimental results indicate that this method can be applied to practical mariculture. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)

24 pages, 5085 KiB  
Article
Personalized Text-to-Image Model Enhancement Strategies: SOD Preprocessing and CNN Local Feature Integration
by Mujung Kim, Jisang Yoo and Soonchul Kwon
Electronics 2023, 12(22), 4707; https://doi.org/10.3390/electronics12224707 - 19 Nov 2023
Viewed by 1056
Abstract
Recent advancements in text-to-image models have been substantial, generating new images based on personalized datasets. However, even within a single category, such as furniture, where the structures vary and the patterns are not uniform, the ability of the generated images to preserve the detailed information of the input images remains unsatisfactory. This study introduces a novel method to enhance the quality of the results produced by text-image models. The method utilizes mask preprocessing with an image pyramid-based salient object detection model, incorporates visual information into input prompts using concept image embeddings and a CNN local feature extractor, and includes a filtering process based on similarity measures. When using this approach, we observed both visual and quantitative improvements in CLIP text alignment and DINO metrics, suggesting that the generated images more closely follow the text prompts and more accurately reflect the input image’s details. The significance of this research lies in addressing one of the prevailing challenges in the field of personalized image generation: enhancing the capability to consistently and accurately represent the detailed characteristics of input images in the output. This method enables more realistic visualizations through textual prompts enhanced with visual information, additional local features, and unnecessary area removal using a SOD mask; it can also be beneficial in fields that prioritize the accuracy of visual data. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)

17 pages, 2033 KiB  
Article
An Iterative Learning Scheme with Binary Classifier for Improved Event Detection in Surveillance Video
by Cuong H. Tran and Seong G. Kong
Electronics 2023, 12(15), 3275; https://doi.org/10.3390/electronics12153275 - 30 Jul 2023
Viewed by 643
Abstract
This paper presents an iterative training framework with a binary classifier to improve the learning capability of a deep learning model for detecting abnormal behaviors in surveillance video. When a deep learning model trained on data from one surveillance video is deployed to monitor another video stream, its abnormal behavior detection performance often decreases significantly. To ensure the desired performance in new environments, the deep learning model needs to be retrained with additional training data from the new video stream. Iterative training requires manual annotation of the additional training data during the fine-tuning process, which is a tedious and error-prone task. To address this issue, this paper proposes a binary classifier to automatically label false positive data without human intervention. The binary classifier is trained on bounding boxes extracted from the detection model to identify which boxes are true positives or false positives. The proposed learning framework incrementally enhances the performance of the deep learning model for detecting abnormal behaviors in a surveillance video stream through repeated iterative learning cycles. Experimental results demonstrate that the accuracy of the detection model increases from 0.35 (mAP = 0.74) to 0.91 (mAP = 0.99) in just a few iterations. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)

24 pages, 11992 KiB  
Article
YOLO-MBBi: PCB Surface Defect Detection Method Based on Enhanced YOLOv5
by Bowei Du, Fang Wan, Guangbo Lei, Li Xu, Chengzhi Xu and Ying Xiong
Electronics 2023, 12(13), 2821; https://doi.org/10.3390/electronics12132821 - 26 Jun 2023
Cited by 7 | Viewed by 2171
Abstract
Printed circuit boards (PCBs) are extensively used to assemble electronic equipment. Currently, PCBs are an integral part of almost all electronic products. However, various surface defects can still occur during mass production. An enhanced YOLOv5s network named YOLO-MBBi is proposed to detect surface defects on PCBs to address the shortcomings of the existing PCB surface defect detection methods, such as their low accuracy and poor real-time performance. YOLO-MBBi uses MBConv (mobile inverted residual bottleneck block) modules, CBAM attention, BiFPN, and depth-wise convolutions to substitute layers in the YOLOv5s network and replace the CIoU loss function with the SIoU loss function during training. Two publicly available datasets were selected for this experiment. The experimental results showed that the mAP50 and recall values of YOLO-MBBi were 95.3% and 94.6%, which were 3.6% and 2.6% higher than those of YOLOv5s, respectively, and the FLOPs were 12.8, which was much smaller than YOLOv7’s 103.2. The FPS value reached 48.9. Additionally, after using another dataset, the YOLO-MBBi metrics also achieved satisfactory accuracy and met the needs of industrial production. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)

14 pages, 4753 KiB  
Article
Blood Group Interpretation Algorithm Based on Improved AlexNet
by Ranxin Shen, Jiayi Wen and Peiyi Zhu
Electronics 2023, 12(12), 2608; https://doi.org/10.3390/electronics12122608 - 09 Jun 2023
Viewed by 1004
Abstract
Traditional blood group interpretation technology has poor detection efficiency and interpretation accuracy in the face of complex conditions in clinical environments. In order to improve the interpretation accuracy of the automatic blood group interpretation system, the important role of deep learning in the blood group interpretation system was studied. This paper builds on the AlexNet network model, chosen for its fast convergence during training and good generalizability, though it still leaves room for improvement in blood group interpretation accuracy. The improved AlexNet network model proposed in this paper adds an attention mechanism to the network structure, optimizes the loss function in the training algorithm, and adjusts the learning rate attenuation function. The experiments showed that, compared with the accuracy of the AlexNet model, its training effect was remarkable, with an accuracy of 96.9%, an increase of 3%. Moreover, the improved network model paid more attention to fine-grained classification, minimized the loss rate, and improved the accuracy of system interpretation. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)

Review

Jump to: Research

25 pages, 13403 KiB  
Review
A Review of Document Binarization: Main Techniques, New Challenges, and Trends
by Zhengxian Yang, Shikai Zuo, Yanxi Zhou, Jinlong He and Jianwen Shi
Electronics 2024, 13(7), 1394; https://doi.org/10.3390/electronics13071394 - 07 Apr 2024
Viewed by 302
Abstract
Document image binarization is a challenging task, especially when it comes to text segmentation in degraded document images. Binarization, as a pre-processing step for Optical Character Recognition (OCR), is one of the most fundamental and commonly used segmentation methods. It separates the foreground text from the background of the document image to facilitate subsequent image processing. In view of the different degrees of degradation of document images, researchers have proposed a variety of solutions. In this paper, we summarize some challenges and difficulties in the field of document image binarization. Approximately 60 document image binarization methods are covered, including traditional algorithms and deep learning-based algorithms. We evaluated the performance of 25 image binarization techniques on the H-DIBCO2016 dataset to provide some help for future research. Full article
(This article belongs to the Special Issue Deep Learning-Based Computer Vision: Technologies and Applications)
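As a concrete example of the traditional algorithms such a review covers, global Otsu thresholding binarizes a grayscale document by maximizing between-class variance. The sketch below follows the textbook formulation; the toy "document" image is a hypothetical stand-in:

```python
import numpy as np

def otsu_threshold(gray):
    """Global Otsu threshold for an 8-bit grayscale image: pick the level
    that maximizes the between-class variance of the two-class split."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # P(class 0) up to each level
    mu = np.cumsum(prob * np.arange(256))  # cumulative class-0 mean mass
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

# Toy degraded-document stand-in: dark "ink" (level 40) on light "paper" (200).
img = np.full((32, 32), 200, dtype=np.uint8)
img[8:24, 8:24] = 40
t = otsu_threshold(img)
binary = (img > t).astype(np.uint8)  # 1 = paper/background, 0 = text
```

A single global threshold like this is exactly what degrades on stained or unevenly lit documents, which is why the review contrasts such methods with adaptive and deep learning-based alternatives.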
