Deep Learning in Computer Vision and Image Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 February 2024) | Viewed by 13985

Special Issue Editors


Prof. Dr. Beiji Zou
Guest Editor
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Interests: computer vision; medical imaging; medical big data management

Prof. Dr. Xiaoyan Kui
Guest Editor
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Interests: medical artificial intelligence; medical imaging

Dr. Weixin Si
Guest Editor
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Interests: medical extended reality; artificial intelligence

Special Issue Information

Dear Colleagues,

In recent years, driven by the rapid development of deep learning algorithms, graphics processing units (GPUs), and other hardware computing devices, deep learning technology has been widely applied in computer vision and image processing, including object detection, image segmentation, face recognition, autonomous driving, virtual reality, and medical diagnosis. As an application-oriented technology, it has shown enormous commercial value and promising application prospects. However, its high computational cost limits its use on certain platforms, such as mobile phones, which are indispensable terminal devices in modern society. Effectively improving the efficiency of existing deep-learning-based methods is therefore a valuable research direction. Existing acceleration studies cover quantization, pruning, sparsification, and knowledge distillation, but they are largely limited to classification networks; combining acceleration efforts across different tasks remains an important direction worth investigating. Research on the use of deep learning in computer vision and image processing is therefore timely, of great interest to everyone working in this field, and may enable more practical applications.
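
As a rough, generic illustration of the compression techniques mentioned above (not tied to any particular paper in this Special Issue), the sketch below applies magnitude-based weight pruning and post-training dynamic quantization to a toy PyTorch model; the toy model, the 50% sparsity level, and the int8 target are illustrative assumptions.

```python
# Minimal sketch: magnitude pruning + dynamic quantization in PyTorch.
# The toy model and the 50% sparsity level are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Magnitude (L1) pruning: zero out the 50% smallest weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# Post-training dynamic quantization: store Linear weights in int8,
# shrinking the model and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```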

In this Special Issue, we welcome manuscripts on the design of specialized acceleration algorithms for concrete research directions to enable task-specific applications. In addition, we also welcome all emerging research efforts, new solutions to existing problems, and even new problems. Research areas may include (but are not limited to) the following:

  • Three-dimensional computer vision;
  • Applications of computer vision and vision for X;
  • Biomedical image analysis;
  • Computational photography, sensing, and display;
  • Deep learning for computer vision;
  • Document image analysis;
  • Faces and gestures;
  • Poses and actions;
  • Generative models for computer vision;
  • Low-level vision and image processing;
  • Motion and tracking;
  • Physics-based vision and shape from X;
  • Recognition, including feature detection, indexing, matching, and shape representation;
  • RGBD and depth image processing;
  • Robot vision;
  • Scene analysis and understanding;
  • Computer vision theory;
  • Segmentation and grouping;
  • Video analysis and event recognition;
  • Video processing and communication;
  • Efficient training and inference methods for networks.

We look forward to receiving your contributions.

Prof. Dr. Beiji Zou
Prof. Dr. Xiaoyan Kui
Dr. Weixin Si
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • efficient deep learning
  • computer vision
  • image processing
  • model compression and acceleration

Published Papers (12 papers)


Research

26 pages, 10184 KiB  
Article
Deep-Learning-Based Action and Trajectory Analysis for Museum Security Videos
by Christian Di Maio, Giacomo Nunziati and Alessandro Mecocci
Electronics 2024, 13(7), 1194; https://doi.org/10.3390/electronics13071194 - 25 Mar 2024
Viewed by 426
Abstract
Recent advancements in deep learning and video analysis, combined with the efficiency of contemporary computational resources, have catalyzed the development of advanced real-time computational systems, significantly impacting various fields. This paper introduces a cutting-edge video analysis framework that was specifically designed to bolster security in museum environments. We elaborate on the proposed framework, which was evaluated and integrated into a real-time video analysis pipeline. Our research primarily focused on two innovative approaches: action recognition for identifying potential threats at the individual level and trajectory extraction for monitoring museum visitor movements, serving the dual purposes of security and visitor flow analysis. These approaches leverage a synergistic blend of deep learning models, particularly CNNs, and traditional computer vision techniques. Our experimental findings affirmed the high efficacy of our action recognition model in accurately distinguishing between normal and suspicious behaviors within video feeds. Moreover, our trajectory extraction method demonstrated commendable precision in tracking and analyzing visitor movements. The integration of deep learning techniques not only enhances the capability for automatic detection of malevolent actions but also establishes the trajectory extraction process as a robust and adaptable tool for various analytical endeavors beyond mere security applications. Full article
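
The authors' pipeline is not reproduced here; as a generic, simplified illustration of trajectory extraction by associating per-frame detections, the sketch below greedily links each existing track to its nearest unclaimed centroid (the `max_dist` threshold and the toy detections are assumptions).

```python
# Generic sketch: trajectory extraction by greedy nearest-centroid association
# of per-frame detections (illustrative only, not the authors' pipeline).
import math

def associate(tracks, detections, max_dist=50.0):
    """Extend each track with the closest unclaimed detection; start new tracks otherwise.

    tracks: dict mapping track_id -> list of (x, y) centroids
    detections: list of (x, y) centroids detected in the current frame
    """
    unclaimed = list(detections)
    for path in tracks.values():
        if not unclaimed:
            break
        last = path[-1]
        nearest = min(unclaimed, key=lambda d: math.dist(last, d))
        if math.dist(last, nearest) <= max_dist:
            path.append(nearest)
            unclaimed.remove(nearest)
    next_id = max(tracks, default=-1) + 1
    for det in unclaimed:          # unmatched detections start new tracks
        tracks[next_id] = [det]
        next_id += 1
    return tracks

tracks = {}
frames = [[(10, 10), (200, 40)], [(14, 12), (205, 45)], [(18, 15)]]
for dets in frames:
    tracks = associate(tracks, dets)
print(tracks)  # {0: [(10, 10), (14, 12), (18, 15)], 1: [(200, 40), (205, 45)]}
```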

15 pages, 10164 KiB  
Article
Detection of Underground Dangerous Area Based on Improving YOLOV8
by Yunfeng Ni, Jie Huo, Ying Hou, Jing Wang and Ping Guo
Electronics 2024, 13(3), 623; https://doi.org/10.3390/electronics13030623 - 02 Feb 2024
Viewed by 977
Abstract
To meet the safety needs of personnel in dark underground environments, this article adopts an improved YOLOV8 algorithm combined with the ray method to determine whether underground personnel are entering dangerous areas and to provide early warning. First, this article introduces a coordinate attention mechanism into YOLOV8 target detection so that the model attends to the location information of the target area, improving detection accuracy for occluded and small target areas. In addition, the Soft Non-Maximum Suppression (SNMS) module is introduced to further improve accuracy. The improved model is then combined with the ray method and deployed with cameras covering a variety of angles and scene conditions. The experimental results show that the proposed method achieves 99.5% identification accuracy and a frame rate of 45 Frames Per Second (FPS) on the self-built dataset. Compared with the YOLOV8 model, it has higher accuracy and can effectively cope with the changes and interference factors in the underground environment. Furthermore, it meets the requirements for real-time detection in dangerous underground areas. Full article
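
The "ray method" referred to above is commonly implemented as a ray-casting point-in-polygon test; the sketch below is a generic version of that test, not the authors' code, and the danger-zone polygon and foot point are illustrative assumptions.

```python
# Generic ray-casting point-in-polygon test, the usual form of the "ray method"
# used to decide whether a detected person lies inside a marked danger zone.
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) falls inside the polygon given as a list of (px, py) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

danger_zone = [(100, 100), (400, 100), (400, 300), (100, 300)]  # placeholder polygon
foot_point = (250, 200)          # e.g. bottom-center of a detected person box
print(point_in_polygon(*foot_point, danger_zone))  # True
```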

27 pages, 13669 KiB  
Article
Research on the Construction of an Efficient and Lightweight Online Detection Method for Tiny Surface Defects through Model Compression and Knowledge Distillation
by Qipeng Chen, Qiaoqiao Xiong, Haisong Huang, Saihong Tang and Zhenghong Liu
Electronics 2024, 13(2), 253; https://doi.org/10.3390/electronics13020253 - 05 Jan 2024
Viewed by 760
Abstract
In response to the current issues of poor real-time performance, high computational costs, and excessive memory usage of object detection algorithms based on deep convolutional neural networks in embedded devices, a method for improving deep convolutional neural networks based on model compression and knowledge distillation is proposed. Firstly, data augmentation is employed in the preprocessing stage to increase the diversity of training samples, thereby improving the model’s robustness and generalization capability. The K-means++ clustering algorithm generates candidate bounding boxes, adapting to defects of different sizes and selecting finer features earlier. Secondly, the cross stage partial (CSP) Darknet53 network and spatial pyramid pooling (SPP) module extract features from the input raw images, enhancing the accuracy of defect location detection and recognition in YOLO. Finally, the concept of model compression is integrated, utilizing scaling factors in the batch normalization (BN) layer, and introducing sparse factors to perform sparse training on the network. Channel pruning and layer pruning are applied to the sparse model, and post-processing methods using knowledge distillation are used to effectively reduce the model size and forward inference time while maintaining model accuracy. The improved model size decreases from 244 M to 4.19 M, the detection speed increases from 32.8 f/s to 68 f/s, and mAP reaches 97.41. Experimental results demonstrate that this method is conducive to deploying network models on embedded devices with limited GPU computing and storage resources. It can be applied in distributed service architectures for edge computing, providing new technological references for deploying deep learning models in the industrial sector. Full article
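
As a generic illustration of the knowledge distillation step mentioned above (not the authors' implementation), the following sketch blends a softened teacher-student KL term with the ordinary cross-entropy loss; the temperature `T` and weight `alpha` are placeholder assumptions.

```python
# Minimal sketch of a soft-label knowledge distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend KL divergence to the teacher's softened outputs with the usual CE loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                       # scale to keep gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student = torch.randn(8, 10, requires_grad=True)  # placeholder student outputs
teacher = torch.randn(8, 10)                      # placeholder teacher outputs
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels).item())
```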

14 pages, 12541 KiB  
Article
BFE-Net: Object Detection with Bidirectional Feature Enhancement
by Rong Zhang, Zhongjie Zhu, Long Li, Yongqiang Bai and Jiong Shi
Electronics 2023, 12(21), 4531; https://doi.org/10.3390/electronics12214531 - 03 Nov 2023
Cited by 1 | Viewed by 668
Abstract
In realistic scenarios, existing object detection models still face challenges in resisting interference and detecting small objects due to complex environmental factors such as light and noise. For this reason, a novel scheme termed BFE-Net, based on bidirectional feature enhancement, is proposed. Firstly, a new multi-scale feature extraction module is constructed, which uses a self-attention mechanism to simulate human visual perception. It is used to capture global information and long-range dependencies between pixels, thereby optimizing the extraction of multi-scale features from input images. Secondly, a feature enhancement and denoising module is designed, based on bidirectional information flow. In the top-down direction, the impact of noise on the feature map is weakened to further enhance feature extraction; in the bottom-up direction, multi-scale features are fused to improve the accuracy of small-object feature extraction. Lastly, a generalized intersection over union regression loss function is employed to optimize the movement direction of predicted bounding boxes, improving the efficiency and accuracy of object localization. Experimental results using the public dataset PASCAL VOC2007test show that our scheme achieves a mean average precision (mAP) of 85% for object detection, which is 2.3% to 8.6% higher than classical methods such as RetinaNet and YOLOv5. In particular, the anti-interference capability and the performance in detecting small objects show a significant enhancement. Full article
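
For reference, a generalized intersection over union (GIoU) term of the kind mentioned above can be computed as in the sketch below; the corner-format boxes and example values are illustrative assumptions, not the authors' code.

```python
# Generic sketch of generalized IoU (GIoU), often used as 1 - GIoU in a box-regression loss.
# Boxes are axis-aligned and given as (x1, y1, x2, y2).
def giou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    enclose = cw * ch
    return iou - (enclose - union) / enclose   # GIoU; the loss is 1 - GIoU

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # ≈ -0.079 (IoU = 1/7, enclosing area = 9)
```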

17 pages, 15790 KiB  
Article
Siamese Visual Tracking with Spatial-Channel Attention and Ranking Head Network
by Jianming Zhang, Yifei Liang, Xiaoyi Huang, Li-Dan Kuang and Bin Zheng
Electronics 2023, 12(20), 4351; https://doi.org/10.3390/electronics12204351 - 20 Oct 2023
Viewed by 837
Abstract
Trackers based on the Siamese network have received much attention in recent years owing to their remarkable performance; the task of object tracking is to predict the location of the target in the current frame. However, during the tracking process, distractors with similar appearances affect the judgment of the tracker and lead to tracking failure. To solve this problem, we propose a Siamese visual tracker with spatial-channel attention and a ranking head network. Firstly, we propose a Spatial Channel Attention Module, which fuses the features of the template and the search region by capturing both spatial and channel information simultaneously, allowing the tracker to distinguish the target to be tracked from the background. Secondly, we design a ranking head network. By introducing joint ranking loss terms, including a classification ranking loss and a confidence&IoU ranking loss, the classification and regression branches are linked to refine the tracking results. Through the mutual guidance between the classification confidence score and the IoU, a better regression box is selected to improve the performance of the tracker. To demonstrate that our proposed method is effective, we test the proposed tracker on the OTB100, VOT2016, VOT2018, UAV123, and GOT-10k testing datasets. On OTB100, the precision and success rate of our tracker are 0.925 and 0.700, respectively. Considering accuracy and speed, our method overall achieves state-of-the-art performance. Full article

22 pages, 3590 KiB  
Article
Infrared and Visible Image Fusion Based on Mask and Cross-Dynamic Fusion
by Qiang Fu, Hanxiang Fu and Yuezhou Wu
Electronics 2023, 12(20), 4342; https://doi.org/10.3390/electronics12204342 - 19 Oct 2023
Viewed by 834
Abstract
Both single infrared and visible images have respective limitations. Fusion technology has been developed to conquer these restrictions. It is designed to generate a fused image with infrared information and texture details. Most traditional fusion methods use hand-designed fusion strategies, but some are too rough and have limited fusion performance. Recently, some researchers have proposed fusion methods based on deep learning, but some early fusion networks cannot adaptively fuse images due to unreasonable design. Therefore, we propose a mask and cross-dynamic fusion-based network called MCDFN. This network adaptively preserves the salient features of infrared images and the texture details of visible images through an end-to-end fusion process. Specifically, we designed a two-stage fusion network. In the first stage, we train the autoencoder network so that the encoder and decoder learn feature extraction and reconstruction capabilities. In the second stage, the autoencoder is fixed, and we employ a fusion strategy combining mask and cross-dynamic fusion to train the entire fusion network. This strategy is conducive to the adaptive fusion of image information between infrared images and visible images in multiple dimensions. On the public TNO dataset and the RoadScene dataset, we selected nine different fusion methods to compare with our proposed method. Experimental results show that our proposed fusion method achieves good results on both datasets. Full article

20 pages, 3764 KiB  
Article
ESTUGAN: Enhanced Swin Transformer with U-Net Discriminator for Remote Sensing Image Super-Resolution
by Chunhe Yu, Lingyue Hong, Tianpeng Pan, Yufeng Li and Tingting Li
Electronics 2023, 12(20), 4235; https://doi.org/10.3390/electronics12204235 - 13 Oct 2023
Cited by 1 | Viewed by 809
Abstract
Remote sensing image super-resolution (SR) is a practical research topic with broad applications. However, the mainstream algorithms for this task suffer from limitations. CNN-based algorithms face difficulties in modeling long-term dependencies, while generative adversarial networks (GANs) are prone to producing artifacts, making it difficult to reconstruct high-quality, detailed images. To address these challenges, we propose ESTUGAN for remote sensing image SR. On the one hand, ESTUGAN adopts the Swin Transformer as the network backbone and upgrades it to fully mobilize input information for global interaction, achieving impressive performance with fewer parameters. On the other hand, we employ a U-Net discriminator with a region-aware learning strategy for assisted supervision. The U-shaped design enables us to obtain structural information at each hierarchy and provides dense pixel-by-pixel feedback on the predicted images. Combined with the region-aware learning strategy, our U-Net discriminator can perform adversarial learning only for texture-rich regions, effectively suppressing artifacts. To achieve flexible supervision for the estimation, we employ the Best-buddy loss, and we add the Back-projection loss as a constraint for the faithful reconstruction of the high-resolution image distribution. Extensive experiments demonstrate the superior perceptual quality and reliability of our proposed ESTUGAN in reconstructing remote sensing images. Full article

15 pages, 3158 KiB  
Article
A Multi-Channel Parallel Keypoint Fusion Framework for Human Pose Estimation
by Xilong Wang, Nianfeng Shi, Guoqiang Wang, Jie Shao and Shuaibo Zhao
Electronics 2023, 12(19), 4019; https://doi.org/10.3390/electronics12194019 - 24 Sep 2023
Viewed by 930
Abstract
Although modeling self-attention can significantly reduce computational complexity, human pose estimation performance is still affected by occlusion and background noise, and undifferentiated feature fusion leads to significant information loss. To address these issues, we propose a novel human pose estimation framework called DatPose (deformable convolution and attention for human pose estimation), which combines deformable convolution and self-attention. Considering that the keypoints of the human body are mostly distributed at the edge of the human body, we adopt a deformable convolution strategy to obtain the low-level feature information of the image. Our proposed method leverages visual cues to capture detailed keypoint information, which we embed into the Transformer encoder to learn the keypoint constraints. More importantly, we designed a multi-channel, two-way parallel module with self-attention and convolution fusion to enhance the weight of the keypoints in the visual cues. To strengthen the implicit relationship of the fusion, we generate keypoint tokens for the visual cues of the fusion module and the Transformer, respectively. Our experimental results on the COCO and MPII datasets show that the keypoint fusion module improves the keypoint information. Extensive experiments and visual analysis demonstrate the robustness of our model in complex scenes, and our framework outperforms popular lightweight networks in human pose estimation. Full article

19 pages, 5664 KiB  
Article
YOLOv5-OCDS: An Improved Garbage Detection Model Based on YOLOv5
by Qiuhong Sun, Xiaotian Zhang, Yujia Li and Jingyang Wang
Electronics 2023, 12(16), 3403; https://doi.org/10.3390/electronics12163403 - 10 Aug 2023
Cited by 1 | Viewed by 2105
Abstract
As the global population grows and urbanization accelerates, the amount of garbage generated continues to increase. This waste causes serious pollution to the ecological environment, affecting the stability of the global environmental balance. Garbage detection technology can quickly and accurately identify, classify, and locate many kinds of garbage to realize the automatic disposal and efficient recycling of waste, and it can also promote the development of a circular economy. However, existing garbage detection technology has some problems, such as low precision and a poor detection effect in complex environments. Although YOLOv5 has achieved good results in garbage detection, its detection results cannot meet the requirements in complex scenarios, so this paper proposes a garbage detection model, YOLOv5-OCDS, based on an improved YOLOv5. Replacing some of the convolutions in the neck with Omni-Dimensional Dynamic Convolution (ODConv) improves the expressiveness of the model. The C3DCN structure is constructed, and parts of the C3 structures in the neck are replaced by C3DCN structures, allowing the model to better adapt to object deformation and target scale change. The decoupled head is used for classification and regression tasks so that the model can learn each class’s characteristics and positioning information more intently, improving flexibility and extensibility. The Soft Non-Maximum Suppression (Soft NMS) algorithm better retains the target’s information and effectively avoids the problem of repeated detection. The self-built garbage classification dataset is used for related experiments; the mAP@50 of the YOLOv5-OCDS model is 5.3% higher than that of YOLOv5s, and the value of mAP@50:95 increases by 12.3%. In the experimental environment of this study, the model’s Frames Per Second (FPS) was 61.7 f/s. In practical applications, even on an older GPU such as the GTX 1060, it still reaches 50.3 f/s, so real-time detection can be achieved. Thus, the improved model suits garbage detection tasks in complex environments. Full article
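
As a generic illustration of the Soft NMS idea mentioned above (not the authors' implementation), the sketch below decays the scores of overlapping boxes with a Gaussian penalty instead of discarding them outright; `sigma` and the score threshold are placeholder assumptions.

```python
# Generic sketch of Gaussian Soft-NMS for axis-aligned boxes (x1, y1, x2, y2).
import numpy as np

def iou(a, b):
    """Plain intersection-over-union between two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        i = int(np.argmax(scores))
        best, best_score = boxes.pop(i), scores.pop(i)
        keep.append((best, best_score))
        # Decay remaining scores in proportion to their overlap with the kept box,
        # then drop boxes whose score falls below the threshold.
        scores = [s * np.exp(-iou(best, b) ** 2 / sigma) for b, s in zip(boxes, scores)]
        survivors = [(b, s) for b, s in zip(boxes, scores) if s > score_thresh]
        boxes = [b for b, _ in survivors]
        scores = [s for _, s in survivors]
    return keep

dets = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]  # placeholder detections
conf = [0.9, 0.8, 0.7]
print(soft_nms(dets, conf))  # the overlapping box is kept but with a reduced score
```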

18 pages, 8071 KiB  
Article
A Face Detector with Adaptive Feature Fusion in Classroom Environment
by Cheng Sun, Pei Wen, Shiwen Zhang, Xingjin Wu, Jin Zhang and Hongfang Gong
Electronics 2023, 12(7), 1738; https://doi.org/10.3390/electronics12071738 - 06 Apr 2023
Cited by 1 | Viewed by 1496
Abstract
Face detection in the classroom environment is the basis for student face recognition, sensorless attendance, and concentration analysis. Due to equipment, lighting, and the uncontrollability of students in an unconstrained environment, classroom images contain many moving faces, occluded faces, and extremely small faces. Since the image sent to the detector is resized to a smaller size, the face information available to the detector is very limited, which seriously affects the accuracy of face detection. Therefore, this paper proposes an adaptive fusion-based YOLOv5 method for face detection in classroom environments. First, a very-small-face detection layer is added to enhance the YOLOv5 baseline, and an adaptive fusion backbone network based on multi-scale features is proposed, providing feature fusion capability and rich feature information. Second, an adaptive spatial feature fusion strategy is applied to the network, taking both face location information and semantic information into account. Finally, a new classroom face dataset, Classroom-Face, is proposed and used to verify our method. The experimental results show that, compared with YOLOv5 and other traditional algorithms, our algorithm achieves better performance on the WIDER-FACE and Classroom-Face datasets. Full article

13 pages, 2112 KiB  
Article
MFSR: Light Field Images Spatial Super Resolution Model Integrated with Multiple Features
by Jianfei Zhou and Hongbing Wang
Electronics 2023, 12(6), 1480; https://doi.org/10.3390/electronics12061480 - 21 Mar 2023
Viewed by 1464
Abstract
Light Field (LF) cameras can capture angular and spatial information simultaneously, making them suitable for a wide range of applications such as refocusing, disparity estimation, and virtual reality. However, the limited spatial resolution of LF images hinders their applicability. To address this issue, we propose an end-to-end learning-based light field super-resolution (LFSR) model called MFSR, which integrates multiple features, including spatial, angular, epipolar plane image (EPI), and global features. These features are extracted separately from the LF image and then fused iteratively into a comprehensive feature using the Feature Extract Block (FE Block). A gradient loss is added to the loss function to ensure that MFSR performs well on LF images with rich texture. Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art methods, with an average peak signal-to-noise ratio (PSNR) improvement of 0.208 dB and 0.274 dB for the 2× and 4× super-resolution tasks, respectively, and an average structural similarity (SSIM) improvement of 0.01 for both tasks. Full article
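
For reference, the PSNR metric reported above is typically computed as in the following generic sketch (illustrative only; not the authors' evaluation code, and the random test images are placeholders).

```python
# Generic PSNR computation between a reference image and a reconstruction.
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

hr = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)                      # placeholder ground truth
sr = np.clip(hr.astype(np.int16) + np.random.randint(-5, 6, hr.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```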

17 pages, 3866 KiB  
Article
An Attention-Based Uncertainty Revising Network with Multi-Loss for Environmental Microorganism Segmentation
by Hengyuan Na, Dong Liu and Shengsheng Wang
Electronics 2023, 12(3), 763; https://doi.org/10.3390/electronics12030763 - 02 Feb 2023
Viewed by 1410
Abstract
The presence of environmental microorganisms is inevitable in our surroundings, and segmentation is essential for researchers to identify, understand, and utilize these microorganisms, make use of their benefits, and prevent harm. However, the segmentation of environmental microorganisms is challenging because their vague margins are almost transparent against the surrounding environment. In this study, we propose a network with an uncertainty feedback module to find ambiguous boundaries and regions and an attention module to localize the major region of the microorganism. Furthermore, we apply a mid-pred module to output low-resolution segmentation results directly from the decoder blocks at each level. This module helps the encoder and decoder capture details at different scales. Finally, we use a multi-loss scheme to guide the training. Rigorous experimental evaluations on the benchmark dataset demonstrate that our method achieves higher scores than other sophisticated network models (95.63% accuracy, 89.90% Dice, 81.65% Jaccard, 94.68% recall, 0.59 ASD, 2.24 HD95, and 85.58% precision). Full article
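
For reference, the Dice score reported above is typically computed for binary masks as in the generic sketch below (illustrative only; not the authors' evaluation code, and the toy masks are placeholders).

```python
# Generic Dice coefficient for binary segmentation masks.
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice = 2 * |A & B| / (|A| + |B|) for boolean masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

gt = np.zeros((32, 32), dtype=bool); gt[8:24, 8:24] = True    # placeholder ground truth
pr = np.zeros((32, 32), dtype=bool); pr[10:26, 10:26] = True  # placeholder prediction
print(f"Dice: {dice(pr, gt):.3f}")  # overlap 14x14 = 196, each mask 256 -> 0.766
```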
